Characterising the CRISPR immune system in Archaea
ABSTRACT
Archaea, a group of microorganisms distinct from bacteria and
eukaryotes, are equipped with an adaptive immune system called
the CRISPR system, which relies on an RNA interference mechanism
to combat invading viruses and plasmids. Using a genome
sequence analysis approach, the four components of archaeal
genomic CRISPR loci were analysed, namely, repeats, spacers,
leaders and cas genes. Based on analysis of spacer sequences it
was predicted that the immune system combats viruses and plasmids
by targeting their DNA. Furthermore, analysis of repeats,
leaders and cas genes revealed that CRISPR systems exist as distinct
families which have key differences between themselves.
Closely related organisms were seen harbouring different CRISPR
systems, while some distantly related species carried similar
systems, indicating frequent horizontal exchange. Moreover, it
was found that cas genes of Type I CRISPR systems could be divided
into functionally independent modules which occasionally
exchange to form new combinations of Type I systems. Furthermore,
Type III systems were found to be genomically associated
with various combinations of accessory genes which may play a
role in functionally extending the activity of the Type III interference
complexes. This dynamic nature of the CRISPR immune
systems may be a prerequisite for their continued efficacy against
the ever changing threats they protect their hosts from.
SUMMARY
Archaea comprise a group of microorganisms distinct from both
bacteria and eukaryotes. These organisms are equipped with an
adaptive immune system against invading viruses and plasmids.
The immune system works by taking up DNA from a virus, and
saving it on the host’s own chromosome as a template to produce
interference RNA. The RNA recognises the virus the next time
it infects and signals the degradation of its genetic material. All
the components of this system are encoded on the chromosomes
of the organisms, and by looking into these components using
genome sequence analysis, a number of insights were gained.
It was found that the immune system kills viruses by targeting
their DNA, first and foremost. Furthermore, different archaea
have different variants of the immune system, and most archaea
harbour several variants at the same time, probably to aid them
in targeting different types of viruses. The systems themselves
are composed of independent modules which are responsible for
different stages of the immune response. By combining the modules
in various combinations and extending them with additional
components as well as exchanging them with other archaea, the
organisms ensure that their immune systems are fit to handle
diverse and continuously evolving threats.
SAMMENFATNING (DANISH SUMMARY)
Archaea constitute a group of organisms that are distinct from
both bacteria and eukaryotes. They are equipped with an adaptive
immune system against invading viruses and plasmids. The immune
system works by taking up the DNA of the virus, which is stored in
the host’s own chromosome and thereby used as a template
for producing interference RNA. The RNA recognises the virus
the next time it infects, and thereby signals the degradation
of the virus’s genetic material. All components of the immune
system are encoded in the organisms’ DNA, and by examining
the components through genome sequence analysis a number of
discoveries were made. We found that the immune system kills
viruses first and foremost by attacking their DNA. In addition,
different archaea have different variants of the immune system, and
most archaea possess several of the variants at once, which
probably helps them to tackle different types of
viruses. The immune systems themselves consist of independent modules,
each of which is responsible for different stages of the immune response. By
combining the modules in different combinations or extending
them with additional components, as well as exchanging them with
other archaea, the organisms ensure that their immune system
is up to date and able to withstand the diverse threats that
are constantly evolving.
PREFACE
The work presented in this thesis was carried out at the Danish
Archaea Centre at the Department of Biology, University of
Copenhagen from May 2007 to July 2012 under the supervision
of Professor Roger A. Garrett.

The initial objective of the Ph.D. study was to characterise
the CRISPR immune system in Sulfolobus species using computational
methods, especially comparative genome sequence analysis.
Sulfolobus species have been extensively studied with regard to
the viruses and plasmids which infect them, with many genome
sequences available of hosts as well as their extrachromosomal
elements. Furthermore, Sulfolobus species harbour extensive and
diverse CRISPR immune systems. Thus there was more than
enough data to begin the analyses. After exhausting the possible
ways of analysing CRISPR spacer, repeat, leader and cas gene
sequences from Sulfolobales, the analyses were extended to the
rest of the available archaeal genomes in collaboration with Dr.
Gisle A. Vestergaard, starting in July 2010. Gisle worked with me on
the project until October 2011, after which I took it over. Extending
the study to other archaea proved fruitful, but was also a large
undertaking, and these analyses are still in the process of being
completed. Preliminary results have, however, been included in
this thesis, to which especially the last part is dedicated.

The results from this Ph.D. study have been published across
many individual research papers, which are all enclosed,
and most of which have multiple co-authors. Therefore the extent
of my own contributions to each of these papers has been
stipulated on the sheet preceding every paper.

To clarify the extent of my collaboration with
Dr. Gisle A. Vestergaard and its influence on what is presented
in this thesis: he was deeply involved in all work concerning
the classification of archaeal cas genes into separate functional
modules (aCas, iCas, etc.), and the classification of iCmr modules
into 5 families, A through E. In these studies our workload
was more or less equal. Some aspects of this work are already
published while others remain unpublished. As for the analysis of archaeal
CRISPR repeats and leaders, as well as the definition of iCmr
accessory genes, these analyses were conducted by myself
after Gisle left the project and are still unpublished.
ACKNOWLEDGEMENTS
First and foremost I’d like to thank my supervisor, Professor
Roger A. Garrett. We had many long, inspiring discussions. Also,
with Roger I saw how experience and wisdom go hand in hand.
But most importantly I want to thank him for having patience
and maintaining his trust in me during the challenges which
were faced.

During my Ph.D. I made friends. Chandra, Gisle, Chao and
Ling. Four friends in four years. It’s difficult not to be thankful
for that.

Also, I had the privilege of working in a lab full of absolutely
terrific people. I can’t think of a single exception. Despite many
of us having to deal with our own day-to-day challenges, between
us there was a genuinely positive and understanding atmosphere.
When comparing with most other workplaces you realise that
this is the kind of thing you mustn’t take for granted.
LIST OF TABLES
Table 1  Features of Archaea, Bacteria and Eucarya
Table 2  Extremophile record-holders
Table 3  Sequenced Sulfolobales genomes
Table 4  Properties of Sulfolobus viruses & plasmids
Table 5  cas genes and their functions
Table 6  Overview of accessory iCmr genes
LIST OF FIGURES
Figure 1  Haeckel’s tree of life
Figure 2  16S-RNA universal phylogenetic tree
Figure 3  RNA polymerases form the three domains
Figure 4  Archaeal 16S RNA phylogenetic tree
Figure 5  A Sulfolobus cell infected with a virus
Figure 6  Morphologies of select archaeal viruses
Figure 7  CRISPR immunity: mode of action
Figure 8  Gene maps of accessory iCmr genes
Figure 9  Tree of csx1 genes from Sulfolobales
3 DISCUSSION & PERSPECTIVES
Although experimental studies resolving the mechanistic details
of the Cas interference complexes have started to gain momentum,
with more and more articles being published each month, there
are still many holes in our understanding of key parts of the interference
process. Despite this, some marked differences between iCas and
iCmr are already established, such as the
manner in which self vs. non-self nucleic acid is distinguished¹,
or the species of mature crRNA (long[10] vs. short[27]) utilised
by either system.

In addition to such specific differences, a deeper divergence
between the two systems is becoming increasingly apparent. For
the iCas protein complex, the findings[32, 67] first made for the
system in E. coli (Type E) are, remarkably, being rediscovered in
the diverse iCas systems (types A, D and F) of Sulfolobus, Bacillus
and Pseudomonas[39, 53, 81]. As for iCmr, the opposite seems to
be the case. Here we see surprising mechanistic diversity despite
the apparent homology between the systems. For example, the Type
A iCmr system of Staphylococcus targets DNA[47], and while
two different Type B systems, in Pyrococcus[27] and Sulfolobus[86]
respectively, both target RNA, they do so in very different
ways². Furthermore, there is now evidence for another
Sulfolobus Type B iCmr system targeting DNA (Deng et al., under
revision), obscuring the picture even further.

This tendency for iCas systems to be mechanistically conserved,
and for iCmr systems to exhibit diversity, is also reflected
at the genomic level. Although sequences of the individual iCas
genes have diverged considerably between the subtypes, some
even beyond recognition, the overall gene composition is constant,
with cas3, cas5, cas7 and cas8 comprising a universal core.
iCmr modules, on the other hand, vary with regard to their content
of RAMP genes depending on whether they are of type A, B, C or D.
Also, and perhaps more importantly, very similar iCmr modules
are sometimes seen accompanied by different combinations of
accessory genes (Section 2.4.1) which encode proteins that may
be responsible for modifying the core functionality of the iCmr
complex, possibly accounting for the mechanistic diversity so far
¹ With PAMs[49] as opposed to base-pairing[48], respectively.
² Both types utilise crRNA to target complementary ssRNA, but while
the former always cleaves the target RNA at a fixed position, employing
some kind of ruler mechanism, the latter cleaves the target in a
sequence-specific manner at each ‘UA’ dinucleotide encountered.
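The contrast drawn in the second note, fixed-position versus sequence-specific cleavage, can be illustrated with a minimal Python sketch. It is an illustration only: the function names, the example sequence and the offset value are hypothetical and are not taken from the cited studies. The sketch merely reports where ‘UA’ dinucleotides occur in a target and where a fixed offset would fall, without modelling the enzymes themselves.

```python
from typing import List


def ua_dinucleotide_positions(target_rna: str) -> List[int]:
    """Return the 0-based index of the U in every 'UA' dinucleotide.

    Models the sequence-specific case: the number and location of
    candidate sites depend entirely on the target sequence.
    """
    rna = target_rna.upper()
    return [i for i in range(len(rna) - 1) if rna[i:i + 2] == "UA"]


def ruler_position(target_length: int, offset_from_3prime: int) -> int:
    """Return the 0-based position a fixed distance from the 3' end.

    Models the ruler-like case: the site depends only on a measured
    distance (a hypothetical parameter here), not on the sequence.
    """
    if not 0 < offset_from_3prime <= target_length:
        raise ValueError("offset must lie within the target")
    return target_length - offset_from_3prime


if __name__ == "__main__":
    example = "GGUACCUAUA"  # hypothetical target RNA
    print(ua_dinucleotide_positions(example))
    print(ruler_position(len(example), 4))  # hypothetical offset
```

Changing the example sequence changes the output of the first function but not of the second, which is the distinction the note makes.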
4 CONCLUSION
During this Ph.D. study the CRISPR systems in Sulfolobales have
been extensively characterised using a bioinformatic genome
sequence analysis approach. Later the analyses were extended
to all available archaeal genomes. The latter work is still in the
process of being concluded. To summarise:
• Sulfolobales CRISPR spacer sequences were analysed to
find matches to the large number of sequenced Sulfolobales
extrachromosomal elements. The matches obtained were
used to successfully predict the target nucleic acid of the
CRISPR system, at a time when the target nucleic acid was not
known.
• Analysis of Sulfolobales CRISPR repeats, spacers, leaders
and cas genes revealed that CRISPR systems exist in families,
where leader types, repeat types, cas gene types and
PAM motifs go hand in hand. At the time, this had not
been shown for any other organism.
• Analysis of the genomic contexts of Sulfolobales CRISPR/Cas
loci revealed that the systems are located in genomic
hyper-variable regions and subject to frequent horizontal
gene transfer, where transposable elements and
toxin-antitoxin loci play a role in modulating their mobility.
• Extending the analyses to other archaea outside Sulfolobales
revealed that the CRISPR interference modules (iCas
and iCmr) in particular are very diverse, while the adaptation
modules (aCas) are remarkably conserved. Individual
modules were also seen to interchange, giving rise to
CRISPR/Cas loci with different combinations of functional
modules.
• iCmr modules of different types were found to be associated
with a rich array of accessory genes which were
also found to exchange between different types of iCmr
modules. It was hypothesised that these accessory genes
extend the core functionality of the iCmr modules, e.g. by
conferring the ability to switch target nucleic acids.
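The spacer matching summarised in the first point above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the thesis, which is not described in this section: it scans both strands of a sequence for a spacer by brute force, allowing a chosen number of mismatches. The function names, the mismatch threshold and the example sequences are all hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_spacer_matches(
    spacer: str, genome: str, max_mismatches: int = 0
) -> List[Tuple[int, str, int]]:
    """Return (position, strand, mismatches) for every window of the
    genome that matches the spacer or its reverse complement."""
    spacer, genome = spacer.upper(), genome.upper()
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for pos in range(len(genome) - len(query) + 1):
            mismatches = hamming(query, genome[pos:pos + len(query)])
            if mismatches <= max_mismatches:
                hits.append((pos, strand, mismatches))
    return hits


if __name__ == "__main__":
    virus = "TTTACGTGACCAAA"  # hypothetical viral sequence
    print(find_spacer_matches("ACGTGACC", virus))
    print(find_spacer_matches("GGTCACGT", virus))
```

A brute-force scan like this is quadratic in the worst case and is only practical for small inputs; for real genome collections an alignment tool would normally be used instead.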
5 PUBLICATIONS
The publications resulting from this Ph.D. study are included in
this chapter in chronological order. As most of the publications
here have multiple authors with varying contributions, I have
rated my own level of contribution to each publication as either
‘major’, ‘substantial’ or ‘minor’. ‘Major’ means that the majority
of the work behind the publication was carried out by myself.
‘Substantial’ means that my contribution comprised a smaller
but crucial part of the manuscript, while ‘minor’ means that
my contribution was small and non-crucial to that particular
manuscript, although still a part of my own Ph.D. project. In
addition to my level of contribution, the exact nature of my
contribution is also stipulated.
A note on iCmr family nomenclature
In conformance with the recent update of cas gene nomenclature[45],
the iCmr families referred to in publications 5.7[25], 5.8[20] and
5.10[21] as ‘B’ and ‘C’ are now merged into ‘B’, while ‘E’ is now
‘A’, ‘A’ is now ‘C’, and ‘D’ remains ‘D’. The new nomenclature is
used throughout this thesis and in publications to come, whereas
the publications listed above contain the old nomenclature. So in
summary:

in thesis    in [25], [20] and [21]
A            E
B            B and C
C            A
D            D
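When reading the older publications alongside the thesis, the renaming summarised above can be applied mechanically. The following Python sketch encodes the mapping exactly as given in the table; the dictionary and function names are hypothetical and serve only as an illustration.

```python
from typing import List

# Old family name (publications [25], [20], [21]) -> name used in the thesis.
OLD_TO_NEW = {"A": "C", "B": "B", "C": "B", "D": "D", "E": "A"}


def to_thesis_family(old_name: str) -> str:
    """Translate an iCmr family name from the old to the new nomenclature."""
    return OLD_TO_NEW[old_name.upper()]


def old_names_for(thesis_name: str) -> List[str]:
    """Return every old family name that corresponds to a thesis family."""
    wanted = thesis_name.upper()
    return sorted(old for old, new in OLD_TO_NEW.items() if new == wanted)


if __name__ == "__main__":
    print(to_thesis_family("E"))
    print(old_names_for("B"))
```

Because old families ‘B’ and ‘C’ were merged, the mapping is not reversible for ‘B’, which is why the reverse lookup returns a list.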
JOURNAL OF BACTERIOLOGY, Oct. 2008, p. 6837–6845, Vol. 190, No. 20
0021-9193/08/$08.000 doi:10.1128/JB.00795-08
Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Stygiolobus Rod-Shaped Virus and the Interplay of Crenarchaeal
Rudiviruses with the CRISPR Antiviral System†

Gisle Vestergaard,1 Shiraz A. Shah,1 Ariane Bize,2 Werner Reitberger,3 Monika Reuter,3 Hien Phan,1
Ariane Briegel,4 Reinhard Rachel,3 Roger A. Garrett,1 and David Prangishvili2*

Danish Archaea Centre and Centre for Comparative Genomics, Department of Biology, Copenhagen University,
Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark1; Molecular Biology of the Gene in Extremophiles Unit,
Institut Pasteur, rue Dr. Roux 25, 75724 Paris Cedex 15, France2; Department of Microbiology,
University of Regensburg, Universitätsstrasse 31, D-93053 Regensburg, Germany3; and
Max-Planck-Institut of Biochemistry, Molecular Structural Biology, Am Klopferspitz 21,
D-82152 Martinsried, Germany4

Received 6 June 2008/Accepted 11 August 2008
A newly characterized archaeal rudivirus Stygiolobus rod-shaped virus (SRV), which infects a hyperthermophilic
Stygiolobus species, was isolated from a hot spring in the Azores, Portugal. Its virions are rod-shaped, 702
(±50) by 22 (±3) nm in size, and nonenveloped and carry three tail fibers at each terminus. The linear
double-stranded DNA genome contains 28,096 bp and an inverted terminal repeat of 1,030 bp. The SRV shows
morphological and genomic similarities to the other characterized rudiviruses Sulfolobus rod-shaped virus 1
(SIRV1), SIRV2, and Acidianus rod-shaped virus 1, isolated from hot acidic springs of Iceland and Italy. The
single major rudiviral structural protein is shown to generate long tubular structures in vitro of similar
dimensions to those of the virion, and we estimate that the virion constitutes a single, superhelical,
double-stranded DNA embedded into such a protein structure. Three additional minor conserved structural proteins
are also identified. Ubiquitous rudiviral proteins with assigned functions include glycosyl transferases and an
S-adenosylmethionine-dependent methyltransferase, as well as a Holliday junction resolvase, a transcriptionally
coupled helicase and nuclease implicated in DNA replication. Analysis of matches between known crenarchaeal
chromosomal CRISPR spacer sequences, implicated in a viral defense system, and rudiviral genomes
revealed that about 10% of the 3,042 unique acidothermophile spacers yield significant matches to rudiviral
genomes, with a bias to highly conserved protein genes, consistent with the widespread presence of rudiviruses
in hot acidophilic environments. We propose that the 12-bp indels which are commonly found in conserved
rudiviral protein genes may be generated as a reaction to the presence of the host CRISPR defense system.
Viruses of the hyperthermophilic crenarchaea are extremely
diverse in their morphotypes and in the properties
of their double-stranded DNA (dsDNA) genomes (reviewed
in references 19 and 23). Moreover, some of the virion
morphotypes are unique for dsDNA viruses from any domain
of life. Many of these viruses have been classified into
seven new families that include rod-shaped rudiviruses, filamentous
lipothrixviruses, spindle-shaped fuselloviruses,
and a bottle-shaped ampullavirus (reviewed in reference
24). The bicaudavirus Acidianus two-tailed virus (ATV) exhibits
an exceptional two-tailed morphology and the unique
viral property of developing long tail-like appendages independently
of the host cell (11). Crenarchaeal viral research
is still at an early stage of development, and insights into
basic molecular processes, including infection, replication,
packaging, and virus-host interactions, are limited. One of
the main reasons for this lies in the high proportion of
predicted genes with unknown functions (25).
* Corresponding author. Mailing address: Molecular Biology of the
Gene in Extremophiles Unit, Institut Pasteur, rue Dr. Roux 25, 75724
Paris Cedex 15, France. Phone: 33-(0)144-38-9119. Fax: 33-(0)145-68-8834.
E-mail: prangish@pasteur.fr.
† Supplemental material for this article may be found at http://jb.asm.org/.
Published ahead of print on 22 August 2008.
At present, viruses of the family Rudiviridae are the most
promising for detailed studies because they can be obtained in
reasonable yields, and there are already some insights into
their mechanisms of replication, transcriptional regulation,
and host cell adaptation (4, 12, 13, 20, 21). To date, three
rudiviruses have been characterized, all from the order Sulfolobales:
the closely related Sulfolobus rod-shaped virus 1
(SIRV1) and SIRV2, isolated in Iceland, which infect strains
of Sulfolobus islandicus (20, 22), and Acidianus rod-shaped
virus 1 (ARV1), isolated at Pozzuoli, Italy, which propagates in
Acidianus strains (34). Moreover, rudivirus-like morphotypes
and partial rudiviral genome sequences have been detected in
environmental samples collected from both acidic and neutrophilic
hot aquatic sites (27, 29, 32).
All rudiviruses carry linear dsDNA genomes with
long inverted terminal repeats (ITRs) ending in covalently
closed hairpin structures with 5′-to-3′ linkages (4, 20). The
terminal structure is important for replication, which presumably
is initiated by site-specific single-strand nicking
within the ITR, with the subsequent formation of head-to-head
and tail-to-tail intermediates, and the conversion of
genomic concatemers into monomers by a virus-encoded
Holliday junction resolvase (20). This basic replication
mechanism appears to be similar to that used by the eukaryal
poxviruses, Chlorella virus and African swine fever
virus, although there is no clear similarity between the
sequences of the implicated archaeal and eukaryal proteins
(20, 25).
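The notion of an inverted terminal repeat can be made concrete with a short Python sketch: the start of one strand of a linear genome reads the same as the reverse complement of its end. The sketch below measures the longest such region under the simplifying assumption of perfect complementarity. Real ITRs may contain mismatches, the function name and example sequence are hypothetical, and this is not the method used in the article.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def inverted_terminal_repeat_length(genome: str) -> int:
    """Return the length of the longest prefix that equals the reverse
    complement of the equally long suffix (exact matches only)."""
    seq = genome.upper()
    length = 0
    # Walk inwards from both termini; stop at the first non-complementary
    # pair, or at the midpoint so the two repeats never overlap.
    while (length < len(seq) // 2
           and _COMPLEMENT.get(seq[-1 - length]) == seq[length]):
        length += 1
    return length


if __name__ == "__main__":
    toy_genome = "ACGTT" + "GGGGGG" + "AACGT"  # hypothetical linear genome
    print(inverted_terminal_repeat_length(toy_genome))
```

To tolerate imperfect repeats, the exact comparison could be replaced by a local alignment of the two termini.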
The transcriptional patterns of rudiviruses SIRV1 and
SIRV2 are relatively simple, with few temporal expression
differences. An exception is the gene encoding the major
structural protein that binds to DNA and, at an early stage
of infection, is expressed as a polycistronic mRNA but appears
as a single gene transcript close to the eclipse period (12). It
has also been shown that rudiviral transcription can be activated
by a Sulfolobus host-encoded protein, Sta1, that interacts
specifically with TATA-like promoter motifs in the viral genome
(13).
For SIRV1, a detailed study of the mechanism of adaptation
to foreign hosts was conducted. Upon passage of the virus
through closely related S. islandicus strains, complex changes
were detected that were concentrated within six genomic regions
(21, 22). These changes included insertions, deletions,
gene duplications, inversions, and transpositions, as well as
changes in gene sizes that often involved the insertion or deletion
of what appeared to be “12-bp elements.” It was concluded
that the virus generated a complex mixture of variants,
one or more of which were preferentially propagated when the
virus entered a new host (21).
Here we describe a novel rudivirus, Stygiolobus rod-shaped<br />
virus (SRV), isolated from <strong>the</strong> Azores, Portugal, a location<br />
geographically distant from <strong>the</strong> locations of <strong>the</strong> o<strong>the</strong>r characterized<br />
rudiviruses (20, 34). SRV shows sufficient differences<br />
from <strong>the</strong> o<strong>the</strong>r rudiviruses, both morphologically and genomically,<br />
to warrant its classification as a novel species. The structural<br />
and genomic properties of <strong>the</strong> rudiviruses are compared<br />
and contrasted, and new data on <strong>the</strong> conserved virion structural<br />
prote<strong>in</strong>s are presented. Different rudiviruses were selected<br />
for <strong>the</strong>se studies on <strong>the</strong> basis of <strong>the</strong> virion or prote<strong>in</strong><br />
yields that were obta<strong>in</strong>ed. Moreover, matches between <strong>the</strong><br />
spacer regions of <strong>the</strong> crenarchaeal chromosomal <strong>CRISPR</strong> repeat<br />
clusters, which have been implicated <strong>in</strong> a viral defense<br />
<strong>system</strong> (18) <strong>in</strong>volv<strong>in</strong>g processed RNA transcribed from one<br />
DNA strand (reviewed <strong>in</strong> references 16 and 17), and <strong>the</strong> rudiviral<br />
genomes are analyzed and <strong>the</strong>ir significance, and possible<br />
relationships to <strong>the</strong> 12-bp <strong>in</strong>dels, are considered.<br />
MATERIALS AND METHODS<br />
Enrichment culture, isolation of viral hosts, and virus purification. An environmental<br />
sample was taken from a hot acidic spr<strong>in</strong>g (93°C, pH 2) <strong>in</strong> <strong>the</strong> Furnas<br />
Basin on São Miguel Island, the Azores, Portugal. The aerobic enrichment<br />
culture was established from <strong>the</strong> environmental sample and ma<strong>in</strong>ta<strong>in</strong>ed at 80°C<br />
under conditions described previously for cultivation of members of <strong>the</strong> Sulfolobales<br />
(35). S<strong>in</strong>gle stra<strong>in</strong>s were isolated by plat<strong>in</strong>g on Gelrite (Kelco, San<br />
Diego, CA) conta<strong>in</strong><strong>in</strong>g colloidal sulfur (35) and grown <strong>in</strong> <strong>the</strong> medium of <strong>the</strong><br />
enrichment culture. Cell-free supernatants of cultures were analyzed by transmission<br />
electron microscopy for <strong>the</strong> presence of virus particles.<br />
SRV was isolated from <strong>the</strong> growth culture of its host stra<strong>in</strong> Stygiolobus sp.,<br />
which was colony purified as described above. After cells were grown to <strong>the</strong> late<br />
exponential phase and harvested by low-speed centrifugation (Sorvall GS3 rotor)<br />
(4,500 rpm), virions were precipitated from <strong>the</strong> supernatant by add<strong>in</strong>g NaCl (1<br />
M) and polyethylene glycol 6000 (10% [wt/vol]) and ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g <strong>the</strong> mixture at<br />
4°C overnight. They were purified fur<strong>the</strong>r by CsCl gradient centrifugation (34).<br />
Transmission electron microscopy. Samples were deposited on carbon-coated<br />
copper grids, negatively sta<strong>in</strong>ed with 2% uranyl acetate (pH 4.5), and exam<strong>in</strong>ed<br />
<strong>in</strong> a CM12 transmission electron microscope (FEI, E<strong>in</strong>dhoven, The Ne<strong>the</strong>rlands)<br />
operated at 120 keV. The magnification was calibrated us<strong>in</strong>g catalase crystals<br />
negatively sta<strong>in</strong>ed with uranyl acetate (28). Images were digitally recorded us<strong>in</strong>g<br />
a slow-scan charge-coupled-device camera connected to a PC runn<strong>in</strong>g TVIPS<br />
software (TVIPS GmbH, Gaut<strong>in</strong>g, Germany). To some samples, 0.1% sodium<br />
dodecyl sulfate (SDS) was added, and those samples were ma<strong>in</strong>ta<strong>in</strong>ed at 22°C for<br />
30 m<strong>in</strong> <strong>in</strong> order to study <strong>the</strong> stability of <strong>the</strong> virion particles. Electron tomography<br />
of <strong>in</strong>tact, negatively sta<strong>in</strong>ed virions was performed as described previously (10,<br />
26). Visualization of <strong>the</strong> three-dimensional (3D) data was performed us<strong>in</strong>g<br />
Amira software (Visage Imag<strong>in</strong>g, Fürth, Germany).<br />
Prote<strong>in</strong> analyses. Prote<strong>in</strong>s of SRV were separated <strong>in</strong> 13.5% SDS–polyacrylamide<br />
gels (14) and sta<strong>in</strong>ed with Coomassie brilliant blue R-250 (Serva,<br />
Heidelberg, Germany). N-term<strong>in</strong>al prote<strong>in</strong> sequences were determ<strong>in</strong>ed by<br />
Edman degradation us<strong>in</strong>g a Procise 492 prote<strong>in</strong> sequencer (Applied Bio<strong>system</strong>s,<br />
Foster City, CA).<br />
SIRV2 prote<strong>in</strong>s were separated <strong>in</strong> 4 to 12% SDS–polyacrylamide NuPAGE<br />
gradient gels by <strong>the</strong> use of MES (morphol<strong>in</strong>eethanesulfonic acid) buffer (both<br />
from Invitrogen, Paisley, United K<strong>in</strong>gdom). The gels were sta<strong>in</strong>ed with Sypro<br />
Ruby (Invitrogen). Prote<strong>in</strong> bands were analyzed by peptide mass f<strong>in</strong>gerpr<strong>in</strong>t<strong>in</strong>g<br />
with matrix-assisted laser desorption ionization–time of flight mass spectrometry<br />
us<strong>in</strong>g a Voyager DE-STR biospectrometry workstation (Applied Bio<strong>system</strong>s,<br />
Fram<strong>in</strong>gham, MA) as described earlier (26). The analysis was performed <strong>in</strong><br />
conjunction with <strong>the</strong> proteomic platform at <strong>the</strong> Pasteur Institute.<br />
Clon<strong>in</strong>g and heterologous expression of ARV1-ORF134b and purification of<br />
<strong>the</strong> recomb<strong>in</strong>ant prote<strong>in</strong> and its self-assembly. ARV1-ORF134b was amplified<br />
from purified viral DNA with primers ARV1ORF134F (GGAATTCCATATG<br />
ATGGCGAAAGGACACACACC) and ARV1ORF134R (GGAATTCTCGA<br />
GACTTACGTATCCGTTAGGAC). The PCR product was purified (PCR purification<br />
kit; Roche, Mannheim, Germany) and cloned <strong>in</strong>to pET30a expression<br />
vector (Novagen, Madison, WI) between restriction sites for EcoRI and XbaI.<br />
The prote<strong>in</strong> was expressed overnight at 20°C <strong>in</strong> <strong>the</strong> Escherichia coli<br />
Rosetta(DE3)pLysS stra<strong>in</strong>. Prote<strong>in</strong> expression was controlled by SDS-polyacrylamide<br />
gel electrophoresis analysis and by perform<strong>in</strong>g a Western blot analysis<br />
us<strong>in</strong>g anti-His-tag-specific antibodies (Novagen). The native prote<strong>in</strong> was purified<br />
on a Ni²⁺-nitrilotriacetic acid (Ni²⁺-NTA)-agarose column (Novagen) with elution<br />
buffers conta<strong>in</strong><strong>in</strong>g 50 to 500 mM imidazole. The accuracy of its sequence was<br />
confirmed. Self-assembly of <strong>the</strong> recomb<strong>in</strong>ant prote<strong>in</strong> <strong>in</strong>to filamentous structures<br />
was performed at 75°C and pH 3.5 and observed by electron microscopy.<br />
Preparation of cellular and viral DNA and DNA sequenc<strong>in</strong>g. DNA was extracted<br />
from Stygiolobus azoricus cells as described previously (2), and <strong>the</strong> 16S<br />
rRNA gene was amplified by PCR us<strong>in</strong>g primers 8aF and 1512 uR (6) and<br />
sequenced.<br />
Viral DNA was obtained by disrupting SRV particles with 1% SDS for 1 h at<br />
room temperature and extraction with phenol-chloroform (9). A shotgun library<br />
was prepared by sonicat<strong>in</strong>g viral DNA to generate fragments of 2 to 4 kb and<br />
clon<strong>in</strong>g <strong>the</strong>se <strong>in</strong>to <strong>the</strong> SmaI site of <strong>the</strong> pUC18 vector. DNA was purified from<br />
s<strong>in</strong>gle colonies by <strong>the</strong> use of a Biorobot 8000 workstation (Qiagen, Westburg,<br />
Germany) and sequenced <strong>in</strong> MegaBACE 1000 sequenators (Amersham Biotech,<br />
Amersham, United K<strong>in</strong>gdom). The viral sequence was assembled us<strong>in</strong>g Sequencher<br />
4.2 software (Gene Code, Ann Arbor, MI). PCR primers for gap<br />
clos<strong>in</strong>g and resolv<strong>in</strong>g sequence ambiguities were designed us<strong>in</strong>g Primers for Mac,<br />
version 1.0. Sequence alignments were obta<strong>in</strong>ed us<strong>in</strong>g MUSCLE software (7).<br />
Open read<strong>in</strong>g frames (ORFs) were def<strong>in</strong>ed with <strong>the</strong> help of ARTEMIS software<br />
(30) and <strong>in</strong>vestigated <strong>in</strong> searches us<strong>in</strong>g <strong>the</strong> EMBL and GenBank (1), 3D-Jury (8),<br />
and SMART (15) databases. Genome maps were generated and compared us<strong>in</strong>g<br />
Mutagen software, version 4.0 (5).<br />
Bio<strong>in</strong>formatical match<strong>in</strong>g of crenarchaeal <strong>CRISPR</strong> spacers to rudiviral genomes.<br />
<strong>CRISPR</strong>s were predicted for each of <strong>the</strong> 14 publicly available crenarchaeal<br />
genomes <strong>in</strong> GenBank (NC_000854 [Aeropyrum pernix K1], NC_002754<br />
[Sulfolobus solfataricus P2], NC_003106 [Sulfolobus tokodaii stra<strong>in</strong> 7],<br />
NC_003364 [Pyrobaculum aerophilum stra<strong>in</strong> IM2], NC_007181 [Sulfolobus acidocaldarius<br />
DSM 639], NC_008698 [Thermofilum pendens Hrk5], NC_008701<br />
[Pyrobaculum islandicum DSM 4184], NC_008818 [Hyper<strong>the</strong>rmus butylicus DSM<br />
5456], NC_009033 [Staphylo<strong>the</strong>rmus mar<strong>in</strong>us F1], NC_009073 [Pyrobaculum<br />
calidifontis JCM 11548], NC_009376 [Pyrobaculum arsenaticum DSM 13514],<br />
NC_009440 [Metallosphaera sedula DSM 5348], NC_009676 [Cenarchaeum symbiosum],<br />
and NC_009776 [Ignicoccus hospitalis KIN4/I]). In addition, <strong>the</strong> six<br />
sequenced repeat clusters from Sulfolobus solfataricus P1 (16) were added to <strong>the</strong><br />
data set as well as <strong>CRISPR</strong>s from five <strong>in</strong>complete Sulfolobus islandicus genomes<br />
publicly available through <strong>the</strong> Jo<strong>in</strong>t Genome Institute (http://genome.jgi.doe.gov<br />
/mic_asmb.html) and unpublished genome sequences of Sulfolobus islandicus<br />
HVE10/4 and Acidianus brierleyi from <strong>the</strong> Copenhagen laboratory. The repeat<br />
cluster sequences were found us<strong>in</strong>g publicly available software (3, 7).<br />
All predictions were curated manually. The orientation of each repeat cluster<br />
was <strong>in</strong>ferred from <strong>the</strong> repeat sequence and by locat<strong>in</strong>g <strong>the</strong> low-complexity flank<strong>in</strong>g<br />
sequence that generally resides immediately upstream from <strong>the</strong> cluster and<br />
contains the transcriptional leader (16). All unique spacer sequences of the<br />
repeat clusters, corresponding to the processed spacer transcript sequence (16),<br />
were aligned to the complete nucleotide sequences on each strand of all four<br />
rudiviral genomes (SRV [accession no. FM164764], SIRV1 [AJ414696], SIRV2<br />
[AJ344259], and ARV1 [AJ875026]) by use of Paralign, an MMX-optimized<br />
implementation of the Smith-Waterman algorithm (31). Moreover, assuming<br />
that the spacer DNA can be incorporated into the oriented CRISPRs in either<br />
direction, we also translated the two strands of the spacer DNA into all the<br />
reading frames, yielding six amino acid sequences per spacer. Reading frames<br />
containing stop codons (ca. 50%) were omitted to make the subsequent search<br />
more specific. Each translation was aligned against the amino acid sequences of<br />
all the annotated ORFs in each of the four rudiviral genomes. Significant e-value<br />
cutoffs were determined for both the nucleotide and amino acid sequence<br />
searches using the genome sequence of Saccharomyces cerevisiae as a negative<br />
control (data not shown).<br />
VOL. 190, 2008 CRENARCHAEAL RUDIVIRUSES 6839<br />
FIG. 1. Electron micrographs of SRV virions negatively stained with 3% uranyl acetate. (A) A full virion particle, with a discontinuous central<br />
line along the virion. (B) Six virions attached to liposome-like structures. (C) Enlargement of a portion of panel B displaying the terminal fibers.<br />
(D to H) Electron tomography images of an SRV virion. (D) Horizontal x-y slice (0.7 nm) showing the accumulated stain in the central part of<br />
the virion (white arrow). (E) Vertical y-z slice (0.7 nm) through the 3D data set of the reconstructed part of an SRV particle. (F) Visualization<br />
of the 3D data set using Amira software. (G and H) Vertical x-z slice (0.7 nm) through the tomogram showing that the virion particles are<br />
embedded in negative stain and that accumulated stain visible in panel D is absent from the plug (black arrows). Bars, 200 nm (A and B); 50 nm<br />
(C); 20 nm (D, E, G, and H).<br />
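The six-frame translation step of the spacer-matching procedure in Materials and Methods can be illustrated with a short sketch.<br />
It is not the authors' pipeline (the alignments themselves were run with Paralign); the function names and the example spacer are<br />
hypothetical, and the standard genetic code and unambiguous A/C/G/T input are assumed.<br />

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse-complement an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; '*' marks a stop."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def open_frames(spacer):
    """Translate both strands in all three offsets (six frames) and keep
    only the frames that contain no stop codon."""
    spacer = spacer.upper()
    frames = {}
    for strand, dna in (("+", spacer), ("-", reverse_complement(spacer))):
        for offset in range(3):
            protein = translate(dna[offset:])
            if "*" not in protein:
                frames[(strand, offset)] = protein
    return frames


# Hypothetical 9-nt example: the "+" frame at offset 0 reads ATG TAA GGC,
# contains a stop codon, and is therefore dropped.
for frame, protein in open_frames("ATGTAAGGC").items():
    print(frame, protein)
```

Each retained translation would then be searched against the annotated viral ORFs; dropping stop-containing frames first reduces the number of queries.<br />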
RESULTS<br />
SRV isolation and structure. The virus-producing strain was<br />
colony purified from an enrichment culture established from a<br />
sample collected from an acidic hot spring in the Azores (see<br />
Materials and Methods). Its 16S rRNA sequence represented<br />
the genus Stygiolobus of the Sulfolobales crenarchaeal order<br />
and was closely related to that of Stygiolobus azoricus. However,<br />
it differs from S. azoricus, the type species of the genus, in<br />
its capacity to grow aerobically, and a description of the new<br />
species is in preparation. The virus particles produced constituted<br />
flexible rods 702 (±50) by 22 (±3) nm in size, with three<br />
short fibers at each terminus (Fig. 1A to C; Table 1). A Fourier<br />
analysis of the virion (not shown) revealed the presence of<br />
regular features with a periodicity of (4.2 nm)⁻¹, which probably<br />
reflect a helical subunit arrangement. This feature is also<br />
seen in the tomographic data set (Fig. 1D to H), which revealed<br />
more structural details. The helical arrangement in the<br />
virion core occurs in two different configurations. In the central<br />
region, a zigzag structure with dark contrast, probably arising<br />
from uranyl acetate staining, is surrounded by a protein shell<br />
(Fig. 1D and E). In contrast, in the terminal plug, which is<br />
about 50 nm in length, a helically arranged protein mass, with<br />
no obvious uranyl acetate inclusions, is seen (Fig. 1D to F).<br />
The three terminal fibers, anchored in the plug-like structure,<br />
appear to be built up of multiple subunits ordered in a linear<br />
array (Fig. 1D). The side view of the reconstructed virion<br />
particle (Fig. 1E), as well as cross-sections of the negatively<br />
stained virions obtained from the tomograms (Fig. 1G and H),<br />
shows that the virion particles are embedded in negative stain<br />
(Fig. 1G and H) and partially collapsed due to staining and air<br />
drying; the height of the particles was about half of the apparent<br />
diameter. Nevertheless, the accumulated central stain is<br />
clearly visible in the cross-section (Fig. 1G) of the central part<br />
of the virion (Fig. 1D), while this feature is absent from the<br />
plug (Fig. 1G and H). The rod-shaped morphology of SRV,<br />
with a regular helical core and tail fibers, is characteristic of<br />
rudiviruses.<br />
TABLE 1. Properties of the rudiviruses<br />
Rudivirus | Origin | Virion length (nm) | Genome size (bp) | Total no. of ORFs | GC (%) | ITR length (bp) | Reference<br />
SRV | Azores | 702 | 28,096 | 37 | 29.3 | 1,030 |<br />
ARV1 | Pozzuoli | 610 | 24,655 | 41 | 39.1 | 1,365 | 34<br />
SIRV1 | Iceland | 830 | 32,308 | 45 | 25.3 | 2,032 | 20<br />
SIRV2 | Iceland | 900 | 35,498 | 54 | 25.2 | 1,626 | 20<br />
FIG. 2. Electron micrograph of a portion of an SRV virion after<br />
treatment with 0.1% SDS for 30 min (see Materials and Methods).<br />
White arrows indicate DNA or DNA-protein fibers lacking the protein<br />
core. Bar, 100 nm.<br />
To <strong>in</strong>vestigate fur<strong>the</strong>r <strong>the</strong> f<strong>in</strong>e structure of <strong>the</strong> virion, virion<br />
particles were <strong>in</strong>cubated <strong>in</strong> buffer conta<strong>in</strong><strong>in</strong>g 0.1% SDS for 30<br />
m<strong>in</strong> at 22°C. Most of <strong>the</strong> virion rema<strong>in</strong>ed undisturbed, with<br />
<strong>the</strong> particles show<strong>in</strong>g <strong>the</strong> same diameter as native virions and <strong>the</strong><br />
densely sta<strong>in</strong>ed, helical core. However, <strong>in</strong> local regions <strong>the</strong><br />
prote<strong>in</strong> shell had dissociated (Fig. 2) and a f<strong>in</strong>e fiber with a<br />
diameter of 3 to 4 nm that constituted ei<strong>the</strong>r naked DNA or a<br />
DNA-prote<strong>in</strong> complex was visible.<br />
Self-assembly of <strong>the</strong> major coat prote<strong>in</strong>. The major rudiviral<br />
structural prote<strong>in</strong> is highly conserved <strong>in</strong> sequence and is glycosylated<br />
(20, 22, 32a, 34). In order to study its possible selfassembly<br />
properties, <strong>the</strong> ARV1 prote<strong>in</strong> (ORF134b [34]) was<br />
expressed heterologously <strong>in</strong> E. coli (see Materials and Methods)<br />
and a His-tagged prote<strong>in</strong> was purified to homogeneity on<br />
an Ni²⁺-NTA-agarose column. The protein was shown by<br />
transmission electron microscopy to self-assemble to produce<br />
filamentous structures of uniform widths and different lengths<br />
(Fig. 3). The optimal conditions for <strong>the</strong> assembly, 75°C and pH<br />
3, were close to those of <strong>the</strong> natural environment, and no<br />
additional energy source was required for this process.<br />
The transmission electron microscopy analysis revealed that <strong>the</strong><br />
filaments had structural parameters similar to those of <strong>the</strong><br />
native virions, with a diameter of 21 (±3) nm and a periodicity<br />
of (4.2 nm)⁻¹. Thus, the data suggest that the single major coat<br />
prote<strong>in</strong> alone can generate <strong>the</strong> body of <strong>the</strong> virion.<br />
M<strong>in</strong>or rudiviral virion prote<strong>in</strong>s. To date, <strong>the</strong> major coat<br />
prote<strong>in</strong> is <strong>the</strong> only rudiviral structural prote<strong>in</strong> to have been<br />
characterized. Given <strong>the</strong> closely similar structures of <strong>the</strong> different<br />
rudiviruses, we attempted to identify m<strong>in</strong>or structural<br />
prote<strong>in</strong>s for <strong>the</strong> SIRV2 virus, which can be produced <strong>in</strong> high<br />
yields. Prote<strong>in</strong> components of SIRV2 virions, separated on a<br />
polyacrylamide gel, yielded six dist<strong>in</strong>ct major bands (Fig. 4),<br />
and all except D2, which is <strong>the</strong> strongest band and corresponds<br />
to ORF134 (gp26), were analyzed by mass spectrometry. Their<br />
identities were as follows: band A conta<strong>in</strong>ed ORF1070 (gp38),<br />
band B conta<strong>in</strong>ed ORF488 (gp33), and band C conta<strong>in</strong>ed<br />
ORF564 (gp39), while bands D1 and D3 both conta<strong>in</strong>ed<br />
ORF134 (gp26), probably <strong>in</strong> a glycosylated or, <strong>in</strong> <strong>the</strong> case of<br />
D3, a proteolytically degraded form. Thus, three additional<br />
SIRV2 structural prote<strong>in</strong>s were identified, each highly conserved<br />
<strong>in</strong> sequence <strong>in</strong> all rudiviruses (Table 2).<br />
SRV genome content. A shotgun library of the viral genome<br />
was prepared, sequenced, and assembled (see Materials and<br />
Methods) to yield an approximately 10-fold coverage of a<br />
26-kb contig. Since 1 to 2 kb of terminal sequence is always<br />
absent from shotgun libraries of linear viral genomes, these<br />
additional sequences were generated by primer walking using<br />
viral DNA, or using PCR products obtained therefrom, until<br />
subsequent rounds of walking yielded no further sequence.<br />
The total sequence obtained was 28,096 bp, with a GC content<br />
of 29% and an ITR of about 1,030 bp (Table 1). An EcoRI<br />
restriction digest yielded fragments consistent with the genome<br />
size (data not shown).<br />
FIG. 3. Electron micrograph images of the self-assembled major coat protein of ORF134 from ARV1 after negative staining with 3% uranyl<br />
acetate. Bar, 100 nm.<br />
FIG. 4. SIRV2 virion proteins separated by SDS-polyacrylamide<br />
gel electrophoresis and stained with Sypro Ruby. Molecular masses of<br />
protein standards are indicated in kilodaltons on the left.<br />
Thirty-seven ORFs were predicted for which start codons<br />
were assigned on <strong>the</strong> basis of <strong>the</strong> upstream locations of TATAlike<br />
and transcription factor B-responsive element (BRE) promoter<br />
motifs and/or Sh<strong>in</strong>e-Dalgarno motifs. Details of <strong>the</strong><br />
putative genes and operon structures are presented <strong>in</strong> Table S1<br />
<strong>in</strong> <strong>the</strong> supplemental material, and a comparative genome map<br />
of SRV and rudiviruses SIRV1 and ARV1 is presented <strong>in</strong> Fig.<br />
5; <strong>the</strong> genome map of SIRV2, which is closely similar to that of<br />
SIRV1, is not <strong>in</strong>cluded (12, 20). SRV differs from <strong>the</strong> o<strong>the</strong>r<br />
rudiviruses <strong>in</strong> that fewer ORFs are organized <strong>in</strong> operons, and<br />
it has a lower level of gene order conservation (Fig. 5). Moreover,<br />
whereas for <strong>the</strong> o<strong>the</strong>r rudiviruses TATA-like motifs are<br />
often directly preceded by a conserved GTC triplet (12, 20, 34),<br />
<strong>in</strong> SRV <strong>the</strong> ensu<strong>in</strong>g triplet sequence was GTA for 10 of <strong>the</strong> 30<br />
putative TATA-like motifs (see Table S1 <strong>in</strong> <strong>the</strong> supplemental<br />
material).<br />
Homologs of 17 SRV ORFs are present <strong>in</strong> all rudiviruses,<br />
and a fur<strong>the</strong>r 10 SRV ORFs are conserved <strong>in</strong> some rudiviruses.<br />
Each virus type carries a few genes which are unique, and <strong>the</strong>se<br />
are generally clustered near <strong>the</strong> ends of <strong>the</strong> l<strong>in</strong>ear genomes and<br />
yield no matches to genes <strong>in</strong> public sequence databases. In<br />
SRV, <strong>the</strong>se are ORF145, -116a, -109, -59, -108, -97b, and -92<br />
(left to right <strong>in</strong> Fig. 5). Although for SIRV1 and SIRV2 some<br />
of <strong>the</strong>se nonconserved ORFs have been shown to be transcribed<br />
(12), fur<strong>the</strong>r work is necessary to establish whe<strong>the</strong>r<br />
<strong>the</strong>y are all prote<strong>in</strong>-cod<strong>in</strong>g genes. Some of <strong>the</strong> prote<strong>in</strong>s carry<br />
predicted structural motifs, and putative functions could be<br />
assigned to some of <strong>the</strong> conserved ORFs on <strong>the</strong> basis of public<br />
database searches; most of <strong>the</strong>se are encoded <strong>in</strong> o<strong>the</strong>r crenarchaeal<br />
genomes (Table 2).<br />
The host-encoded transcriptional regulator Sta1, a w<strong>in</strong>ged<br />
helix-turn-helix prote<strong>in</strong>, was shown to b<strong>in</strong>d to some SIRV1<br />
promoters, <strong>in</strong>clud<strong>in</strong>g those of ORF134 and ORF399, and to<br />
enhance <strong>the</strong>ir transcription (13). A similar regulation may occur<br />
also for SRV, s<strong>in</strong>ce <strong>the</strong> promoter regions of <strong>the</strong> homologs<br />
of ORF134 and ORF399 conta<strong>in</strong> putative Sta1 b<strong>in</strong>d<strong>in</strong>g sites. In<br />
contrast, <strong>in</strong> ARV1 only <strong>the</strong> ORF134 homolog is present <strong>in</strong> an<br />
operon for which <strong>the</strong> first ORF is a putative transcriptional<br />
regulator, and its promoter does not carry Sta1 b<strong>in</strong>d<strong>in</strong>g motifs.<br />
Genomic features. Sequence heterogeneities and other exceptional<br />
properties were detected in the SRV genome and in<br />
other rudiviral genomes; these are described below.<br />
(i) ITRs. For SRV, <strong>the</strong> 1,030-bp ITR is perfect, except for a<br />
36-bp <strong>in</strong>sert at positions 799 to 834 at <strong>the</strong> left end and <strong>in</strong>verted<br />
tetramer sequences (AAAA [positions 425 to 428] and TTTT<br />
[positions 27672 to 27669]). It shows little sequence similarity<br />
to ITRs of <strong>the</strong> o<strong>the</strong>r rudiviruses, except for <strong>the</strong> 21-bp sequence<br />
(AATTTAGGAATTTAGGAATTT) located at <strong>the</strong> term<strong>in</strong>us<br />
that is predicted to be a Holliday junction resolvase b<strong>in</strong>d<strong>in</strong>g<br />
site occurr<strong>in</strong>g <strong>in</strong> all sequenced rudiviruses (34). The ITRs of<br />
SRV and SIRV1 and -2 carry four to five degenerate copies of<br />
this direct sequence repeat, while that of ARV1 carries multiple<br />
degenerate copies of o<strong>the</strong>r diverse repeats of similar<br />
sizes.<br />
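The coordinates quoted for the inverted tetramers can be cross-checked with simple arithmetic. The sketch below assumes 1-based<br />
coordinates and the 28,096-bp sequence length given in the text; the helper names are illustrative and not part of the original analysis.<br />

```python
def mirror_position(pos, genome_length):
    """Return the 1-based coordinate that lies as far from the right end
    as `pos` lies from the left end of a linear genome."""
    return genome_length + 1 - pos


def reverse_complement(seq):
    """Reverse-complement an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


GENOME_LENGTH = 28096        # total SRV sequence length reported in the text
left_tetramer = (425, 428)   # AAAA in the left ITR

right_tetramer = tuple(mirror_position(p, GENOME_LENGTH) for p in left_tetramer)
print(right_tetramer)              # (27672, 27669)
print(reverse_complement("AAAA"))  # TTTT
```

The mirrored coordinates agree with the TTTT positions quoted for the right end, as expected for a sequence of 28,096 bp.<br />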
(ii) Genome heterogeneity <strong>in</strong> SRV and 12-bp <strong>in</strong>dels. Sequence<br />
heterogeneities were detected <strong>in</strong> <strong>the</strong> SRV genome,<br />
with<strong>in</strong> <strong>the</strong> 10-fold sequence coverage, and mutations were localized<br />
to groups of subpopulations, <strong>in</strong>clud<strong>in</strong>g one 180-bp deletion<br />
between positions 11896 and 12077 <strong>in</strong> two out of six<br />
clones. Moreover, a 48-bp <strong>in</strong>sertion was observed <strong>in</strong> one variant<br />
(out of 18 clones) precisely at <strong>the</strong> C term<strong>in</strong>us of ORF533<br />
(position 20285) that generated a third copy of a 16-am<strong>in</strong>o-acid<br />
direct repeat. Some changes correspond<strong>in</strong>g to 12-bp <strong>in</strong>dels<br />
were also apparent <strong>in</strong> overlapp<strong>in</strong>g clones, and <strong>the</strong>y are <strong>in</strong>dicated<br />
<strong>in</strong> Table 3 toge<strong>the</strong>r with those observed earlier for<br />
SIRV1 (4, 20, 21). Moreover, sequence comparison of highly<br />
conserved ORFs present <strong>in</strong> <strong>the</strong> four rudiviral genomes revealed<br />
several additional 12-bp <strong>in</strong>dels. The locations of all <strong>the</strong><br />
identified <strong>in</strong>dels which occur <strong>in</strong> conserved rudiviral genes or<br />
sites correspond<strong>in</strong>g to SRV ORF75, -104, -138, -163, -168,<br />
-197, -199, -286, -294, -419, -440, -464, -533, and -1059 (Fig. 5)<br />
are <strong>in</strong>dicated <strong>in</strong> <strong>the</strong> SIRV1 genome map <strong>in</strong> Fig. 6.<br />
Rudiviral matches to <strong>CRISPR</strong>s. The availability of four separate<br />
rudiviral genome sequences provided a basis for analyz<strong>in</strong>g<br />
<strong>the</strong> frequency and distribution of <strong>the</strong> matches of <strong>CRISPR</strong><br />
spacer sequences to the viral genomes. Therefore, we analyzed<br />
the repeat clusters of each of the available crenarchaeal genomes<br />
in the public EMBL/GenBank and JGI sequence databases<br />
and in our own unpublished genomes (see Materials and<br />
Methods). Fourteen complete genomes and 8 partial genomes<br />
were analyzed. In total, 82 repeat clusters from complete genomes<br />
and 44 clusters, some incomplete, from partial genomes<br />
yielded 4,283 spacer sequences. Subsequently, 278 sequences<br />
that are shared between S. solfataricus strains P1 and P2 (16)<br />
were omitted from the data set, yielding a total of 4,005 spacer<br />
sequences.<br />
Downloaded from<br />
jb.asm.org<br />
by on October 1, 2008
6842 VESTERGAARD ET AL. J. BACTERIOL.<br />
TABLE 2. Rudiviral proteins with predicted functions<br />
SRV ORF category | Rudiviral homolog(s) | Predicted function or description | Analysis tool | E-value or score | Other crenarchaeal virus(es)<br />
Structural proteins<br />
ORF134 | All | Structural protein | | | <br />
ORF464 | All | Structural protein | | | <br />
ORF581 | All | Structural protein | | | <br />
ORF1059 | All | Structural protein | | | <br />
Transcriptional regulators<br />
ORF58 | All | RHH-1 | SMART | 2.0e-08 | Many<br />
ORF95 | None | “Winged helix” repressor DNA binding domain | 3D-Jury | 64.00 | None<br />
Translational regulator<br />
ORF294 | SIRV1 and -2 | tRNA-guanine transglycosylase | 3D-Jury | 167.57 | STSV1<br />
DNA replication<br />
ORF440 | All | RuvB Holliday junction helicase (Lon ATPase) | 3D-Jury | 53.71 | AFV1, AFV2<br />
ORF116c | All | Holliday junction resolvase (archaeal) | SMART | 2.4e-45 | <br />
ORF199 | All | Nuclease | 3D-Jury | 63.86 | AFV1, SIFV<br />
DNA metabolism<br />
ORF168 | SIRV1 and -2 | dUTPase | SMART | 1.5e-12 | STSV1<br />
ORF257 | ARV1 | Thymidylate synthase (Thy1) | SMART | 7.9e-46 | STSV1<br />
ORF159 | All | S-adenosylmethionine-dependent methyltransferase | 3D-Jury | 73.67 | SIFV<br />
Glycosylation<br />
ORF335 | All | Glycosyl transferase group 1 | SMART | 6.7e-09 | <br />
ORF355 | All | Glycosyl transferase | SMART | 5.1e-04 | <br />
Other<br />
ORF419 | SIRV1 and -2 | 11 transmembrane regions | TMHMM | | <br />
FIG. 5. Genome maps of SRV, SIRV1, and ARV1 show<strong>in</strong>g <strong>the</strong> predicted ORFs and <strong>the</strong> ITRs (bold l<strong>in</strong>es). SRV ORFs are identified by <strong>the</strong>ir<br />
am<strong>in</strong>o acid lengths. Homologous genes shared between <strong>the</strong> rudiviruses are color-coded. Genes above <strong>the</strong> horizontal l<strong>in</strong>e are transcribed from left<br />
to right, and those below <strong>the</strong> l<strong>in</strong>e are transcribed <strong>in</strong> <strong>the</strong> opposite direction. Predicted functions or structural characteristics of <strong>the</strong> gene products<br />
are <strong>in</strong>dicated as follows: sp, structural prote<strong>in</strong>; rhh, ribbon-helix-helix prote<strong>in</strong>; wh, w<strong>in</strong>ged helix prote<strong>in</strong>; tm, transmembrane; tgt, tRNA guan<strong>in</strong>e<br />
transglycosylase; hjh, Holliday junction helicase; hjr, Holliday junction resolvase; n, nuclease; du, dUTPase; ts, thymidylate synthase; sm,<br />
S-adenosylmethion<strong>in</strong>e-dependent methyltransferase; gt, glycosyl transferase.<br />
VOL. 190, 2008 CRENARCHAEAL RUDIVIRUSES 6843<br />
TABLE 3. Occurrence of the 12-bp indels in overlapping rudiviral clone libraries<br />
ORF or ITR | No. of 12-bp clones | No. of 12-bp clones | Sequence | Genome position (reference)<br />
SRV ORF58 | 5 | 1 | AATTAAATTATG | 26079–26068<br />
SRV ORF95 | 8 | 8 | TTTTGAATTATG | 7112–7101<br />
SIRV1 ORF335 | 7 | 3 | AACATTCATTAA | Variant (21)<br />
SIRV1 ORF562 | 1 | 4 | ATACAAATTTCA | Variant (21)<br />
SIRV1-ITR | 10 | 29 | TTTAGCAGTTCA | (20)<br />
In the first analysis, each of the 4,005 spacer sequences was<br />
compared to the four rudiviral genomes at the nucleotide level.<br />
In total, 158 spacers yielded 268 rudiviral matches. The latter<br />
number exceeds the former because (i) some spacers match to<br />
more than one locus within repeat sequences of a given virus<br />
and (ii) some spacers match to more than one virus. Second,<br />
the analysis was performed at the protein level (see Materials<br />
and Methods). This analysis revealed 148 additional matching<br />
spacers and a further 427 rudiviral genome matches exclusively<br />
at the protein level. (An additional 105 matching spacer sequences<br />
from the latter analysis that overlapped, partially or<br />
completely, with 158 of those detected within rudiviral ORFs<br />
at the nucleotide level were not counted.) Only 6 of the 14<br />
completed crenarchaeal genomes carried spacers yielding<br />
matches to rudiviral genomes, and they are listed, together<br />
with the results for the partial genomes, in Table 4. These<br />
results reinforced the choice of criteria employed for determining<br />
the significance of sequence matches (see Materials<br />
and Methods).<br />
The locations of the spacer sequence matches are superimposed<br />
on the genome map of SIRV1 in Fig. 6. The matches are<br />
not evenly distributed along the genome; some genes have no<br />
matches, while others carry up to 18. Although there is no strict<br />
correlation between the level of gene sequence conservation<br />
and the number of matching spacers, the five most conserved<br />
genes, ORF440, ORF1059, ORF134, ORF355, and ORF581,<br />
exhibit the highest number of matches (18, 15, 14, 14, and 13,<br />
respectively) (Fig. 3 and 6).<br />
DISCUSSION<br />
The morphological and genomic data for SRV and the other<br />
characterized rudiviruses are summarized in Table 1. The conservation<br />
of their morphologies and genomic properties contrasts<br />
with that of other crenarchaeal viruses and, in particular,<br />
with that of the filamentous lipothrixviruses, which exhibit a<br />
variety of surface, envelope, and tail structures and much more<br />
heterogeneous genomes (24).<br />
The virion length of SRV, 702 (±50) nm, shows the same<br />
direct proportionality to genome size (28 kb) as those for the<br />
other rudivirus virions, which range in length from 610 (±50)<br />
nm (ARV1) to 900 (±50) nm (SIRV2) (Table 1). A superhelical<br />
core, with a pitch of 4.3 nm and a width of 20 nm,<br />
terminates in 45-nm-long nonhelical “plugs,” and it correlates<br />
with the internal structure observed earlier in electron micrographs<br />
of SIRV1 (22). In order to determine whether a single<br />
superhelical DNA can span the SRV virion length, we applied<br />
the following formulae to estimate the sizes and length of the<br />
superhelical DNA:<br />
$$L_{\mathrm{turn}} = \sqrt{p^{2} + c^{2}}$$<br />
where $L_{\mathrm{turn}}$ represents the arc length of a turn, $p$ represents the<br />
pitch, and $c$ represents the cylinder circumference, and<br />
$$L_{\mathrm{total}} = t \, L_{\mathrm{turn}}$$<br />
where $t$ represents the number of turns and $L_{\mathrm{total}}$ represents<br />
the arc length of the entire helix.<br />
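The two formulae above can be evaluated directly with the virion dimensions quoted in this section (702 nm length, 45-nm plugs, 4.3 nm pitch, 20 nm width). The sketch below is illustrative only and is not the authors' calculation: it assumes a rise of 0.34 nm per base pair for B-form DNA and takes the circumference as pi times the diameter. The function and constant names are hypothetical.

```python
import math

BP_RISE_NM = 0.34  # assumed axial rise per base pair of B-form DNA


def turn_length(pitch_nm, diameter_nm):
    """Arc length of one superhelical turn: sqrt(p^2 + c^2), with c = pi * d."""
    circumference = math.pi * diameter_nm
    return math.hypot(pitch_nm, circumference)


def genome_size_bp(helix_span_nm, pitch_nm, diameter_nm):
    """Base pairs that fit in a superhelix spanning helix_span_nm along its axis."""
    turns = helix_span_nm / pitch_nm
    total_length = turns * turn_length(pitch_nm, diameter_nm)
    return total_length / BP_RISE_NM


def helix_diameter_nm(genome_bp, helix_span_nm, pitch_nm):
    """Reciprocal calculation: helix diameter needed to hold genome_bp."""
    turns = helix_span_nm / pitch_nm
    l_turn = genome_bp * BP_RISE_NM / turns
    circumference = math.sqrt(l_turn**2 - pitch_nm**2)
    return circumference / math.pi


VIRION_NM, PLUG_NM, PITCH_NM, WIDTH_NM = 702.0, 45.0, 4.3, 20.0
core_nm = VIRION_NM - 2 * PLUG_NM  # helix confined to the region between the plugs

kbp_without_plugs = genome_size_bp(core_nm, PITCH_NM, WIDTH_NM) / 1000
kbp_with_plugs = genome_size_bp(VIRION_NM, PITCH_NM, WIDTH_NM) / 1000
d_without_plugs = helix_diameter_nm(28_000, core_nm, PITCH_NM)
d_with_plugs = helix_diameter_nm(28_000, VIRION_NM, PITCH_NM)

print(f"{kbp_without_plugs:.1f} kbp, {kbp_with_plugs:.1f} kbp")
print(f"{d_without_plugs:.1f} nm, {d_with_plugs:.1f} nm")
```

Under these assumptions the sketch gives roughly 26 and 30 kbp for a 20-nm-wide helix, and diameters of about 21.2 and 18.5 nm for a 28-kbp genome.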
Calculations us<strong>in</strong>g structural parameters for B-form DNA<br />
yielded a genome size of 26 kbp without, and 30 kbp with,<br />
terminal “plugs”. The estimated width (20 nm) is an upper-limit<br />
estimate. A reciprocal calculation, with a 28-kbp genome,<br />
yields a diameter of 21.2 nm without, and 18.5 nm with, <strong>the</strong><br />
“plugs.” Given that <strong>the</strong> major rudiviral coat prote<strong>in</strong> is capable<br />
of self-assembly <strong>in</strong>to filamentous structures similar <strong>in</strong> width to<br />
<strong>the</strong> native virion (Fig. 4), it is likely that <strong>the</strong> rod-shaped body<br />
consists of a s<strong>in</strong>gle superhelical DNA embedded with<strong>in</strong> this<br />
filamentous prote<strong>in</strong> structure. Thus, <strong>the</strong> three newly identified<br />
m<strong>in</strong>or structural prote<strong>in</strong>s probably contribute to conserved<br />
term<strong>in</strong>al features of <strong>the</strong> virion; consistent with this, <strong>the</strong> largest<br />
FIG. 6. <strong>CRISPR</strong> spacer sequence matches for SIRV1 are superimposed on <strong>the</strong> SIRV1 genome map. Prote<strong>in</strong>-cod<strong>in</strong>g regions translated from<br />
left to right are shown above <strong>the</strong> l<strong>in</strong>e, and those translated from right to left are shown below <strong>the</strong> l<strong>in</strong>e. Highly conserved cod<strong>in</strong>g genes are presented<br />
<strong>in</strong> dark blue, while less-conserved or nonconserved genes are <strong>in</strong> light blue. The <strong>in</strong>verted term<strong>in</strong>al repeat is shaded <strong>in</strong> violet. Matches to spacers<br />
are shown as vertical l<strong>in</strong>es and are color-coded as <strong>in</strong>dicated. Matches to <strong>the</strong> upper DNA strand are placed above <strong>the</strong> genome, and those to <strong>the</strong><br />
lower strand are located below <strong>the</strong> genome. The red vertical l<strong>in</strong>es correspond to <strong>the</strong> nucleotide sequence matches, and <strong>the</strong> green vertical l<strong>in</strong>es<br />
correspond to match<strong>in</strong>g am<strong>in</strong>o acid sequences, after translation of <strong>the</strong> spacer sequences from both DNA strands. In total, <strong>the</strong>re were 106 matches<br />
to SIRV1 at <strong>the</strong> nucleotide level, some of <strong>the</strong>m occurr<strong>in</strong>g more than once, and an additional 127 matches to SIRV1 ORFs at <strong>the</strong> am<strong>in</strong>o acid level.<br />
The black arrowheads <strong>in</strong>dicate <strong>the</strong> positions of <strong>the</strong> 12-bp <strong>in</strong>dels that occur <strong>in</strong> one or more conserved rudiviral genes.<br />
TABLE 4. Number of CRISPR spacer sequences from complete and partial crenarchaeal genomes which match rudiviral genomes (a)<br />
Strain | No. of matching sequences at the nucleotide level (b) | No. of matching sequences at the amino acid level (b) | Accession no., reference, or source<br />
Complete genomes<br />
S. solfataricus P2 | 22 (14) | 31 (18) | NC_002754<br />
S. tokodaii 7 | 9 | 14 | NC_003106<br />
M. sedula | 5 | 15 | NC_009440<br />
S. acidocaldarius | 5 | 9 | NC_007181<br />
S. marinus F1 | 2 | 1 | NC_009033<br />
H. butylicus | 0 | 1 | NC_008818<br />
Incomplete genomes<br />
S. solfataricus P1 | 20 (14) | 30 (18) | 16<br />
S. islandicus (5 strains) | 39/12/4/2/1 | 26/7/2/2/0 | See text<br />
S. islandicus HVE10/4 | 36 | 11 | Unpublished<br />
A. brierleyi | 15 | 14 | Unpublished<br />
(a) All the acidothermophilic organisms from the family Sulfolobaceae have<br />
spacers matching those of the rudiviral genomes. However, the neutrophilic<br />
hyperthermophiles S. marinus and H. butylicus produced very few matches.<br />
Matches at the amino acid sequence level that overlapped with those at the<br />
nucleotide sequence level were excluded from the data.<br />
(b) Numbers in parentheses in columns 2 and 3 indicate the number of matches<br />
that arose from spacers shared by S. solfataricus strains P1 and P2 (16).<br />
structural prote<strong>in</strong> (correspond<strong>in</strong>g to SRV ORF1059) was localized<br />
with<strong>in</strong> <strong>the</strong> virion tail fibers of SIRV2 by study<strong>in</strong>g functional<br />
groups by <strong>the</strong> use of bioconjugation (Ste<strong>in</strong>metz et al.,<br />
submitted).<br />
We still have limited <strong>in</strong>sight <strong>in</strong>to functional roles of rudivirus-encoded<br />
prote<strong>in</strong>s (Table 2). The glycosyl transferases have<br />
been implicated <strong>in</strong> <strong>the</strong> glycosylation of <strong>the</strong> structural prote<strong>in</strong>s<br />
(34). Moreover, a few prote<strong>in</strong>s have been l<strong>in</strong>ked to viral replication.<br />
Two of <strong>the</strong>se, ORF440 and ORF199, lie with<strong>in</strong> an<br />
operon and are conserved <strong>in</strong> phylogenetically diverse lipothrixviruses<br />
(33). The former yielded significant matches to RuvB,<br />
<strong>the</strong> helicase facilitat<strong>in</strong>g branch migration dur<strong>in</strong>g Holliday junction<br />
resolution, while ORF199 yielded <strong>the</strong> best matches to<br />
nucleases, <strong>in</strong>clud<strong>in</strong>g Holliday junction resolvases (Table 2).<br />
Thus, <strong>the</strong>y are likely to facilitate rudiviral replication, which, <strong>in</strong><br />
SIRV1, <strong>in</strong>volves site-specific nick<strong>in</strong>g with<strong>in</strong> <strong>the</strong> ITR, formation<br />
of head-to-head and tail-to-tail <strong>in</strong>termediates, and conversion<br />
of genomic concatemers to monomers by a Holliday junction<br />
resolvase (ORF116c) (20). In addition, SRV encodes both a<br />
dUTPase and a thymidylate synthase, whereas the other rudiviruses<br />
encode only one of these enzymes. Both are involved<br />
in thymidylate synthesis and are considered<br />
helpful in maintaining a low dUTP/dTTP ratio and thus<br />
in minimizing the detrimental effects of misincorporating uracil<br />
into DNA. Two putative transcriptional regulators have been<br />
identified, toge<strong>the</strong>r with <strong>the</strong> putative tRNA transglycosylase<br />
encoded by SRV, which has homologs <strong>in</strong> SIRV1 and -2 and <strong>in</strong><br />
o<strong>the</strong>r crenarchaeal viruses (Table 2) and is distantly related to<br />
a tRNA-guan<strong>in</strong>e transglycosylase implicated <strong>in</strong> archeos<strong>in</strong>e formation.<br />
The two approaches employed to analyze <strong>CRISPR</strong> spacers<br />
match<strong>in</strong>g <strong>the</strong> four rudiviral genomes demonstrated that about<br />
10% of <strong>the</strong> 3,042 unique acido<strong>the</strong>rmophile spacers yielded<br />
positive matches. Employ<strong>in</strong>g alignments at <strong>the</strong> am<strong>in</strong>o acid<br />
level considerably <strong>in</strong>creased <strong>the</strong> number of positive matches<br />
detected, because nucleotide sequences diverge more rapidly.<br />
Thus, <strong>the</strong> genomes of SRV and SIRV1 share almost no (4%)<br />
similarity at <strong>the</strong> DNA level, whereas most homologous prote<strong>in</strong>s<br />
show, on average, 47% sequence identity or similarity.<br />
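The difference between the two search levels can be illustrated with a toy example. This is a minimal sketch, not the procedure described in Materials and Methods: it uses exact substring matching where the real analysis applied significance criteria to alignments, and every sequence, function name and the `min_len` threshold is hypothetical. It shows how a spacer that differs from the viral gene only at synonymous codon positions is missed at the nucleotide level but recovered after six-frame translation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons from the start of seq; trailing bases are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Yield the translations of all three frames on both strands."""
    for strand in (seq, revcomp(seq)):
        for offset in range(3):
            yield translate(strand[offset:])


def nucleotide_match(spacer, genome):
    """Exact match of the spacer to either strand of the genome."""
    return spacer in genome or revcomp(spacer) in genome


def protein_match(spacer, genome, min_len=8):
    """Exact match of any stop-free spacer translation to any genome frame."""
    genome_frames = list(six_frames(genome))
    for peptide in six_frames(spacer):
        if len(peptide) >= min_len and "*" not in peptide:
            if any(peptide in frame for frame in genome_frames):
                return True
    return False


# Toy data: both sequences encode MKTAYIAKQRQI but use different codons.
viral_gene = "ATGAAAACTGCTTATATTGCTAAACAACGTCAAATT"
spacer = "ATGAAGACCGCATACATCGCGAAGCAGCGCCAGATA"
genome = "CCGGA" + viral_gene + "TAAGC"

print(nucleotide_match(spacer, genome))  # nucleotide-level search
print(protein_match(spacer, genome))     # protein-level search
```

Exact matching is used here only to keep the example short; a realistic search would tolerate mismatches at both levels.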
When study<strong>in</strong>g <strong>the</strong> distribution of <strong>the</strong> spacer matches <strong>in</strong> <strong>the</strong><br />
rudiviral genomes, some trends are evident. First, <strong>the</strong>re is no<br />
significant bias with regard to <strong>the</strong> DNA strand carry<strong>in</strong>g <strong>the</strong><br />
match<strong>in</strong>g sequence. In SIRV1, for example, 122 matches occur<br />
on one strand and 111 on <strong>the</strong> o<strong>the</strong>r (Fig. 6). This is consistent<br />
with our assumption that <strong>the</strong> <strong>in</strong>corporation of viral or plasmid<br />
DNA <strong>in</strong>to <strong>the</strong> orientated <strong>CRISPR</strong>s is nondirectional. Second,<br />
<strong>in</strong> accordance with earlier analyses (16), for matches to cod<strong>in</strong>g<br />
regions, <strong>the</strong>re is no significant bias to matches occurr<strong>in</strong>g <strong>in</strong> a<br />
sense or antisense direction. Thus, for SIRV1, 39% of <strong>the</strong><br />
matches are <strong>in</strong> <strong>the</strong> sense direction whereas 54% are antisense—<strong>the</strong><br />
rema<strong>in</strong><strong>in</strong>g 7% constitute nucleotide matches to<br />
non-protein-coding regions (Fig. 6). Third, when the latter<br />
nucleotide sequence-based matches are considered, the proportion<br />
of matches which occur in intergenic regions, as opposed<br />
to those occurring in protein-coding regions, is not significantly<br />
different from the overall noncoding fraction of the<br />
viral genome. For SIRV1, 19% of the nucleotide matches fall within<br />
intergenic regions, whereas 20% of the genome is non-protein-coding.<br />
F<strong>in</strong>ally, some genes have many matches whereas o<strong>the</strong>rs<br />
have none at all. Five genes have 13 or more matches <strong>in</strong><br />
SIRV1; <strong>the</strong>se genes correspond to SRV ORF440, ORF1059,<br />
ORF134, ORF355, and ORF581. Apart from be<strong>in</strong>g conserved<br />
<strong>in</strong> each rudivirus, <strong>the</strong>ir gene products have important structural<br />
or functional roles (Table 2).<br />
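The absence of strand bias can be checked against the counts quoted above for SIRV1 (122 matches on one strand, 111 on the other). The text does not say which statistical test, if any, was applied; the sketch below assumes an exact two-sided binomial test against an even split, and the function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k under a binomial(n, p) null model.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= threshold))


upper_strand, lower_strand = 122, 111  # SIRV1 match counts from the text
p_value = binomial_two_sided_p(upper_strand, upper_strand + lower_strand)
print(f"two-sided p = {p_value:.2f}")
```

With these counts the p-value is far above the conventional 0.05 threshold, in line with the statement that there is no significant strand bias.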
The results pose an important question as to how <strong>the</strong> host<br />
dist<strong>in</strong>guishes between more important and less important<br />
genes when add<strong>in</strong>g <strong>the</strong> spacers to its <strong>CRISPR</strong>s. Possibly, although<br />
<strong>the</strong> de novo addition of spacers may well be an unbiased<br />
process with respect to both viral genome position and<br />
direction, <strong>the</strong> selective advantage provided by some spacers<br />
would result <strong>in</strong> a population be<strong>in</strong>g enriched <strong>in</strong> hosts with<br />
<strong>CRISPR</strong>s carry<strong>in</strong>g spacers target<strong>in</strong>g crucial viral genes.<br />
The 12-bp viral <strong>in</strong>dels were orig<strong>in</strong>ally shown to occur commonly<br />
<strong>in</strong> SIRV1 variants that arose as a result of passage of an<br />
SIRV1 isolate through different closely related S. islandicus<br />
stra<strong>in</strong>s from Iceland, and it was <strong>in</strong>ferred that this unusual<br />
activity reflected adaptation of <strong>the</strong> rudivirus to <strong>the</strong> different<br />
hosts (21). The positions of <strong>the</strong> 12-bp <strong>in</strong>dels that have been<br />
identified <strong>in</strong> conserved rudiviral prote<strong>in</strong> genes are shown toge<strong>the</strong>r<br />
with <strong>the</strong> <strong>CRISPR</strong> spacer matches on <strong>the</strong> SIRV1 genome<br />
map <strong>in</strong> Fig. 6. Many of <strong>the</strong> sites are very close or overlap.<br />
This raises <strong>the</strong> possibility that leng<strong>the</strong>n<strong>in</strong>g or shorten<strong>in</strong>g of<br />
conserved prote<strong>in</strong> genes by 12 bp could be a mechanism to<br />
overcome <strong>the</strong> host <strong>CRISPR</strong> defense <strong>system</strong>.<br />
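The proposed escape mechanism can be pictured with a toy example. This sketch assumes, for illustration only, that a spacer must match a contiguous stretch of the viral gene. The gene, the spacer coordinates and the inserted sequence are all hypothetical and are not taken from Table 3. A 12-bp insertion adds four codons, so the reading frame is preserved while the contiguous match is interrupted.

```python
def apply_insertion(gene, position, insert):
    """Return the gene with `insert` placed before index `position`."""
    return gene[:position] + insert + gene[position:]


# Hypothetical 51-nt ORF (17 codons, ending in a stop codon).
gene = "ATGAAAACTGCTTATATTGCTAAACAACGTCAAATTTCATTTGTTAAATAA"
spacer = gene[9:45]      # hypothetical 36-nt spacer matching the gene
indel = "GCTGGTTCTGCT"   # hypothetical 12-bp insertion (four codons)

variant = apply_insertion(gene, 27, indel)  # insertion inside the spacer target

frame_preserved = (len(variant) - len(gene)) % 3 == 0
print(frame_preserved)    # the insertion adds whole codons only
print(spacer in gene)     # spacer matches the original gene
print(spacer in variant)  # contiguous match is lost in the variant
```

Whether such an interruption is enough to evade interference in vivo is not established here; the example only shows why an indel of this length is compatible with an intact open reading frame.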
We conclude that <strong>the</strong> rudiviruses are excellent models for<br />
study<strong>in</strong>g details of viral life cycles and virus-host <strong>in</strong>teractions <strong>in</strong><br />
crenarchaea. These viruses appear to be much more conserved<br />
<strong>in</strong> <strong>the</strong>ir morphologies and genomes than, for example, <strong>the</strong><br />
equally ubiquitous lipothrixviruses. Moreover, <strong>the</strong>y are relatively<br />
stably ma<strong>in</strong>ta<strong>in</strong>ed <strong>in</strong> <strong>the</strong>ir hosts and can be isolated <strong>in</strong><br />
reasonable yields for experimental studies.<br />
ACKNOWLEDGMENTS<br />
We are grateful to Georg Fuchs for providing the environmental<br />
sample from São Miguel Island, the Azores.<br />
The research <strong>in</strong> Copenhagen was supported by grants from <strong>the</strong><br />
Danish Natural Science Research Council, <strong>the</strong> Danish National Research<br />
Foundation, and Copenhagen University. The research <strong>in</strong> Paris<br />
was partly supported by grant NT05-2_41674 from Agence Nationale<br />
de Recherche (Programme Blanc).<br />
REFERENCES<br />
1. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller,<br />
and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation<br />
of prote<strong>in</strong> database search programs. Nucleic Acids Res. 25:3389–3402.<br />
2. Bettstetter, M., X. Peng, R. A. Garrett, and D. Prangishvili. 2003. AFV1, a<br />
novel virus <strong>in</strong>fect<strong>in</strong>g hyper<strong>the</strong>rmophilic archaea of <strong>the</strong> genus Acidianus.<br />
Virology 315:68–79.<br />
3. Bland, C., T. L. Ramsey, F. Sabree, M. Lowe, K. Brown, N. C. Kyrpides, and<br />
P. Hugenholtz. 2007. <strong>CRISPR</strong> recognition tool (CRT): a tool for automatic<br />
detection of clustered regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic repeats. BMC<br />
Bio<strong>in</strong>formatics 8:209.<br />
4. Blum, H., W. Zillig, S. Mallok, H. Domdey, and D. Prangishvili. 2001. The<br />
genome of <strong>the</strong> archaeal virus SIRV1 has features <strong>in</strong> common with genomes<br />
of eukaryal viruses. Virology 281:6–9.<br />
5. Brügger, K., P. Redder, and M. Skovgaard. 2003. MUTAGEN: multi-user<br />
tool for annotat<strong>in</strong>g genomes. Bio<strong>in</strong>formatics 19:2480–2481.<br />
6. Eder, W., W. Ludwig, and R. Huber. 1999. Novel 16S rRNA gene sequences<br />
retrieved from highly sal<strong>in</strong>e br<strong>in</strong>e sediments of Kebrit Deep, Red Sea. Arch.<br />
Microbiol. 172:213–218.<br />
7. Edgar, R. C. 2004. MUSCLE: a multiple sequence alignment method with<br />
reduced time and space complexity. BMC Bio<strong>in</strong>formatics 5:113.<br />
8. G<strong>in</strong>alski, K., A. Elofsson, D. Fischer, and L. Rychlewski. 2003. 3D-Jury: a<br />
simple approach to improve prote<strong>in</strong> structure predictions. Bio<strong>in</strong>formatics<br />
19:1015–1018.<br />
9. Här<strong>in</strong>g, M., X. Peng, K. Brügger, R. Rachel, K. O. Stetter, R. A. Garrett, and<br />
D. Prangishvili. 2004. Morphology and genome organization of <strong>the</strong> virus<br />
PSV of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal genera Pyrobaculum and Thermoproteus:<br />
a novel virus family, <strong>the</strong> Globuloviridae. Virology 323:233–242.<br />
10. Här<strong>in</strong>g, M., G. Vestergaard, K. Brügger, R. Rachel, R. A. Garrett, and D.<br />
Prangishvili. 2005. Structure and genome organization of AFV2, a novel<br />
archaeal lipothrixvirus with unusual term<strong>in</strong>al and core structures. J. Bacteriol.<br />
187:3855–3858.<br />
11. Här<strong>in</strong>g, M., G. Vestergaard, R. Rachel, L. Chen, R. A. Garrett, and D.<br />
Prangishvili. 2005. Virology: <strong>in</strong>dependent virus development outside a host.<br />
Nature 436:1101–1102.<br />
12. Kessler, A., A. B. Br<strong>in</strong>kman, J. van der Oost, and D. Prangishvili. 2004.<br />
Transcription of <strong>the</strong> rod-shaped viruses SIRV1 and SIRV2 of <strong>the</strong> hyper<strong>the</strong>rmophilic<br />
archaeon Sulfolobus. J. Bacteriol. 186:7745–7753.<br />
13. Kessler, A., G. Sezonov, J. I. Guijarro, N. Desnoues, T. Rose, M. Delepierre,<br />
S. D. Bell, and D. Prangishvili. 2006. A novel archaeal regulatory prote<strong>in</strong>,<br />
Sta1, activates transcription from viral promoters. Nucleic Acids Res. 34:<br />
4837–4845.<br />
14. Laemmli, U. K. 1970. Cleavage of structural prote<strong>in</strong>s dur<strong>in</strong>g <strong>the</strong> assembly of<br />
<strong>the</strong> head of bacteriophage T4. Nature 227:680–685.<br />
15. Letunic, I., R. R. Copley, B. Pils, S. P<strong>in</strong>kert, J. Schultz, and P. Börk. 2006.<br />
SMART 5: doma<strong>in</strong>s <strong>in</strong> <strong>the</strong> context of genomes and networks. Nucleic Acids<br />
Res. 34:D257–D260.<br />
16. Lillestøl, R. K., P. Redder, R. A. Garrett, and K. Brügger. 2006. A putative<br />
viral defence mechanism <strong>in</strong> archaeal cells. <strong>Archaea</strong> 2:59–72.<br />
17. Makarova, K. S., N. V. Grish<strong>in</strong>, S. A. Shabal<strong>in</strong>a, Y. I. Wolf, and E. V. Koon<strong>in</strong>.<br />
2006. A putative RNA-<strong>in</strong>terference-based <strong>immune</strong> <strong>system</strong> <strong>in</strong> prokaryotes:<br />
computational analysis of <strong>the</strong> predicted enzymatic mach<strong>in</strong>ery, functional<br />
analogies with eukaryotic RNAi, and hypo<strong>the</strong>tical mechanisms of action.<br />
Biol. Direct 1:7.<br />
18. Mojica, F. J., C. Diez-Villasenor, J. Garcia-Mart<strong>in</strong>ez, and E. Soria. 2005.<br />
Interven<strong>in</strong>g sequences of regularly spaced prokaryotic repeats derive from<br />
foreign genetic elements. J. Mol. Evol. 60:174–182.<br />
19. Ortmann, A. C., B. Wiedenheft, T. Douglas, and M. Young. 2006. Hot<br />
crenarchaeal viruses reveal deep evolutionary connections. Nat. Rev. Microbiol.<br />
4:520–528.<br />
20. Peng, X., H. Blum, Q. She, S. Mallok, K. Brügger, R. A. Garrett, W. Zillig,<br />
and D. Prangishvili. 2001. Sequences and replication of genomes of <strong>the</strong><br />
archaeal rudiviruses SIRV1 and SIRV2: relationships to <strong>the</strong> archaeal lipothrixvirus<br />
SIFV and some eukaryal viruses. Virology 291:226–234.<br />
21. Peng, X., A. Kessler, H. Phan, R. A. Garrett, and D. Prangishvili. 2004.<br />
Multiple variants of <strong>the</strong> archaeal DNA rudivirus SIRV1 <strong>in</strong> a s<strong>in</strong>gle host and<br />
a novel mechanism of genomic variation. Mol. Microbiol. 54:366–375.<br />
22. Prangishvili, D., H. P. Arnold, D. Gotz, U. Ziese, I. Holz, J. K. Kristjansson,<br />
and W. Zillig. 1999. A novel virus family, the Rudiviridae: structure, virus-host<br />
interactions and genome variability of the Sulfolobus viruses SIRV1 and<br />
SIRV2. Genetics 152:1387–1396.<br />
23. Prangishvili, D., P. Forterre, and R. A. Garrett. 2006. Viruses of <strong>the</strong> <strong>Archaea</strong>:<br />
a unify<strong>in</strong>g view. Nat. Rev. Microbiol. 4:837–848.<br />
24. Prangishvili, D., and R. A. Garrett. 2005. Viruses of hyper<strong>the</strong>rmophilic<br />
Crenarchaea. Trends Microbiol. 13:535–542.<br />
25. Prangishvili, D., R. A. Garrett, and E. V. Koon<strong>in</strong>. 2006. Evolutionary genomics<br />
of archaeal viruses: unique viral genomes <strong>in</strong> <strong>the</strong> third doma<strong>in</strong> of life.<br />
Virus Res. 117:52–67.<br />
26. Prangishvili, D., G. Vestergaard, M. Här<strong>in</strong>g, R. Aramayo, T. Basta, R.<br />
Rachel, and R. A. Garrett. 2006. Structural and genomic properties of <strong>the</strong><br />
hyper<strong>the</strong>rmophilic archaeal virus ATV with an extracellular stage of <strong>the</strong><br />
reproductive cycle. J. Mol. Biol. 359:1203–1216.<br />
27. Rachel, R., M. Bettstetter, B. P. Hedlund, M. Här<strong>in</strong>g, A. Kessler, K. O.<br />
Stetter, and D. Prangishvili. 2002. Remarkable morphological diversity of<br />
viruses and virus-like particles <strong>in</strong> hot terrestrial environments. Arch. Virol.<br />
147:2419–2429.<br />
28. Reil<strong>in</strong>, A. 1998. Preparation of catalase crystals. University of Ill<strong>in</strong>ois at Urbana-<br />
Champaign, Urbana, IL. http://www.itg.uiuc.edu/publications/techreports/98-009.<br />
29. Rice, G., K. Stedman, J. Snyder, B. Wiedenheft, D. Willits, S. Brumfield, T.<br />
McDermott, and M. J. Young. 2001. Viruses from extreme <strong>the</strong>rmal environments.<br />
Proc. Natl. Acad. Sci. USA 98:13341–13345.<br />
30. Ru<strong>the</strong>rford, K., J. Parkhill, J. Crook, T. Horsnell, P. Rice, M. A. Rajandream,<br />
and B. Barrell. 2000. ARTEMIS: sequence visualization and annotation.<br />
Bio<strong>in</strong>formatics 16:944–945.<br />
31. Sæbø, P. E., S. M. Andersen, J. Myrseth, J. K. Lærdahl, and T. Rognes. 2005.<br />
PARALIGN: rapid and sensitive sequence similarity searches powered by<br />
parallel comput<strong>in</strong>g technology. Nucleic Acids Res. 33:W535–W539.<br />
32. Snyder, J. C., B. Wiedenheft, M. Lav<strong>in</strong>, F. F. Roberto, J. Spuhler, A. C.<br />
Ortmann, T. Douglas, and M. Young. 2007. Virus movement ma<strong>in</strong>ta<strong>in</strong>s local<br />
virus population diversity. Proc. Natl. Acad. Sci. USA 104:19102–19107.<br />
32a. Steinmetz, N. F., A. Bize, K. C. Findlay, G. P. Lomonossoff, M. Manchester,<br />
D. J. Evans, and D. Prangishvili. Site-specific and spatially controlled addressability<br />
of a new viral nanobuilding block: Sulfolobus islandicus rod-shaped<br />
virus 2. Adv. Funct. Mat., in press.<br />
33. Vestergaard, G., R. Aramayo, T. Basta, M. Här<strong>in</strong>g, X. Peng, K. Brügger, L.<br />
Chen, R. Rachel, N. Boisset, R. A. Garrett, and D. Prangishvili. 2008.<br />
Structure of <strong>the</strong> Acidianus filamentous virus 3 and comparative genomics of<br />
related archaeal lipothrixviruses. J. Virol. 82:371–381.<br />
34. Vestergaard, G., M. Här<strong>in</strong>g, X. Peng, R. Rachel, R. A. Garrett, and D.<br />
Prangishvili. 2005. A novel rudivirus, ARV1, of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal<br />
genus Acidianus. Virology 336:83–92.<br />
35. Zillig, W., A. Kletz<strong>in</strong>, C. Schleper, I. Holz, D. Janekovic, H. Ha<strong>in</strong>, M.<br />
Lanzendörfer, and J. K. Kristjansson. 1994. Screen<strong>in</strong>g for Sulfolobales, <strong>the</strong>ir<br />
plasmids and <strong>the</strong>ir viruses <strong>in</strong> Icelandic solfataras. System. Appl. Microbiol.<br />
16:609–628.<br />
Biochemical Society Transactions www.biochemsoctrans.org<br />
Distribution of <strong>CRISPR</strong> spacer matches <strong>in</strong> viruses<br />
and plasmids of crenarchaeal acido<strong>the</strong>rmophiles<br />
and implications for <strong>the</strong>ir <strong>in</strong>hibitory mechanism<br />
Molecular Biology of <strong>Archaea</strong> 23<br />
Shiraz Ali Shah*, Niels R. Hansen† and Roger A. Garrett* 1<br />
*Centre for Comparative Genomics, Department of Biology, Biocenter, Copenhagen University, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark, and<br />
†Department of Ma<strong>the</strong>matical Sciences, Copenhagen University, Universitetsparken 5, DK-2100 Copenhagen Ø, Denmark<br />
Abstract<br />
Transcripts from spacer sequences with<strong>in</strong> chromosomal repeat clusters [<strong>CRISPR</strong>s (clusters of regularly<br />
<strong>in</strong>terspaced pal<strong>in</strong>dromic repeats)] from archaea have been implicated <strong>in</strong> <strong>in</strong>hibit<strong>in</strong>g or regulat<strong>in</strong>g <strong>the</strong><br />
propagation of archaeal viruses and plasmids. For <strong>the</strong> crenarchaeal <strong>the</strong>rmoacidophiles, <strong>the</strong> chromosomal<br />
spacers show a high level of matches (∼30%) with viral or plasmid genomes. Moreover, <strong>the</strong>ir distribution<br />
along <strong>the</strong> virus/plasmid genomes, as well as <strong>the</strong>ir DNA strand specificity, appear to be random. This is<br />
consistent with <strong>the</strong> hypo<strong>the</strong>sis that chromosomal spacers are taken up directly and randomly from virus and<br />
plasmid DNA and that <strong>the</strong> spacer transcripts target <strong>the</strong> genomic DNA of <strong>the</strong> extrachromosomal elements<br />
and not <strong>the</strong>ir transcripts.<br />
<strong>Archaea</strong>l <strong>CRISPR</strong> <strong>system</strong><br />
<strong>CRISPR</strong>s (clusters of regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic repeats)<br />
consist of identical repeats separated by unique spacer<br />
sequences of constant length which occur <strong>in</strong> <strong>the</strong> sequenced<br />
chromosomes of almost all archaea and approx. 40% of<br />
bacteria (reviewed <strong>in</strong> [1]). The archaeal repeat clusters are generally<br />
large and can constitute >1% of <strong>the</strong> chromosome. The<br />
orig<strong>in</strong>al observation that some spacers show close sequence<br />
matches with archaeal viral genomes led to <strong>the</strong> hypo<strong>the</strong>sis<br />
that spacer regions have a regulatory effect on viral propagation<br />
[2] and plasmid propagation [1], and this proposal<br />
was subsequently re<strong>in</strong>forced by several studies on both<br />
archaea and bacteria (reviewed <strong>in</strong> [1,3,4]). Moreover, a<br />
mechanism for this putative <strong>in</strong>hibitory effect was suggested,<br />
at an early stage, by <strong>the</strong> f<strong>in</strong>d<strong>in</strong>g that RNA transcripts are<br />
produced, and processed, from at least one strand of <strong>the</strong><br />
archaeal repeat clusters [5,6], with <strong>the</strong> smallest product<br />
correspond<strong>in</strong>g roughly <strong>in</strong> size to a s<strong>in</strong>gle spacer transcript<br />
[1]. This raised <strong>the</strong> possibility of an antisense RNA<br />
or RNAi (RNA <strong>in</strong>terference)-like mechanism act<strong>in</strong>g ei<strong>the</strong>r<br />
on <strong>the</strong> viral transcripts or directly on <strong>the</strong> viral DNA [1,3].<br />
New spacer-repeat units are added at <strong>the</strong> end of <strong>the</strong> repeat<br />
clusters adjo<strong>in</strong><strong>in</strong>g a low-complexity flank<strong>in</strong>g sequence [1,7],<br />
by a process that probably <strong>in</strong>volves Cas prote<strong>in</strong>s which are<br />
generally encoded adjacent to <strong>the</strong> clusters [3,5,8]. Experimental<br />
evidence for such a virus-<strong>in</strong>duced addition was<br />
recently provided for bacteria on <strong>in</strong>fect<strong>in</strong>g Streptococcus<br />
<strong>the</strong>rmophilus with bacteriophages 858 and 2972 [9].<br />
Key words: acido<strong>the</strong>rmophile, archaeal plasmid, archaeal virus, cluster of regularly <strong>in</strong>terspaced<br />
pal<strong>in</strong>dromic repeats (<strong>CRISPR</strong>), crenarchaeon.<br />
Abbreviations used: ATV, Acidianus two-tailed virus; <strong>CRISPR</strong>, cluster of regularly <strong>in</strong>terspaced<br />
pal<strong>in</strong>dromic repeats; ITR, <strong>in</strong>verted term<strong>in</strong>al repeat; ORF, open read<strong>in</strong>g frame; SIRV1, Sulfolobus<br />
islandicus rod-shaped virus 1; STIV, Sulfolobus turreted icosahedral virus.<br />
1 To whom correspondence should be addressed (email garrett@bio.ku.dk).<br />
Biochem. Soc. Trans. (2009) 37, 23–28; doi:10.1042/BST0370023<br />
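The repeat-spacer architecture described above can be illustrated with a minimal Python sketch. It assumes perfectly identical repeats and uses made-up toy sequences; the function name `extract_spacers` is hypothetical and is not the software used in the study, and real repeat clusters often contain slightly degenerate repeats that a simple string split would miss.

```python
def extract_spacers(array_seq: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    parts = array_seq.split(repeat)
    # parts[0] is the flank before the first repeat and parts[-1] the flank
    # after the last repeat; everything in between is a spacer.
    return [p for p in parts[1:-1] if p]


# Toy, invented sequences (not real archaeal data).
repeat = "GATTAATCCC"
array_seq = "TTTT" + repeat + "ACGTACGTAA" + repeat + "GGCCTTAAGG" + repeat + "AA"

spacers = extract_spacers(array_seq, repeat)
print(spacers)
```

In this toy cluster the three repeats delimit two spacers of equal length, mirroring the constant spacer length noted in the text.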
Hypo<strong>the</strong>sis<br />
In <strong>the</strong> present article, we explore and <strong>in</strong>terpret trends<br />
which emerge when collectively analys<strong>in</strong>g chromosomal<br />
<strong>CRISPR</strong> spacer matches to viral and plasmid genomes.<br />
The crenarchaeal acido<strong>the</strong>rmophiles were selected for <strong>the</strong><br />
analysis because <strong>the</strong>y carry large and multiple repeat clusters<br />
[1] and because many of <strong>the</strong>ir viruses and plasmids have been<br />
sequenced [10]. The results should yield <strong>in</strong>sights <strong>in</strong>to both <strong>the</strong><br />
mechanism of uptake of new spacer regions <strong>in</strong> <strong>CRISPR</strong>s and<br />
<strong>the</strong> mechanism of <strong>in</strong>hibition or regulation of <strong>the</strong> viruses<br />
and plasmids. We assume that, if chromosomal spacer sequence<br />
matches occur randomly on <strong>the</strong> virus or plasmid<br />
genome, <strong>the</strong>n <strong>the</strong> chromosomal spacer regions are generated<br />
by DNA excision and <strong>in</strong>sertion and not by reverse transcription<br />
from virus/plasmid transcripts. In contrast, a<br />
non-random distribution of matches biased to <strong>the</strong> genes<br />
would favour <strong>the</strong> latter RNA-based mechanism. A random<br />
distribution of spacer matches on <strong>the</strong> virus/plasmid genomes<br />
would also favour a DNA-directed <strong>in</strong>hibitory mechanism<br />
for <strong>the</strong> spacer transcripts, whereas a gene-biased distribution<br />
would support <strong>the</strong> spacer transcripts <strong>in</strong>hibit<strong>in</strong>g virus/plasmid<br />
gene expression.<br />
Previous studies on <strong>the</strong> archaeal <strong>CRISPR</strong>s of related<br />
Sulfolobus solfataricus stra<strong>in</strong>s have suggested that <strong>in</strong>dividual<br />
spacers are quite stable and that any selective pressure acts<br />
on larger blocks of spacers [1], so we <strong>in</strong>fer that any selective<br />
pressures on <strong>CRISPR</strong> spacer contents will not <strong>in</strong>fluence our<br />
results and <strong>in</strong>terpretation significantly.<br />
© The Authors Journal compilation © 2009 Biochemical Society
24 Biochemical Society Transactions (2009) Volume 37, part 1<br />
Selection of viruses, plasmids and <strong>CRISPR</strong>s<br />
Five crenarchaeal virus families, a class of conjugative<br />
plasmids and a family of cryptic plasmids were selected for <strong>the</strong><br />
study (Table 1). They <strong>in</strong>clude six β-lipothrixviruses, family<br />
Lipothrixviridae; four rudiviruses, family Rudiviridae; seven<br />
fuselloviruses, family Fuselloviridae; a s<strong>in</strong>gle bicaudavirus<br />
ATV (Acidianus two-tailed virus), family Bicaudaviridae;<br />
STIV (Sulfolobus turreted icosahedral virus), an unclassified<br />
icosahedral virus (reviewed <strong>in</strong> [10]), seven members of a conjugative<br />
plasmid family and four members of <strong>the</strong> pRN cryptic<br />
plasmid family (reviewed <strong>in</strong> [11]). Each extrachromosomal<br />
element can propagate <strong>in</strong> members of <strong>the</strong> related crenarchaeal<br />
<strong>the</strong>rmoacidophilic genera Sulfolobus or Acidianus. Spacer<br />
sequences were derived from 13 whole crenarchaeal chromosomal<br />
sequences, from both acido<strong>the</strong>rmophiles and neutro<strong>the</strong>rmophiles,<br />
and <strong>the</strong> partial genomes of Acidianus brierleyi,<br />
S. solfataricus P1 and Sulfolobus islandicus HVE10/4 from our<br />
laboratory and of S. islandicus stra<strong>in</strong>s LD85, YG5714,<br />
YN1551, M164 and U328 which were publicly available <strong>in</strong><br />
May 2008 (Table 1).<br />
Identify<strong>in</strong>g spacer matches<br />
<strong>CRISPR</strong> regions were localized us<strong>in</strong>g publicly available<br />
software [12,13] and exam<strong>in</strong>ed for <strong>the</strong> occurrence of spacer<br />
sequence matches to <strong>the</strong> selected viruses and plasmids. Two<br />
approaches were employed. In one, matches were identified<br />
at a nucleotide sequence level between <strong>the</strong> similarly oriented<br />
spacer sequences (correspond<strong>in</strong>g to <strong>the</strong> processed transcript<br />
sequence [1,5,6]) and ei<strong>the</strong>r strand of <strong>the</strong> virus/plasmid DNA.<br />
In a second approach, we exploited <strong>the</strong> observation that<br />
prote<strong>in</strong> sequences are more highly conserved than gene<br />
sequences and tried to detect significant matches additional<br />
to those identified at a nucleotide sequence level. Each spacer<br />
strand was translated <strong>in</strong>to three am<strong>in</strong>o acid sequences, and,<br />
after remov<strong>in</strong>g sequences conta<strong>in</strong><strong>in</strong>g stop codons (about<br />
50%), each translated sequence was aligned aga<strong>in</strong>st am<strong>in</strong>o<br />
acid sequences of all annotated ORFs (open read<strong>in</strong>g frames)<br />
of all <strong>the</strong> viruses and plasmids. Implicit <strong>in</strong> this approach is<br />
<strong>the</strong> assumption that <strong>the</strong> uptake of spacers <strong>in</strong> <strong>the</strong> oriented<br />
<strong>CRISPR</strong>s is non-directional, and this is borne out by <strong>the</strong><br />
results (see below). A nucleotide sequence approach was<br />
also applied to <strong>the</strong> whole acido<strong>the</strong>rmophile chromosomes<br />
by search<strong>in</strong>g for exact matches to <strong>CRISPR</strong> spacers (Table 1).<br />
Significant e-value cut-offs were determ<strong>in</strong>ed for both <strong>the</strong> nucleotide<br />
and am<strong>in</strong>o acid sequence searches us<strong>in</strong>g <strong>the</strong> genome<br />
sequence of Saccharomyces cerevisiae as a negative control<br />
(results not shown). All sequence alignments were performed<br />
us<strong>in</strong>g Paralign, an MMX-optimized implementation of <strong>the</strong><br />
Smith–Waterman algorithm [14].<br />
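The translation step of the second approach can be sketched as follows. This is only an illustration under stated assumptions: the standard codon assignments are used, the example spacer is invented, and the helper names (`stop_free_frames` and so on) are hypothetical. The alignment of the surviving peptides against the annotated ORFs, which the study performed with Paralign, is not reproduced here.

```python
from itertools import product

# Standard codon table, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    # Translate complete codons only; a trailing partial codon is ignored.
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def stop_free_frames(spacer: str) -> dict:
    """Translate each strand in three frames and keep frames without a stop codon."""
    frames = {}
    for strand, seq in (("+", spacer), ("-", reverse_complement(spacer))):
        for offset in range(3):
            peptide = translate(seq[offset:])
            if "*" not in peptide:
                frames[(strand, offset)] = peptide
    return frames


spacer = "ATGTAAGCTTGGACC"  # invented example spacer
frames = stop_free_frames(spacer)
for key, peptide in sorted(frames.items()):
    print(key, peptide)
```

Here the forward frame 0 contains a TAA stop codon and is discarded, while the remaining five frames would be passed on to the amino acid search.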
Analysis of <strong>the</strong> distribution of<br />
chromosomal spacer matches on<br />
virus/plasmid genomes<br />
In total, 82 repeat clusters, some <strong>in</strong>complete (Table 1), yielded<br />
4005 spacer sequences, after subtract<strong>in</strong>g 278 spacer sequences<br />
shared between S. solfataricus stra<strong>in</strong>s P1 and P2 [1]. Approx.<br />
30% of <strong>the</strong> spacers from <strong>the</strong> acido<strong>the</strong>rmophile genomes<br />
matched <strong>the</strong> virus and plasmid families (Table 1), whereas<br />
only approx. 5% of <strong>the</strong> neutro<strong>the</strong>rmophile spacers did so.<br />
This difference probably arises because <strong>the</strong> viruses and plasmids<br />
fall with<strong>in</strong> <strong>the</strong> host range of <strong>the</strong> acido<strong>the</strong>rmophiles only.<br />
The locations of all <strong>the</strong> spacer matches are<br />
superimposed on genome maps of representative genetic elements<br />
<strong>in</strong> Figure 1. Spacers giv<strong>in</strong>g nucleotide sequence matches<br />
to ei<strong>the</strong>r DNA strand (red l<strong>in</strong>es) occur ma<strong>in</strong>ly with<strong>in</strong> genes,<br />
but a few are located <strong>in</strong>tergenically or with<strong>in</strong> <strong>the</strong> non-prote<strong>in</strong>cod<strong>in</strong>g<br />
region of <strong>the</strong> ITR (<strong>in</strong>verted term<strong>in</strong>al repeat).<br />
Translated spacers yield<strong>in</strong>g am<strong>in</strong>o acid sequence matches,<br />
<strong>in</strong> addition to <strong>the</strong> nucleotide sequence matches, occur<br />
with<strong>in</strong> annotated ORFs on ei<strong>the</strong>r DNA strand (green l<strong>in</strong>es).<br />
In a series of three tests, we attempted to address <strong>the</strong><br />
question of whe<strong>the</strong>r or not <strong>the</strong> spacers present <strong>in</strong> host<br />
chromosomal <strong>CRISPR</strong>s match <strong>the</strong> virus/plasmid genomes<br />
<strong>in</strong> a biased non-random manner. Potential biases <strong>in</strong>clude <strong>the</strong><br />
preferential match<strong>in</strong>g to certa<strong>in</strong> regions of <strong>the</strong> virus/plasmid<br />
genome and DNA strand biases. We exclusively used <strong>the</strong> nucleotide<br />
sequence match<strong>in</strong>g data because it covered <strong>the</strong> whole<br />
genome.<br />
First, we exam<strong>in</strong>ed <strong>the</strong> distribution of spacer sequence<br />
matches, at a nucleotide level, along <strong>the</strong> virus/plasmid<br />
genomes. We assumed that a uniform distribution would<br />
follow, roughly, a homogeneous Poisson process, whereas<br />
an irregular distribution along <strong>the</strong> genome would yield a<br />
deviation from <strong>the</strong> homogeneous Poisson process. We tested<br />
this us<strong>in</strong>g Kolmogorov–Smirnov test statistics for<br />
each virus and plasmid and we were generally unable to detect<br />
any significant deviations from a homogeneous Poisson<br />
distribution.<br />
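The first test can be sketched in Python. Under a homogeneous Poisson process, the match positions, given their number, behave like independent draws from a uniform distribution over the genome, so a one-sample Kolmogorov–Smirnov statistic against the uniform distribution is a natural check. The code below is a minimal sketch under that assumption; the positions and genome length are invented, the p-value uses the asymptotic Kolmogorov series (rough for small samples), and nothing here is claimed to reproduce the authors' exact procedure.

```python
import math


def ks_uniform_statistic(positions, genome_length):
    """One-sample KS statistic against Uniform(0, genome_length)."""
    u = sorted(p / genome_length for p in positions)
    n = len(u)
    d_plus = max((i + 1) / n - x for i, x in enumerate(u))
    d_minus = max(x - i / n for i, x in enumerate(u))
    return max(d_plus, d_minus)


def ks_asymptotic_pvalue(d, n, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution."""
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the value is essentially 1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


genome_length = 10_000                            # invented genome size (bp)
spread = [500 + 1000 * i for i in range(10)]      # matches spread along the genome
clustered = [100 * (i + 1) for i in range(10)]    # matches piled up at one end

d_spread = ks_uniform_statistic(spread, genome_length)
d_clustered = ks_uniform_statistic(clustered, genome_length)
p_spread = ks_asymptotic_pvalue(d_spread, len(spread))
p_clustered = ks_asymptotic_pvalue(d_clustered, len(clustered))
print(d_spread, p_spread)
print(d_clustered, p_clustered)
```

The evenly spread toy positions give a small statistic and no evidence against uniformity, whereas the clustered positions give a large statistic and a very small p-value.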
Secondly, we tested whe<strong>the</strong>r <strong>the</strong>re was any detectable<br />
bias <strong>in</strong> <strong>the</strong> spacer matches to <strong>the</strong> most conserved viral genes<br />
given that <strong>the</strong>y are more likely to be targets for <strong>in</strong>hibition<br />
of propagation. The number of matches to each gene was<br />
analysed us<strong>in</strong>g a Poisson regression model with <strong>the</strong> gene conservation<br />
and length as explanatory variables. This analysis<br />
showed that <strong>the</strong> number of matches to a given gene did not<br />
depend significantly upon <strong>the</strong> degree of its conservation,<br />
although, for SIRV1 (Sulfolobus islandicus rod-shaped<br />
virus 1), we did observe a weak effect for <strong>the</strong> seven to ten<br />
most conserved genes. Moreover, it was found that <strong>the</strong><br />
expected number of matches was proportional to <strong>the</strong> gene<br />
length, <strong>in</strong> agreement with <strong>the</strong> homogeneous Poisson process.<br />
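One way to write the regression model of the second test is given below. The log link, the use of log-length as a covariate and all symbols are assumptions made for illustration: N_g is the number of spacer matches to gene g, c_g a conservation score and \ell_g the gene length.

```latex
N_g \sim \mathrm{Poisson}(\mu_g), \qquad
\log \mu_g = \beta_0 + \beta_1 c_g + \beta_2 \log \ell_g
```

In this notation, the absence of a conservation effect corresponds to \beta_1 = 0, and proportionality to gene length corresponds to \beta_2 = 1, so that \mu_g = e^{\beta_0} \ell_g. That is the expected count for a homogeneous Poisson process with a constant per-nucleotide intensity e^{\beta_0}.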
Thirdly, we tested for any bias <strong>in</strong> <strong>the</strong> distribution of<br />
spacer matches <strong>in</strong> cod<strong>in</strong>g compared with non-cod<strong>in</strong>g regions<br />
or to <strong>the</strong> sense compared with antisense strands of <strong>the</strong> virus/<br />
plasmid genes us<strong>in</strong>g a specific alternative of a Poisson process<br />
with different <strong>in</strong>tensities for matches occurr<strong>in</strong>g with<strong>in</strong>,<br />
and outside, prote<strong>in</strong>-cod<strong>in</strong>g regions, treat<strong>in</strong>g each DNA<br />
strand separately. We were unable to detect any significant<br />
deviations from a homogeneous Poisson distribution for <strong>the</strong><br />
match <strong>in</strong>tensities of <strong>the</strong> cod<strong>in</strong>g compared with non-cod<strong>in</strong>g<br />
regions, with <strong>the</strong> exception of STIV, where <strong>the</strong>re is a bias to<br />
<strong>the</strong> antisense strand (Figure 1).
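The third test compares a single match intensity with separate intensities for two classes of sequence. A likelihood-ratio test for two Poisson rates is one standard way to make such a comparison, and is sketched below as an assumption rather than as the authors' exact method. The counts and region lengths are invented, and the function name `poisson_rate_lrt` is hypothetical. The study treated each DNA strand separately, which would correspond to calling the function once per strand.

```python
import math


def poisson_rate_lrt(k_a, len_a, k_b, len_b):
    """Likelihood-ratio test of one common rate versus separate rates.

    k_a, k_b are match counts; len_a, len_b are region lengths in bp.
    Returns the test statistic and a chi-square (1 d.f.) p-value.
    """
    def loglik(k, length, rate):
        # Poisson log-likelihood up to a constant that cancels in the ratio.
        return (k * math.log(rate * length) if k > 0 else 0.0) - rate * length

    pooled = (k_a + k_b) / (len_a + len_b)
    ll_null = loglik(k_a, len_a, pooled) + loglik(k_b, len_b, pooled)
    ll_alt = loglik(k_a, len_a, k_a / len_a) + loglik(k_b, len_b, k_b / len_b)
    stat = 2.0 * (ll_alt - ll_null)
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Invented example: counts proportional to region length (no bias).
stat_even, p_even = poisson_rate_lrt(40, 32_000, 10, 8_000)
# Invented example: equal region lengths but very unequal counts (bias).
stat_biased, p_biased = poisson_rate_lrt(40, 20_000, 10, 20_000)
print(stat_even, p_even)
print(stat_biased, p_biased)
```

When counts are proportional to region length the statistic is essentially zero, which is the pattern reported here for coding compared with non-coding regions.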
Table 1 Summary of <strong>the</strong> chromosomal spacer matches to <strong>the</strong> virus and plasmid genomes of <strong>the</strong> crenarchaeal acido<strong>the</strong>rmophiles<br />
The numbers of <strong>CRISPR</strong> spacers that match virus/plasmid family genomes significantly at a nucleotide level are given, as well as additional matches detected at an am<strong>in</strong>o acid level. Spacer matches to <strong>the</strong><br />
host’s own genome constitute only exact nucleotide matches. The total number of chromosomal spacers match<strong>in</strong>g to virus/plasmid genomes differs from <strong>the</strong> number of spacers that match each plasmid and<br />
virus family because some spacers match more than one family, but have been counted only once. Rudiviruses comprise SIRV1, SIRV2, ARV and SRV1; β-lipothrixviruses constitute AFV3, AFV6, AFV7, AFV8, AFV9<br />
and SIFV, and fuselloviruses <strong>in</strong>clude SSV2, SSV4, SSV5, SSVrh, SSVk1 and SSV1. The pNOB8 family conta<strong>in</strong>s pNOB8, pARN3, pARN4, pHVE14, pING1, pKEF9, pSOG1 and pSOG2, and <strong>the</strong> pRN family consists of pHEN7,<br />
pDL10, pRN1 and pRN2. The 278 spacers which S. solfataricus P1 shares with stra<strong>in</strong> P2 were subtracted dur<strong>in</strong>g <strong>the</strong> analysis, but have been re<strong>in</strong>serted <strong>in</strong> this Table. For <strong>the</strong> partial genomes, <strong>the</strong> total numbers of<br />
spacers are approximate, s<strong>in</strong>ce repeat clusters may not be fully sequenced. Genome sequences for S. solfataricus P1, S. islandicus HVE10/4 and A. brierleyi are unpublished work from our laboratory. Genomes<br />
of S. islandicus stra<strong>in</strong>s LD85, YG5714, YN1551, M164 and U328 are publicly available from <strong>the</strong> JGI (Jo<strong>in</strong>t Genome Institute) database (http://www.jgi.doe.gov/). All neutro<strong>the</strong>rmophile genomes were complete<br />
and obta<strong>in</strong>ed through GenBank ® accession numbers NC_000854 (Aeropyrum pernix K1), NC_008818 (Hyper<strong>the</strong>rmus butylicus DSM5456), NC_009776 (Ignicoccus hospitalis KIN4/I), NC_003364 (Pyrobaculum<br />
aerophilum IM2), NC_009376 (Pyrobaculum arsenaticum DSM 13514), NC_009073 (Pyrobaculum calidifontis JCM11548), NC_008701 (Pyrobaculum islandicum DSM4184), NC_009033 (Staphylo<strong>the</strong>rmus mar<strong>in</strong>us<br />
F1) and NC_008698 (Thermofilum pendens Hrk5).<br />
Stra<strong>in</strong> | Spacers (total) | Rudiviruses | β-Lipothrixviruses | Fuselloviruses | STIV | ATV | pNOB8 family (conjugative) | pRN family (cryptic) | Spacers (total match<strong>in</strong>g) | Matches with own genome | GenBank ® /JGI accession number/reference<br />
Acido<strong>the</strong>rmophiles (total) | 3313 | 331 | 181 | 134 | 81 | 126 | 226 | 63 | 969 | 1 | –<br />
Sulfolobus solfataricus P2 | 415 | 53 | 24 | 15 | 9 | 20 | 26 | 12 | 135 | 0 | NC_002754<br />
Sulfolobus solfataricus P1 | 423 | 50 | 22 | 19 | 9 | 26 | 32 | 7 | 144 | 0 | [1]<br />
Sulfolobus islandicus HVE10/4 | 270 | 47 | 20 | 20 | 4 | 3 | 19 | 9 | 104 | 0 | Unpublished<br />
Sulfolobus tokodaii 7 | 461 | 23 | 19 | 19 | 13 | 2 | 43 | 6 | 108 | 1 | NC_003106<br />
Sulfolobus acidocaldarius DSM639 | 223 | 14 | 5 | 2 | 1 | 2 | 15 | 4 | 38 | 0 | NC_007181<br />
Metallosphaera sedula DSM5348 | 386 | 20 | 9 | 8 | 6 | 59 | 31 | 4 | 110 | 0 | NC_009440<br />
Acidianus brierleyi | 367 | 29 | 21 | 9 | 8 | 5 | 32 | 10 | 100 | 0 | Unpublished<br />
Sulfolobus islandicus LD85 | 287 | 65 | 39 | 10 | 6 | 1 | 6 | 6 | 114 | 0 | 4023472<br />
Four Sulfolobus islandicus stra<strong>in</strong>s (YG5714, YN1551, M164, U328) | 481 | 30 | 22 | 32 | 25 | 8 | 19 | 5 | 116 | 0 | 4023468, 4005359, 4023464, 4023466<br />
Neutro<strong>the</strong>rmophiles (total) | 963 | 6 | 13 | 14 | 1 | 4 | 16 | 0 | 52 | 0 | –<br />
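As a quick arithmetic check, the matching fractions quoted in the text (approx. 30% and approx. 5%) can be recomputed from the two "(total)" rows of Table 1. The short Python sketch below copies only those totals; the dictionary layout is an illustrative choice.

```python
# Totals copied from the "(total)" rows of Table 1.
table_totals = {
    "acidothermophiles": {"spacers": 3313, "matching": 969},
    "neutrothermophiles": {"spacers": 963, "matching": 52},
}

match_percent = {
    group: 100 * row["matching"] / row["spacers"]
    for group, row in table_totals.items()
}

for group, percent in match_percent.items():
    print(f"{group}: {percent:.1f}% of spacers match a virus or plasmid")
```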
Figure 1 <strong>CRISPR</strong> spacer matches superimposed on genomes of representative viruses and plasmids<br />
SIRV1, rudiviruses; AFV9 (Acidianus filamentous virus 9), β-lipothrixviruses; SSV2 (Sulfolobus sp<strong>in</strong>dle-shaped virus 2),<br />
fuselloviruses; STIV, unclassified icosahedral virus; ATV, bicaudavirus; pNOB8, conjugative plasmids; pHEN7, cryptic plasmids.<br />
A prelim<strong>in</strong>ary version of <strong>the</strong> rudiviral data was presented <strong>in</strong> [15]. The circular genomes (SSV2, STIV, ATV, pNOB8 and pHEN7)<br />
are presented <strong>in</strong> a l<strong>in</strong>ear format. Prote<strong>in</strong>-cod<strong>in</strong>g regions are boxed and shaded, accord<strong>in</strong>g to <strong>the</strong>ir levels of conservation<br />
for those genomes for which comparative data are available (all except for STIV and ATV). Spacer sequence matches are<br />
<strong>in</strong>dicated by l<strong>in</strong>es above and below <strong>the</strong> genomes for <strong>the</strong> two DNA strands and <strong>the</strong>y are colour-coded accord<strong>in</strong>g to whe<strong>the</strong>r<br />
<strong>the</strong>y occur exclusively at a nucleotide level (red) or additionally at an am<strong>in</strong>o acid level (green).<br />
Similar results for <strong>the</strong> first and third tests were obta<strong>in</strong>ed<br />
when <strong>the</strong> analysis was limited to spacer matches from family<br />
I <strong>CRISPR</strong>s (see below).<br />
Classify<strong>in</strong>g crenarchaeal acido<strong>the</strong>rmophile<br />
<strong>CRISPR</strong> families<br />
<strong>CRISPR</strong>s are oriented and <strong>the</strong>y generally carry a 300–600 bp<br />
low-complexity flank<strong>in</strong>g sequence immediately upstream of<br />
<strong>the</strong> repeat cluster which conta<strong>in</strong>s <strong>the</strong> transcriptional leader<br />
sequence [1]. Sequence analysis of <strong>the</strong> flank<strong>in</strong>g sequences<br />
by multiple alignment [16] and motif analysis [17], along<br />
with sequence comparison of <strong>the</strong> repeat sequence from each<br />
cluster, suggested that <strong>the</strong> <strong>CRISPR</strong>s can be classified <strong>in</strong>to<br />
families. All crenarchaeal flank<strong>in</strong>g sequences share a common<br />
A/T-rich motif adjacent to <strong>the</strong> first repeat of <strong>the</strong> cluster,<br />
whereas <strong>the</strong> rema<strong>in</strong>der of <strong>the</strong> flank<strong>in</strong>g sequence is familyspecific.<br />
At least three dist<strong>in</strong>ct families, each with multiple<br />
members, were found for <strong>the</strong> acido<strong>the</strong>rmophiles by analys<strong>in</strong>g<br />
<strong>the</strong> flank<strong>in</strong>g sequences alone (Figure 2A), and this f<strong>in</strong>d<strong>in</strong>g<br />
was re<strong>in</strong>forced by construct<strong>in</strong>g a multiple alignment of repeat<br />
sequences from <strong>the</strong> clusters (Figure 2B). Thus <strong>the</strong>re is a<br />
clear correlation between <strong>the</strong> nature of <strong>the</strong> flank<strong>in</strong>g sequence<br />
and <strong>the</strong> repeat sequence which constitutes a repeat cluster.<br />
These <strong>CRISPR</strong> families cross species and genus barriers, and<br />
most of <strong>the</strong> acido<strong>the</strong>rmophile genomes conta<strong>in</strong> clusters from
Figure 2 <strong>CRISPR</strong> families of crenarchaeal acido<strong>the</strong>rmophiles<br />
(A) Schematic representation of <strong>the</strong> three types of flank<strong>in</strong>g sequence<br />
associated with <strong>CRISPR</strong> families I, II and III. All three flank<strong>in</strong>g<br />
sequences share a motif adjacent to <strong>the</strong> repeat cluster, whereas<br />
<strong>the</strong> upstream region of <strong>the</strong> flank is specific for each family. (B)<br />
Phylogenetic tree created us<strong>in</strong>g ClustalW [18] based on a multiple<br />
alignment of a repeat from each acido<strong>the</strong>rmophile repeat cluster.<br />
The <strong>CRISPR</strong>s studied are labelled by a four-letter prefix based<br />
on <strong>the</strong> genus and species name <strong>in</strong> addition to <strong>the</strong> number of<br />
repeats carried by <strong>the</strong> repeat cluster. Abri, Acidianus brierleyi; Msed,<br />
Metallosphaera sedula; Saci, Sulfolobus acidocaldarius; Sisl, Sulfolobus<br />
islandicus; Ssol, Sulfolobus solfataricus; Stok, Sulfolobus tokodaii.<br />
S. islandicus HVE10/4 and A. brierleyi repeat clusters were not<br />
completely sequenced and <strong>the</strong> total number of repeats is not given.<br />
The three major repeat cluster families are <strong>in</strong>dicated by differently<br />
shaded boxes. (C) Logo-plot (http://weblogo.berkeley.edu/) of <strong>the</strong><br />
motif located upstream of <strong>the</strong> area on a virus or plasmid genome<br />
matched by a family I spacer. The CC motif was found at approx. 75% of<br />
all match<strong>in</strong>g sites.<br />
different families. Therefore no families are specific to a given<br />
species and no species is limited to a s<strong>in</strong>gle family. These<br />
results strongly re<strong>in</strong>force <strong>the</strong> hypo<strong>the</strong>sis that <strong>CRISPR</strong>–Cas<br />
<strong>system</strong>s are acquired via horizontal gene transfer [1,19].<br />
Over half of <strong>the</strong> acido<strong>the</strong>rmophile repeat clusters belong<br />
to family I, where, generally, <strong>the</strong> sequence just upstream of<br />
<strong>the</strong> virus or plasmid site which matches a family I spacer<br />
carries a CC motif (Figure 2C). Insufficient data precluded<br />
our establish<strong>in</strong>g whe<strong>the</strong>r such motifs occur adjacent to<br />
family II and family III spacer matches.<br />
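The kind of tally behind such a motif observation can be sketched in Python: for each match site, take the two bases immediately upstream and count how often each dinucleotide occurs. The genome string and match coordinates below are invented for illustration, the function name is hypothetical, and only forward-strand matches are handled; a match on the opposite strand would first need the reverse complement of its flanking sequence.

```python
from collections import Counter


def upstream_dinucleotides(genome: str, match_starts, width: int = 2) -> Counter:
    """Count the `width` bases immediately 5' of each forward-strand match start."""
    counts = Counter()
    for start in match_starts:
        if start >= width:  # skip sites too close to the sequence start
            counts[genome[start - width:start]] += 1
    return counts


# Invented toy genome assembled from labelled pieces so the coordinates are easy to follow.
pieces = ["AT", "CC", "ACGTAC", "GT", "CC", "TTGACA", "AG", "GGATCC", "CC", "AATTGG"]
genome = "".join(pieces)
match_starts = [4, 14, 22, 30]  # start coordinates of four hypothetical spacer matches

counts = upstream_dinucleotides(genome, match_starts)
cc_fraction = counts["CC"] / sum(counts.values())
print(counts, cc_fraction)
```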
Conclusions<br />
The results demonstrate that <strong>CRISPR</strong> spacer matches are<br />
uniformly distributed throughout <strong>the</strong> virus/plasmid genomes,<br />
regardless of both gene location and degree of gene conservation.<br />
Moreover, <strong>the</strong>re is no significant bias to ei<strong>the</strong>r sense<br />
or antisense strands of genes (with <strong>the</strong> exception of STIV):<br />
both strands are targeted to an equal degree. These f<strong>in</strong>d<strong>in</strong>gs<br />
strongly suggest that <strong>the</strong> spacer regions of <strong>the</strong> <strong>CRISPR</strong><br />
are taken up randomly, and non-directionally, from <strong>the</strong><br />
virus or plasmid DNA and are not generated by reverse<br />
transcriptase from virus/plasmid transcripts. The results are<br />
also consistent with <strong>the</strong> hypo<strong>the</strong>sis that <strong>the</strong> <strong>CRISPR</strong> spacer<br />
transcripts target <strong>the</strong> virus/plasmid by hybridiz<strong>in</strong>g directly<br />
to <strong>the</strong>ir DNA, possibly prim<strong>in</strong>g it for degradation.<br />
The results also support a mechanism whereby virus or<br />
plasmid propagation is <strong>in</strong>hibited primarily at a DNA level and<br />
not at a gene-expression level. For example, <strong>the</strong> non-prote<strong>in</strong>cod<strong>in</strong>g<br />
ITR region, which is implicated <strong>in</strong> rudiviral replication<br />
[10], carries seven spacer matches <strong>in</strong> SIRV1 (Figure 1)<br />
and o<strong>the</strong>r spacer matches occur <strong>in</strong> <strong>in</strong>tergenic regions which<br />
appear not to be <strong>in</strong>volved <strong>in</strong> transcriptional regulation<br />
(results not shown).<br />
The <strong>in</strong>hibitory mechanism also appears to be highly<br />
specific for virus/plasmid DNA, s<strong>in</strong>ce only one perfect<br />
spacer sequence match was detected with<strong>in</strong> any of <strong>the</strong> acido<strong>the</strong>rmophile<br />
chromosomal sequences exam<strong>in</strong>ed (Table 1).<br />
This may be crucial for cell survival if <strong>the</strong> <strong>in</strong>hibitory<br />
mechanism <strong>in</strong>volves DNA degradation, but, given that<br />
viruses and plasmids often <strong>in</strong>tegrate reversibly <strong>in</strong>to archaeal<br />
chromosomes [20], it suggests that <strong>the</strong> <strong>CRISPR</strong>–Cas <strong>system</strong><br />
selectively targets DNA of extrachromosomal elements,<br />
whe<strong>the</strong>r circular or l<strong>in</strong>ear.<br />
The <strong>CRISPR</strong>–Cas <strong>system</strong> has been primarily implicated<br />
<strong>in</strong> viral <strong>in</strong>hibition <strong>in</strong> both archaea and bacteria [1,3,4], but it<br />
is clear from <strong>the</strong> present analysis that, at least for archaea, its<br />
role is more complex. The apparatus targets plasmids, both<br />
conjugative and cryptic, with a similar frequency to viruses<br />
(Figure 1). Moreover, some host <strong>CRISPR</strong> spacers match <strong>the</strong>ir<br />
own viruses or plasmids, suggest<strong>in</strong>g a regulatory, ra<strong>the</strong>r than<br />
an <strong>in</strong>hibitory, role, and this possibility is re<strong>in</strong>forced by <strong>the</strong> low<br />
copy numbers, and non-lytic properties, of most crenarchaeal<br />
viruses [10]. F<strong>in</strong>ally, <strong>the</strong> observation that a spacer sequence<br />
<strong>in</strong> <strong>the</strong> repeat cluster of <strong>the</strong> conjugative plasmid pKEF9<br />
[21] matches a rudiviral genome suggests that plasmids<br />
<strong>the</strong>mselves can also <strong>in</strong>hibit/regulate co-<strong>in</strong>fect<strong>in</strong>g viruses.<br />
Acknowledgements<br />
Dr Kim Brügger k<strong>in</strong>dly provided unpublished genome sequence data.<br />
Fund<strong>in</strong>g<br />
Work was supported by <strong>the</strong> Danish National Research Foundation for<br />
a Centre of Comparative Genomics and <strong>the</strong> Danish Natural Science<br />
Research Council [grant number 272-06-0442].<br />
References<br />
1 Lillestøl, R.K., Redder, P., Garrett, R.A. and Brügger, K. (2006) A putative<br />
viral defence mechanism <strong>in</strong> archaeal cells. <strong>Archaea</strong> 2, 59–72<br />
2 Mojica, F.J., Diez-Villasenor, C., Garcia-Mart<strong>in</strong>ez, J. and Soria, E. (2005)<br />
Interven<strong>in</strong>g sequences of regularly spaced prokaryotic repeats derive<br />
from foreign genetic elements. J. Mol. Evol. 60, 174–182<br />
3 Makarova, K.S., Grish<strong>in</strong>, N.V., Shabal<strong>in</strong>a, S.A., Wolf, Y.I. and Koon<strong>in</strong>, E.V.<br />
(2006) A putative RNA-<strong>in</strong>terference-based <strong>immune</strong> <strong>system</strong> <strong>in</strong><br />
prokaryotes: computational analysis of <strong>the</strong> predicted enzymatic<br />
mach<strong>in</strong>ery, functional analogies with eukaryotic RNAi, and hypo<strong>the</strong>tical<br />
mechanisms of action. Biol. Direct 1, 7<br />
4 Sorek, R., Kunin, V. and Hugenholtz, P. (2008) CRISPR: a widespread<br />
<strong>system</strong> that provides acquired resistance aga<strong>in</strong>st phages <strong>in</strong> bacteria and<br />
archaea. Nat. Rev. Microbiol. 6, 181–186<br />
5 Tang, T.-H., Bachellerie, J.-P., Rozhdestvensky, T., Bortolin, M.-L.,<br />
Huber, H., Drungowski, M., Elge, T., Brosius, J. and Hüttenhofer, A. (2002)<br />
Identification of 86 candidates for small non-messenger RNAs from <strong>the</strong><br />
archaeon Archaeoglobus fulgidus. Proc. Natl. Acad. Sci. U.S.A. 99,<br />
7536–7541<br />
6 Tang, T.-H., Polacek, N., Zywicki, M., Huber, H., Brügger, K., Garrett, R.A.,<br />
Bachellerie, J. P. and Hüttenhofer, A. (2005) Identification of novel<br />
non-cod<strong>in</strong>g RNAs as potential antisense regulators <strong>in</strong> <strong>the</strong> archaeon<br />
Sulfolobus solfataricus. Mol. Microbiol. 55, 469–481<br />
7 Pourcel, C., Salvignol, G. and Vergnaud, G. (2005) <strong>CRISPR</strong> elements <strong>in</strong><br />
Yers<strong>in</strong>ia pestis acquire new repeats by preferential uptake of<br />
bacteriophage DNA, and provide additional tools for evolutionary<br />
studies. Microbiology 151, 653–663<br />
8 Jansen, R., Embden, J.D., Gaastra, W. and Schouls, L.M. (2002)<br />
Identification of genes that are associated with DNA repeats <strong>in</strong><br />
prokaryotes. Mol. Microbiol. 43, 1565–1575<br />
9 Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P.,<br />
Mo<strong>in</strong>eau, S., Romero, D.A. and Horvath, P. (2007) <strong>CRISPR</strong> provides<br />
acquired resistance aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes. Science 315,<br />
1709–1712<br />
10 Prangishvili, D., Forterre, P. and Garrett, R.A. (2006) Viruses of <strong>the</strong><br />
<strong>Archaea</strong>: a unify<strong>in</strong>g view. Nat. Rev. Microbiol. 11, 837–848<br />
11 Lipps, G. (2006) Plasmids and viruses of <strong>the</strong> <strong>the</strong>rmoacidophilic<br />
crenarchaeote Sulfolobus. Extremophiles 10, 17–28<br />
12 Edgar, R.C. (2007) PILER-CR: fast and accurate identification of <strong>CRISPR</strong><br />
repeats. BMC Bio<strong>in</strong>formatics 8, 18–24<br />
13 Bland, C., Ramsey, T.L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N.C.<br />
and Hugenholtz, P. (2007) <strong>CRISPR</strong> Recognition Tool (CRT): a tool for<br />
automatic detection of clustered regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic<br />
repeats. BMC Bio<strong>in</strong>formatics 8, 209–217<br />
14 Saebø, P.E., Andersen, S.M., Myrseth, J., Laerdahl, J.K. and Rognes, T.<br />
(2005) PARALIGN: rapid and sensitive sequence similarity searches<br />
powered by parallel comput<strong>in</strong>g technology. Nucleic Acids Res. 33,<br />
535–539<br />
15 Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter, M., Phan, H.,<br />
Briegel, A., Rachel, R., Garrett, R.A. and Prangishvili, D. (2008) SRV, a<br />
new rudiviral isolate from Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal<br />
rudiviruses with <strong>the</strong> host viral-defence <strong>CRISPR</strong> <strong>system</strong>. J. Bacteriol. 190,<br />
6837–6845<br />
16 Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high<br />
accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797<br />
17 Bailey, T.L., Williams, N., Misleh, C. and Li, W.W. (2006) MEME:<br />
discover<strong>in</strong>g and analyz<strong>in</strong>g DNA and prote<strong>in</strong> sequence motifs.<br />
Nucleic Acids Res. 34, 369–373<br />
18 Thompson, J.D., Higg<strong>in</strong>s, D.G. and Gibson, T.J. (1994) CLUSTAL W:<br />
improv<strong>in</strong>g <strong>the</strong> sensitivity of progressive multiple sequence alignment<br />
through sequence weight<strong>in</strong>g, position-specific gap penalties and weight<br />
matrix choice. Nucleic Acids Res. 22, 4673–4680<br />
19 Godde, J.S. and Bickerton, A. (2006) The repetitive DNA elements called<br />
<strong>CRISPR</strong>s and <strong>the</strong>ir associated genes: evidence of horizontal transfer<br />
among prokaryotes. J. Mol. Evol. 62, 718–729<br />
20 Wang, Y., Duan, Z., Zhu, H., Guo, X., Wang, Z., Zhou, J., She, Q. and<br />
Huang, L. (2007) A novel Sulfolobus non-conjugative extrachromosomal<br />
genetic element capable of <strong>in</strong>tegration <strong>in</strong>to <strong>the</strong> host genome and<br />
spread<strong>in</strong>g <strong>in</strong> <strong>the</strong> presence of a fusellovirus. Virology 363,<br />
124–133<br />
21 Greve, B., Jensen, S., Brügger, K., Zillig, W. and Garrett, R.A. (2004)<br />
Genomic comparison of archaeal conjugative plasmids from Sulfolobus.<br />
<strong>Archaea</strong> 1, 231–239<br />
Received 6 August 2008<br />
doi:10.1042/BST0370023
Molecular Microbiology (2009) 72(1), 259–272 doi:10.1111/j.1365-2958.2009.06641.x<br />
First published onl<strong>in</strong>e 2 March 2009<br />
<strong>CRISPR</strong> families of <strong>the</strong> crenarchaeal genus Sulfolobus:<br />
bidirectional transcription and dynamic properties<br />
Reidun K. Lillestøl, Shiraz A. Shah, Kim Brügger, †<br />
Peter Redder, Hien Phan, Jan Christiansen and<br />
Roger A. Garrett*<br />
Centre for Comparative Genomics, Department of<br />
Biology, University of Copenhagen, Ole Maaløes Vej 5,<br />
2200 Copenhagen N, Denmark.<br />
Summary<br />
Clusters of regularly <strong>in</strong>terspaced short pal<strong>in</strong>dromic<br />
repeats (<strong>CRISPR</strong>s) of Sulfolobus fall <strong>in</strong>to three ma<strong>in</strong><br />
families based on <strong>the</strong>ir repeats, leader regions, associated<br />
cas genes and putative recognition sequences<br />
on viruses and plasmids. Spacer sequence matches<br />
to different viruses and plasmids of <strong>the</strong> Sulfolobales<br />
revealed some bias particularly for family III <strong>CRISPR</strong>s.<br />
Transcription occurs on both strands of <strong>the</strong> five<br />
repeat-clusters of Sulfolobus acidocaldarius and a<br />
repeat-cluster of <strong>the</strong> conjugative plasmid pKEF9.<br />
Leader strand transcripts cover whole repeat-clusters<br />
and are processed mainly from the 3′-end, within<br />
repeats, yield<strong>in</strong>g heterogeneous 40–45 nt spacer<br />
RNAs. Process<strong>in</strong>g of <strong>the</strong> pKEF9 leader transcript<br />
occurred partially <strong>in</strong> spacers, and was <strong>in</strong>complete,<br />
probably reflect<strong>in</strong>g defective repeat recognition by<br />
host enzymes. A similar level of transcripts was generated<br />
from complementary strands of each chromosomal<br />
repeat-cluster and <strong>the</strong>y were processed to<br />
yield discrete ~55 nt spacer RNAs. Analysis of <strong>the</strong><br />
partially identical repeat-clusters of Sulfolobus solfataricus<br />
stra<strong>in</strong>s P1 and P2 revealed that spacer-repeat<br />
units are added upstream only when a leader and<br />
certa<strong>in</strong> cas genes are l<strong>in</strong>ked. Downstream ends of <strong>the</strong><br />
repeat-clusters are conserved such that deletions and<br />
recomb<strong>in</strong>ation events occur <strong>in</strong>ternally.<br />
Accepted 14 February, 2009. *For correspondence. E-mail garrett@bio.ku.dk; Tel. (+45) 35322010; Fax (+45) 35322228. † Present<br />
address: Wellcome Trust Sanger Institute, Hinxton, UK.<br />
© 2009 The Authors<br />
Journal compilation © 2009 Blackwell Publishing Ltd<br />
Introduction<br />
Clusters of regularly interspaced short palindromic<br />
repeats (CRISPRs) consist of identical repeats separated<br />
by unique spacer sequences of constant length. They are<br />
present in the sequenced chromosomes of almost all<br />
archaea and about 40% of bacteria, as well as <strong>in</strong> some<br />
plasmids (Lillestøl et al., 2006; Grissa et al., 2007; Sorek<br />
et al., 2008). The orig<strong>in</strong>al observation that some spacers<br />
show close sequence matches to viral genomes and plasmids<br />
(Mojica et al., 2005) led to <strong>the</strong> hypo<strong>the</strong>sis that<br />
spacer regions are <strong>in</strong>corporated <strong>in</strong>to <strong>the</strong> chromosome<br />
from <strong>the</strong> extra-chromosomal element and have a regulatory<br />
or <strong>in</strong>hibitory effect on <strong>the</strong>ir propagation (Bolot<strong>in</strong> et al.,<br />
2005; Mojica et al., 2005; Pourcel et al., 2005; Lillestøl<br />
et al., 2006). Recently, this hypo<strong>the</strong>sis was re<strong>in</strong>forced<br />
experimentally for bacteria by show<strong>in</strong>g that new spacers<br />
deriv<strong>in</strong>g from phage genomes <strong>in</strong>tegrate <strong>in</strong>to <strong>CRISPR</strong>s of<br />
Streptococcus <strong>the</strong>rmophilus <strong>in</strong> response to phage <strong>in</strong>fection,<br />
which <strong>in</strong> turn leads to phage resistance (Barrangou<br />
et al., 2007; Deveau et al., 2008; Horvath et al., 2008a). In<br />
both archaea and bacteria, new spacer-repeat units are<br />
added at <strong>the</strong> end of <strong>the</strong> repeat-clusters adjo<strong>in</strong><strong>in</strong>g a low<br />
complexity leader sequence (Jansen et al., 2002; Tang<br />
et al., 2002; Pourcel et al., 2005; Lillestøl et al., 2006;<br />
Barrangou et al., 2007), presumably facilitated by Cas<br />
prote<strong>in</strong>s which are generally encoded adjacent to <strong>the</strong><br />
clusters (Jansen et al., 2002; Haft et al., 2005; Makarova<br />
et al., 2006).<br />
Despite <strong>the</strong> akaryotic nature of <strong>the</strong> <strong>CRISPR</strong> <strong>system</strong><br />
(Forterre, 1992), <strong>the</strong>re are significant differences between<br />
<strong>the</strong> archaeal and bacterial <strong>system</strong>s studied so far. First,<br />
archaeal repeat-clusters tend to be very extensive and<br />
can constitute more than 1% of <strong>the</strong> chromosome (Lillestøl<br />
et al., 2006). Second, they often exhibit a low level of, or<br />
no, dyad symmetry <strong>in</strong> <strong>the</strong>ir repeat sequences (Lillestøl<br />
et al., 2006; Kun<strong>in</strong> et al., 2007). Third, some of <strong>the</strong> cas<br />
genes implicated <strong>in</strong> RNA process<strong>in</strong>g and spacer<br />
sequence <strong>in</strong>sertion are highly divergent between archaea<br />
and bacteria (Haft et al., 2005). Fourth, <strong>the</strong> many archaeal<br />
spacer sequences which match plasmids or viruses show<br />
no clear bias to viruses (Shah et al., 2009).<br />
A mechanism for <strong>the</strong> putative regulatory or <strong>in</strong>hibitory<br />
effect <strong>in</strong> both euryarchaea and crenarchaea was suggested,<br />
at an early stage, by <strong>the</strong> f<strong>in</strong>d<strong>in</strong>g that RNA transcripts<br />
are produced, and processed, from one strand of<br />
archaeal repeat-clusters (Tang et al., 2002; 2005), with<br />
<strong>the</strong> smallest product correspond<strong>in</strong>g approximately <strong>in</strong><br />
both size and sequence to a s<strong>in</strong>gle spacer transcript<br />
(Lillestøl et al., 2006). Fur<strong>the</strong>rmore, it was demonstrated
experimentally for a bacterium that a complex of Cas<br />
prote<strong>in</strong>s was responsible for process<strong>in</strong>g <strong>in</strong> <strong>the</strong> repeats to<br />
generate <strong>the</strong> small RNAs encompass<strong>in</strong>g <strong>the</strong> spacer<br />
regions (Brouns et al., 2008) and for <strong>the</strong> euryarchaeon<br />
Pyrococcus furiosus, it was shown that <strong>the</strong> Cas6 prote<strong>in</strong><br />
b<strong>in</strong>ds to <strong>the</strong> 5′-end of <strong>the</strong> repeat transcript and cuts, by<br />
a putative ruler mechanism, with<strong>in</strong> <strong>the</strong> 3′-end (Carte<br />
et al., 2008). In <strong>the</strong> crenarchaeon Sulfolobus acidocaldarius,<br />
evidence was also presented for transcription<br />
occurr<strong>in</strong>g from <strong>the</strong> complementary strand of <strong>the</strong> DNA<br />
spacer (Lillestøl et al., 2006). These results opened up<br />
<strong>the</strong> possibility of an antisense RNA or RNAi-like mechanism<br />
act<strong>in</strong>g ei<strong>the</strong>r on <strong>the</strong> viral/plasmid transcripts or<br />
directly on <strong>the</strong>ir DNA (Lillestøl et al., 2006; Makarova<br />
et al., 2006). Recent studies on P. furiosus have<br />
shown that <strong>the</strong> leader strand spacer RNAs can generate<br />
dist<strong>in</strong>ct RNA–prote<strong>in</strong> complexes (Hale et al., 2008).<br />
Moreover, bio<strong>in</strong>formatical studies on crenarchaeal<br />
<strong>CRISPR</strong>s (Shah et al., 2009), as well as experimental<br />
studies on bacteria (Brouns et al., 2008; Marraff<strong>in</strong>i and<br />
Sontheimer, 2008), support the view that spacer RNAs directly<br />
target DNA of extra-chromosomal elements, ra<strong>the</strong>r than<br />
<strong>the</strong>ir mRNAs.<br />
Here, we characterize <strong>CRISPR</strong> families of <strong>the</strong> model<br />
crenarchaeal genus Sulfolobus, and related members of<br />
<strong>the</strong> Sulfolobales, for which several genomes and numerous<br />
viruses and plasmids have been sequenced (Prangishvili<br />
et al., 2006; Brügger, 2007). The families are<br />
classified on <strong>the</strong> basis of <strong>the</strong>ir repeat sequences, leader<br />
region motifs, associated cas genes, and conserved<br />
d<strong>in</strong>ucleotide motifs adjo<strong>in</strong><strong>in</strong>g spacer match<strong>in</strong>g sequences<br />
on viruses and plasmids. Properties of transcripts from<br />
each strand of repeat-clusters <strong>in</strong> S. acidocaldarius chromosomes<br />
and <strong>the</strong> conjugative plasmid pKEF9 are exam<strong>in</strong>ed,<br />
as well as <strong>the</strong> possible formation of double-stranded<br />
spacer RNAs. Moreover, sequenc<strong>in</strong>g and bio<strong>in</strong>formatical<br />
analyses of <strong>the</strong> six repeat-clusters of Sulfolobus solfataricus<br />
stra<strong>in</strong>s P1 and P2 were performed and conclusions<br />
are drawn concern<strong>in</strong>g <strong>the</strong> dynamics of repeat-cluster<br />
development and functions of <strong>the</strong> different <strong>CRISPR</strong><br />
families.<br />
Results<br />
<strong>CRISPR</strong> families <strong>in</strong> Sulfolobales<br />
The repeat-clusters of <strong>the</strong> Sulfolobales are quite diverse<br />
structurally and we attempted to classify <strong>the</strong>m <strong>in</strong>to families<br />
on <strong>the</strong> basis of <strong>the</strong>ir repeat sequences, leader properties,<br />
associated cas genes and conserved sequences<br />
adjo<strong>in</strong><strong>in</strong>g spacer sequence matches on viruses/plasmids.<br />
A total of 48 complete and eight incomplete repeat-clusters<br />
were identified for the Sulfolobales, of which at<br />
least 51 carried putative leader sequences, and <strong>the</strong>y<br />
yielded 3685 spacer sequences. Phylogenetic tree build<strong>in</strong>g<br />
based on repeat sequences revealed three ma<strong>in</strong> families<br />
and some m<strong>in</strong>or ones where family I dom<strong>in</strong>ates<br />
(Fig. 1A). Each species typically carries representatives<br />
of two repeat families (Fig. 1A). Analyses of all cas genes<br />
associated with <strong>the</strong> repeat-clusters of <strong>the</strong> Sulfolobales<br />
re<strong>in</strong>forced <strong>the</strong> family divisions. A phylogenetic tree built<br />
from alignments of <strong>the</strong> most conserved cas1 gene, encod<strong>in</strong>g<br />
a predicted <strong>in</strong>tegrase or nuclease (Makarova et al.,<br />
2006), yielded essentially <strong>the</strong> same family tree as <strong>in</strong><br />
Fig. 1A (data not shown). Moreover, <strong>in</strong> an all-aga<strong>in</strong>st-all<br />
comparison of cas genes adjo<strong>in</strong><strong>in</strong>g repeat-clusters, each<br />
gene generally yielded best matches to o<strong>the</strong>r genes of <strong>the</strong><br />
same family, despite family I genes being over-represented<br />
(data not shown).<br />
For <strong>the</strong> leader regions, alignment of 300 bp of each<br />
sequence revealed a large fairly conserved downstream<br />
region which carries multiple dist<strong>in</strong>ct sequence motifs,<br />
most of which are specific for a given <strong>CRISPR</strong> family<br />
(Fig. 1B). Moreover, <strong>the</strong>se classes of motifs show significant<br />
levels of sequence conservation despite some of<br />
<strong>the</strong>m exhibit<strong>in</strong>g low sequence complexity. Of <strong>the</strong>se, motif A<br />
carries 70% aden<strong>in</strong>es, motif B exhibits 95% pur<strong>in</strong>es, motifs<br />
C, F, G and J conta<strong>in</strong> 50–60% thym<strong>in</strong>es, while motifs D, H<br />
and I are more complex. For all families some motifs are<br />
repeated (Fig. 1B). We <strong>in</strong>fer that <strong>the</strong>se motifs are likely to<br />
provide, directly or <strong>in</strong>directly, assembly sites for Cas prote<strong>in</strong>s<br />
<strong>in</strong>volved <strong>in</strong> process<strong>in</strong>g RNA and/or <strong>in</strong> extend<strong>in</strong>g <strong>the</strong><br />
repeat-clusters. Lastly, exam<strong>in</strong>ation of spacer sequence<br />
matches on viruses/plasmids of <strong>the</strong> Sulfolobales revealed<br />
conserved upstream dinucleotide motifs: CC for family I,<br />
TC for family II and GT for family III which may direct DNA<br />
incorporation into CRISPRs (Fig. 1C). These may constitute<br />
an archaeal parallel to the AGAAA and GGNG motifs<br />
located downstream from bacterial proto-spacers of<br />
S. thermophilus (Horvath et al., 2008a).<br />
Fig. 1. Family classification of Sulfolobales CRISPRs.<br />
A. Phylogenetic tree based on a multiple alignment of repeat sequences showing three main families I, II and III. CRISPRs are labelled by a<br />
four-letter prefix denoting the species, and the number of repeats.<br />
B. Motif maps for leader regions of the three main CRISPR families. The motifs constitute conserved sequences, 30–100 bp in length,<br />
showing on average 80% sequence identity. Sequence motifs A, B and C occur in more than one family [motif C occurs in some unclassified<br />
leaders (Fig. 1A)], whereas the other motifs are family specific. Thus motifs D, E, F and G occur only in family I leaders, motifs H and I are<br />
present only in family II leaders and motif J is exclusive to family III leaders. Leaders of each family show some variation in the number and<br />
order of the motifs present. Motif A overlaps with the transcriptional leader region.<br />
C. Logo-plot (http://weblogo.berkeley.edu/) of the motif located immediately upstream of the spacer match on viral/plasmid genomes where CC<br />
predominates in 129 matches for family I, TC in 23 matches for family II, and GT in 19 matches for family III CRISPRs, where one bit<br />
corresponds to about 75% presence and two bits correspond to 100%. The logo plots are based exclusively on spacer matches which show a<br />
maximum of five nucleotide mismatches.<br />
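The tally behind the upstream dinucleotide motifs can be illustrated with a minimal sketch. It assumes that spacer matches are found by a simple sliding-window comparison allowing a fixed number of mismatches (the Fig. 1C logo plots use matches with at most five mismatches) and that only the given strand of the target is scanned. The function names and the toy sequences are hypothetical and are not taken from the paper, which does not describe its search procedure at this level of detail.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def upstream_dinucleotides(spacers, target, max_mismatches=5):
    """Tally the 2 nt immediately 5' of every spacer match on `target`.

    Simplification: only the given strand is scanned; a fuller analysis
    would also scan the reverse complement of the target.
    """
    counts = Counter()
    for spacer in spacers:
        n = len(spacer)
        # Start at 2 so that two upstream nucleotides always exist.
        for i in range(2, len(target) - n + 1):
            if hamming(spacer, target[i:i + n]) <= max_mismatches:
                counts[target[i - 2:i]] += 1
    return counts


# Hypothetical toy data: a 20 nt spacer and a target carrying one
# imperfect copy of it (a single mismatch), flanked by arbitrary sequence.
toy_spacer = "ACGTTGCAAGCTTAGGCATC"
toy_target = "GGATCC" + "ACGTTGCAAGATTAGGCATC" + "TTGACA"
print(upstream_dinucleotides([toy_spacer], toy_target, max_mismatches=1))
```

Running the same tally separately for the spacers of each CRISPR family would give the per-family counts from which a logo plot could be drawn.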
Genome contexts of <strong>the</strong> repeat-clusters<br />
The S. acidocaldarius chromosome carries five repeat-clusters<br />
with 133, 78, 11, 5 and 2 repeats which fall into<br />
CRISPR families II and III (Fig. 1A). Saci-133 and Saci-78<br />
(family III) are physically linked, with shared cas genes.<br />
They exhibit 95% identical leader sequences adjoining<br />
the first repeat and carry identical, non-palindromic<br />
repeats (Fig. 2A and B). Saci-11 and Saci-2 (family II) are<br />
physically linked by cas genes (Fig. 2A) and carry identical<br />
leader sequences while the more divergent Saci-5<br />
(family II) exhibits a repeat with two base pair changes<br />
and a leader sequence showing 75% sequence identity<br />
(Chen et al., 2005; Lillestøl et al., 2006). All of the family II<br />
repeats carry a 5 bp inverted repeat (Fig. 2B). Repeat-clusters<br />
Saci-5 and Saci-2 each carry a degenerate<br />
repeat, distal to the leader region. The repeat-cluster<br />
(pKEF-7) of conjugative plasmid pKEF9 carries no leader<br />
sequence and no associated cas genes (Fig. 2A) (Greve<br />
et al., 2004).<br />
Fig. 2. A. Diagram showing the genomic context of the S. acidocaldarius repeat-clusters, and of the pKEF9 cluster. Saci-133 and Saci-78<br />
are physically linked on the chromosome, as are Saci-11 and Saci-2. Saci-5 and the plasmid cluster pKEF-7 are separate. L denotes the<br />
leader region. Identities of genes bordering the clusters or their GenBank/EMBL assignments are given and their directions of transcription are<br />
indicated.<br />
B. Repeat sequences where inverted repeat sequences are underlined, and experimentally identified processing sites are marked with ‘ ’s.<br />
Repeat cluster: Repeat sequence<br />
Saci-133/78: GTAATAACGACAAGAAACTAAAAC<br />
Saci-11/2: GATGAATCCCAAAAGGGATTGAAAG<br />
Saci-5: as Saci-11/2, with two base changes (A and T)<br />
pKEF-7: GTTGCAATTCCCTAAATGTGCGGG<br />
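The contrast between the family II repeats, which carry an inverted repeat, and the non-palindromic family III repeats can be checked directly on the repeat sequences given in Fig. 2B. The following is a minimal sketch, assuming an inverted repeat is defined as a k-mer whose exact reverse complement occurs further downstream without overlap. The helper names are hypothetical, and the underlining in the original figure may follow a different convention.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeats(seq: str, k: int = 5):
    """Return (i, j, kmer) for each k-mer at i whose reverse complement
    starts at a downstream, non-overlapping position j >= i + k."""
    hits = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        rc = reverse_complement(kmer)
        j = seq.find(rc, i + k)
        while j != -1:
            hits.append((i, j, kmer))
            j = seq.find(rc, j + 1)
    return hits


family_ii = "GATGAATCCCAAAAGGGATTGAAAG"   # Saci-11/2 repeat (Fig. 2B)
family_iii = "GTAATAACGACAAGAAACTAAAAC"  # Saci-133/78 repeat (Fig. 2B)

print(inverted_repeats(family_ii))   # family II: 5 bp arms are reported
print(inverted_repeats(family_iii))  # family III: no 5 bp arms are found
```

Positions are zero-based offsets within the repeat. Exact matching is a deliberate simplification; allowing G·U-type pairing in the transcript would require a different comparison.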
Repeat-clusters generate s<strong>in</strong>gle transcripts cover<strong>in</strong>g <strong>the</strong><br />
whole cluster<br />
In order to <strong>in</strong>vestigate transcripts formed dur<strong>in</strong>g <strong>the</strong><br />
growth cycle, RNA was extracted from S. acidocaldarius,<br />
and from S. solfataricus P2 conjugated with pKEF9, harvested<br />
at different stages of exponential growth and,<br />
for <strong>the</strong> former, stationary phase. Oligonucleotide probes<br />
complementary to spacers of <strong>the</strong> repeat-clusters were<br />
tested <strong>in</strong> Nor<strong>the</strong>rn blot analyses. Initially, Saci-78 transcripts<br />
were probed for spacer 4, adjacent to <strong>the</strong> leader<br />
region, and <strong>the</strong> results demonstrate that process<strong>in</strong>g<br />
<strong>in</strong>creased progressively as stationary phase was<br />
approached (Fig. 3). The maximum transcript size, about<br />
5000 nt, exceeds <strong>the</strong> size of <strong>the</strong> 4624 bp repeat-cluster,<br />
<strong>in</strong>dicat<strong>in</strong>g that <strong>the</strong> whole cluster was transcribed (Fig. 3).<br />
However, <strong>the</strong> majority of detected transcripts fall <strong>in</strong> <strong>the</strong><br />
size range 3000–3500 nt suggest<strong>in</strong>g that endogenous<br />
degradation, process<strong>in</strong>g or premature term<strong>in</strong>ation had<br />
occurred towards <strong>the</strong> 3′-end. Evidence for <strong>the</strong> formation of<br />
whole transcripts was also obta<strong>in</strong>ed for each of <strong>the</strong> small<br />
repeat-clusters Saci-5 (Lillestøl et al., 2006), Saci-11 and<br />
Saci-2 (data not shown), and pKEF-7 (see below).<br />
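The size argument above can be made concrete with a rough calculation. This is only a sketch under stated assumptions: the repeat length is taken from the Saci-133/78 repeat sequence in Fig. 2B, the 4624 bp figure is assumed to span the first to the last repeat, spacers are assumed to lie only between repeats, and the short leader-derived portion of the transcript is ignored. The variable and function names are hypothetical.

```python
REPEAT_LEN = len("GTAATAACGACAAGAAACTAAAAC")  # Saci-133/78 repeat, Fig. 2B
CLUSTER_BP = 4624           # stated size of the Saci-78 repeat-cluster
N_REPEATS = 78              # repeats in Saci-78
N_SPACERS = N_REPEATS - 1   # assumption: spacers lie only between repeats

# Mean spacer length implied by the cluster size (an estimate, not a
# value reported in the text).
mean_spacer = (CLUSTER_BP - N_REPEATS * REPEAT_LEN) / N_SPACERS
unit = REPEAT_LEN + mean_spacer  # one repeat-spacer unit


def units_spanned(transcript_nt: float) -> float:
    """Approximate number of repeat-spacer units covered by a transcript."""
    return transcript_nt / unit


print("implied mean spacer length:", round(mean_spacer, 1))
for size in (3000, 3500, 5000):
    print(size, "nt spans about", round(units_spanned(size), 1), "units")
```

Under these assumptions a transcript of about 5000 nt is longer than the whole cluster, whereas transcripts of 3000–3500 nt would end well before the last repeat, which is consistent with degradation, processing or premature termination towards the 3′-end.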
Fig. 3. Northern blot of Saci-78 transcripts using an<br />
oligonucleotide probe against spacer 4. Ten micrograms of RNA were<br />
isolated from S. acidocaldarius cells harvested at: (1) early log<br />
phase, (2) late log phase, (3) early stationary phase and (4) late<br />
stationary phase. RNA size markers (0.5–9 kb) were<br />
co-electrophoresed and excised from the gel prior to RNA blotting.<br />
Transcription from the leader strand<br />
In order to test whe<strong>the</strong>r transcription <strong>in</strong>itiated at s<strong>in</strong>gle or<br />
multiple sites, we determ<strong>in</strong>ed start sites at <strong>the</strong> leader of<br />
Saci-133 and Saci-5 by identify<strong>in</strong>g RNA fragments carry<strong>in</strong>g<br />
5′-terminal triphosphates using tobacco acid phosphatase<br />
in 5′-RLM RACE procedures. The results<br />
demonstrate that start sites occur immediately upstream<br />
from <strong>the</strong> first repeat sequence for both repeat-clusters<br />
(Fig. 4A and B) and <strong>the</strong> start sites are preceded upstream<br />
by archaeal BRE/TATA motifs (Torar<strong>in</strong>sson et al., 2005).<br />
For Saci-133, transcription <strong>in</strong>itiated at <strong>the</strong> sequence<br />
GATGG, 17 nt upstream from <strong>the</strong> first repeat (Fig. 4A;<br />
Table 1). An identical sequence/motif pattern occurs for<br />
Saci-78. A different pattern was found for <strong>the</strong> family II<br />
clusters Saci-11, Saci-5 and Saci-2 where transcription<br />
<strong>in</strong>itiates at <strong>the</strong> sequence AAGGG, 21 nt upstream from<br />
<strong>the</strong> first repeat and is also preceded by archaeal promoter<br />
motifs (Fig. 4B; Table 1).<br />
We probed for transcripts <strong>in</strong>itiat<strong>in</strong>g at <strong>the</strong> leader of Saci-<br />
133 us<strong>in</strong>g oligonucleotides complementary to spacers 5,<br />
6, 59 and 131. Strong signals were obta<strong>in</strong>ed for each<br />
spacer (Fig. 5A) consistent with <strong>the</strong> whole cluster be<strong>in</strong>g<br />
transcribed <strong>in</strong> fairly high yield, as was demonstrated for<br />
Saci-78 (Fig. 3). The low level of larger transcripts<br />
detected with probes aga<strong>in</strong>st spacers 59 and 131 sug-<br />
A Saci-133<br />
M - +<br />
©2009TheAuthors<br />
Journal compilation © 2009 Blackwell Publish<strong>in</strong>g Ltd, Molecular Microbiology, 72, 259–272<br />
0<br />
500<br />
400<br />
300<br />
200<br />
100<br />
Start<br />
4(9)<br />
5(3)<br />
5(23)<br />
6(3)<br />
6(15)<br />
B Saci-5<br />
500<br />
400<br />
300<br />
200<br />
100<br />
M - +<br />
<strong>Archaea</strong>l <strong>CRISPR</strong>s of Sulfolobus 263<br />
Start<br />
1(8)<br />
1(17)<br />
C Saci-133<br />
0<br />
500<br />
400<br />
300<br />
200<br />
100<br />
M<br />
133(10)<br />
132(11)<br />
131(22)<br />
Fig. 4. Determination of the transcriptional start sites and<br />
processing sites of RNA products generated from Saci-133 and<br />
Saci-5 using the 5′-RLM RACE and 3′-RLM RACE procedures.<br />
A. Determination of 5′-ends of transcripts from Saci-133 where<br />
RNA was treated with (+) and without (-) tobacco acid pyrophosphatase<br />
to remove 5′-phosphates from the 5′-end of the initial transcript,<br />
and an oligonucleotide primer specific for spacer 7 was employed.<br />
Bands exclusive to the + lane retain the transcriptional start site<br />
whereas bands present in both + and - lanes represent transcripts<br />
which have been processed at the 5′-end.<br />
B. Determination of 5′-ends of transcripts from Saci-5 using an<br />
oligonucleotide primer specific for spacer 1. The band of about<br />
160 bp in the + lane is an artefact: sequencing<br />
revealed that two adapters had ligated to each other and to the<br />
start site.<br />
C. 3′-RLM RACE experiment performed on transcripts from<br />
Saci-133 using a primer specific for spacer 130. The three bands<br />
represent transcripts which have been processed within the<br />
terminal repeats 131, 132 and 133. In each experiment processing<br />
sites are indicated by the number of the repeat (from the leader), and<br />
the position of the 5′-nucleotide within the repeat is given in<br />
brackets.
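The reading rule stated for panel A (bands found only in the + lane mark the transcriptional start site; bands found in both lanes are 5′-processed products) amounts to a set comparison. The following is a minimal sketch of that rule only; the function name and the band sizes are hypothetical and are not read from the gel.

```python
def classify_bands(plus_lane, minus_lane):
    """Classify 5'-RLM RACE bands by their presence in the + and - lanes.

    plus_lane / minus_lane: iterables of band identifiers
    (here, hypothetical approximate sizes in bp).
    """
    plus, minus = set(plus_lane), set(minus_lane)
    return {
        # Only seen after pyrophosphatase treatment: primary transcript 5'-end.
        "start_site": sorted(plus - minus),
        # Seen with and without treatment: 5'-end produced by processing.
        "processed": sorted(plus & minus),
        # Not expected under the rule; reported so that it is not silently lost.
        "minus_only": sorted(minus - plus),
    }


# Hypothetical band sizes, for illustration only.
example = classify_bands(plus_lane=[120, 200, 310], minus_lane=[120, 200])
print(example)
```

Any band falling in the `minus_only` group would need separate inspection, as was done by sequencing for the adapter artefact described for panel B.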
264 R. K. Lillestøl et al. <br />
gests that transcript processing occurs primarily from the<br />
3′-end (Fig. 5A). Probing of the repeat also revealed a<br />
similar series of bands except for the smallest RNAs<br />
(Fig. 5A).<br />
Table 1. Overview of promoters, transcriptional start sites and processing sites in repeat-clusters Saci-133, Saci-5 and pKEF-7 identified by the<br />
5′-RLM RACE method.<br />
<table>
<tr><th>Cluster</th><th>BRE-TATA</th><th>Start</th><th>Distance from first repeat</th><th>Processing sites</th></tr>
<tr><td>Saci-133</td><td>GAAAATATTTATAAA</td><td>GATGG</td><td>+17 nt</td><td>4 (9), 5 (3), 5 (23), 6 (3), 6 (18)</td></tr>
<tr><td>Saci-5</td><td>GCAAAAGTTTATTAA</td><td>AAGGG</td><td>+21 nt</td><td>1 (8), 1 (17)</td></tr>
<tr><td>pKEF-7</td><td>GAAAAAGTTTATTA</td><td>AATCT</td><td>+32 nt</td><td>+23, 1 (24)</td></tr>
</table>
Putative BRE and TATA motif sequences are located approximately 25 bp upstream from transcription start sites and the processing sites within<br />
the repeats (numbered from the leader region) give the position of the 5′-nucleotide in brackets.<br />
[Fig. 5 image: Northern blot panels A and B. Lanes: M1, repeat, 133(5), 133(6), 133(59) (labelled 133(60) in panel B), 133(131), M2. Marker sizes shown: 30–150 nt (M1) and 100–500 nt (M2). Smallest products are annotated at 40–50 nt and 50–60 nt.]<br />
Fig. 5. Northern blot analyses of Saci-133 transcripts. The repeat<br />
sequence and spacers at positions 5 (37 nt), 6 (42 nt), 59 (41 nt)<br />
and 131 (36 nt) from the leader were probed with oligonucleotides<br />
to detect: (A) transcripts initiating within the leader sequence, and<br />
(B) transcripts generated from the complementary strand. Twenty<br />
micrograms of RNA was isolated from cells grown to stationary phase.<br />
RNA size markers of 10–150 nt (M1) and 100–2000 nt (M2) are<br />
aligned approximately with the transcript lanes.<br />
The larger products observed with both spacer and<br />
repeat probes correspond in size to transcripts of multiple<br />
repeat-spacer units (Fig. 5A) whereas the smallest products<br />
seen when using spacer probes range in size from a<br />
single repeat-spacer unit (62–68 nt) to the spacer (40 nt),<br />
suggesting that progressive exoribonuclease trimming<br />
occurs within repeats flanking the spacer, consistent with<br />
the inability to detect the smallest RNAs with the repeat<br />
probe (Fig. 5A) and the earlier observation for Saci-5<br />
(Lillestøl et al., 2006). Saci-11 and Saci-2 were also<br />
probed with spacer-specific oligonucleotides, and Northern<br />
blots yielded closely comparable patterns (data not<br />
shown).<br />
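The size ladder described above can be illustrated with simple arithmetic. The sketch below is not part of the original analysis. It assumes a repeat length of 25 nt, a value not stated in this section and chosen only because 25 + 37 nt reproduces the lower bound of the 62–68 nt range quoted for a single repeat-spacer unit. The spacer lengths are those given for positions 5 and 6 in the Fig. 5 legend.

```python
ASSUMED_REPEAT_NT = 25  # assumption for illustration; not stated in this section


def ladder_sizes(spacer_lengths, repeat_nt=ASSUMED_REPEAT_NT):
    """Return cumulative sizes of transcripts spanning 1, 2, ... repeat-spacer units."""
    sizes, total = [], 0
    for spacer_nt in spacer_lengths:
        total += repeat_nt + spacer_nt  # one more repeat-spacer unit
        sizes.append(total)
    return sizes


# Spacers 5 (37 nt) and 6 (42 nt) of Saci-133, from the Fig. 5 legend.
sizes = ladder_sizes([37, 42])
print(sizes)  # [62, 129]
```

Trimming within the flanking repeats would then shorten a single unit from the full 62 nt towards the bare spacer length, as the text proposes for the smallest products.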
5′-RLM RACE analyses of Saci-133 and Saci-5 also<br />
revealed processing sites (Fig. 4A and B). Multiple processing<br />
sites were identified throughout the repeats for<br />
Saci-133 but were confined to the inverted repeat for Saci-5 at<br />
positions 8 and 17 (Table 1; Fig. 2B).<br />
3′-Termini of Saci-133 transcripts were also determined<br />
by the 3′-RLM RACE method employing a primer against<br />
spacer 130. Three main bands were produced which,<br />
on sequencing, revealed processing sites distributed<br />
throughout terminal repeats 131, 132 and 133, at positions<br />
22, 11 and 10 respectively (Fig. 4C). The absence of further<br />
downstream bands suggested that the transcript terminus<br />
had been efficiently excised.<br />
In order to confirm that processing occurred exclusively<br />
within repeat sequences, Saci-133 transcripts on<br />
one membrane were probed, successively, by spacer<br />
5-specific, and then repeat-specific, probes. Both probes<br />
yielded similar patterns for the larger transcripts but the<br />
smallest transcripts were only detected with spacer-specific<br />
probes (Fig. 5A). Thus, the final processing step<br />
occurs in the repeat, leaving the spacer intact.<br />
Complementary strand is transcribed<br />
In a preliminary experiment, we demonstrated that Saci-5<br />
transcripts are produced from both DNA strands (Lillestøl<br />
et al., 2006). As this raised the possibility that dsRNA<br />
intermediates could be formed, we studied these effects<br />
more systematically for Saci-133, Saci-78, Saci-11, Saci-5<br />
©2009TheAuthors<br />
Journal compilation © 2009 Blackwell Publish<strong>in</strong>g Ltd, Molecular Microbiology, 72, 259–272
and Saci-2. Transcripts from the complementary DNA<br />
strand of Saci-133 were probed for spacers 5, 6, 60 and<br />
131 (numbered from the leader). Each showed strong<br />
signals (Fig. 5B) but they differed from those of leader<br />
strand transcripts in that products were less regular in size<br />
and larger transcripts prevailed. Nevertheless, the<br />
smallest product for each spacer probe was a discrete<br />
band of about 55 nt (Fig. 5B). Similarly sized RNAs<br />
were observed when probing each of the other four<br />
S. acidocaldarius repeat-clusters (data not shown), consistent<br />
with the earlier observation for Saci-5 (Lillestøl<br />
et al., 2006). These small RNAs must contain all or most<br />
of the spacer sequence because the strong band<br />
observed with each spacer probe was not detected with a<br />
repeat probe (Fig. 5B).<br />
Northern analyses of each of the chromosomal repeat-clusters<br />
yielded strong signals for all tested spacer<br />
probes, indicating that transcription from the complementary<br />
strand occurred throughout each cluster, as is illustrated<br />
for Saci-133 (Fig. 5B). This result was reinforced by<br />
a Northern blot analysis in which the complementary<br />
strand transcripts from the Saci-5 cluster were probed for<br />
spacer 1, adjacent to the leader region, and the largest<br />
transcript (430 nt) exceeded the minimal size of the repeat-cluster<br />
(300 bp) (Fig. 6). Moreover, each of the five clusters<br />
carries at least one putative promoter BRE/TATA<br />
motif within 50 bp of the terminal repeat of the repeat-cluster.<br />
In addition, there are no open reading frames<br />
(ORFs) within at least 3 kb of the putative promoters, on<br />
[Fig. 6 image: Northern blot with lanes M and Saci-5; size markers at 100–500 nt; band labels 430, 190 and 52.]<br />
Fig. 6. Northern blot analysis of transcripts from the<br />
complementary strand of Saci-5, probing for spacer 1, adjacent to<br />
the leader region. RNA size markers of 100–2000 nt (M) are<br />
aligned approximately with the transcript lanes. The size of the<br />
smallest transcript was estimated using an independent,<br />
co-electrophoresed size marker as shown in Fig. 5.<br />
the complementary DNA strand, for any chromosomal<br />
repeat-clusters, except Saci-2 (Fig. 2B).<br />
A comparison of transcript yields from the leader and<br />
complementary strands indicated qualitatively similar<br />
expression levels from both strands of Saci-133 (Fig. 5A<br />
and B). This is difficult to quantify accurately because of<br />
the complexity and diversity of the RNA fragment patterns<br />
(Fig. 5A and B) but qualitatively similar transcription levels<br />
were observed for all five repeat-clusters.<br />
The possibility that functional dsRNAs were generated<br />
between spacer transcripts from each DNA strand was<br />
tested by a ribonuclease digestion approach using the<br />
ssRNA-specific enzymes RNase T1 and RNase U2, which<br />
cleave preferentially 3′ to G and A residues respectively,<br />
but do not cleave regular dsRNA (Christiansen et al.,<br />
1990). Total RNA from S. acidocaldarius was treated with<br />
increasing concentrations of each ribonuclease and<br />
Northern blots were obtained by probing for spacer 6 of<br />
Saci-133 transcripts from each strand. The results<br />
revealed progressive cleavage of both the leader and<br />
complementary strand transcripts at increasing ribonuclease<br />
concentrations but no resistant dsRNA band of<br />
about 40 bp was detected (data not shown). This may<br />
reflect the formation of specific protein–ssRNA complexes, as<br />
was shown for the leading strand spacer RNA of<br />
P. furiosus (Hale et al., 2008).<br />
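The cleavage specificities used in this test (RNase T1 cutting 3′ to G, RNase U2 cutting 3′ to A, both on single-stranded RNA only) can be illustrated with an idealized complete digest. This is a toy sketch: the sequence is hypothetical, the function name is an illustrative choice, and real digestion is preferential and concentration-dependent rather than all-or-nothing.

```python
def cleave_3prime(rna, residue):
    """Split a single-stranded RNA string after every occurrence of `residue`,
    i.e. idealized complete cleavage 3' to that residue."""
    fragments, current = [], ""
    for base in rna:
        current += base
        if base == residue:
            fragments.append(current)  # cut immediately 3' to the residue
            current = ""
    if current:
        fragments.append(current)  # trailing fragment with no cut site
    return fragments


toy_rna = "CUGACCUAGUC"  # hypothetical sequence, for illustration only
t1_fragments = cleave_3prime(toy_rna, "G")  # RNase T1-like specificity
u2_fragments = cleave_3prime(toy_rna, "A")  # RNase U2-like specificity
print(t1_fragments)  # ['CUG', 'ACCUAG', 'UC']
print(u2_fragments)  # ['CUGA', 'CCUA', 'GUC']
```

A perfectly base-paired spacer duplex would be left uncut by both enzymes, which is why a resistant band of about 40 bp was the expected signature of dsRNA.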
pKEF-7 transcripts are processed in both repeats<br />
and spacers<br />
Despite its lack of associated cas genes and leader<br />
region, we considered the pKEF9 repeat-cluster to be a<br />
CRISPR system because three of the six spacers match<br />
Sulfolobus viruses: spacer 3 matches rudiviruses and<br />
spacers 5 and 6 match fuselloviruses. This is consistent with<br />
the conjugative plasmid regulating the viruses intracellularly.<br />
Therefore, RNA was isolated from S. solfataricus<br />
P2 14 h after conjugating with pKEF9, before plasmid<br />
levels rapidly decline. For the predicted leader strand<br />
(Fig. 2A), 5′-ends were determined by 5′-RLM RACE<br />
analyses using a primer specific for spacer 1. The<br />
results revealed a single transcript start site, 32 nt<br />
upstream from the first repeat, preceded by promoter<br />
motifs (Table 1). Processing sites were also identified<br />
23 nt upstream from the first repeat, and at the junction<br />
of the first repeat and spacer (Fig. 7A; Table 1). Northern<br />
blotting experiments were then performed probing each<br />
half of each spacer sequence, as well as the repeat<br />
(Fig. 7B). The results for each probe revealed a largest<br />
product of about 465 nt, corresponding in size to a transcript<br />
from the whole repeat-cluster. The transcript patterns<br />
indicated that smaller products disappeared,<br />
stepwise, as one probed along the transcript in a 5′ to 3′<br />
direction (Fig. 7B). The experiment was repeated, after<br />
[Fig. 7 image: panel A, 5′-RLM RACE gel (lanes M, -, +; markers at 100–400) with bands labelled Start +32, +23 and 1(24); panel B, Northern blot with approximate fragment sizes of 465, 410, 345, 285, 245, 210, 183, 165, 148, 145, 139, 104 and 95 nt.]<br />
Fig. 7. Transcription from the pKEF-7 cluster.<br />
A. 5′-RLM RACE analyses of the transcriptional start site and processing sites near the start of the transcript. RNA was treated with (+) and<br />
without (-) tobacco acid pyrophosphatase.<br />
B. Northern blot analyses of transcripts from the pKEF-7 cluster using oligonucleotide probes specific for the left (L) and right (R) halves of<br />
spacers 1, 2, 3, 4, 5, 6 and the repeat sequence respectively.<br />
C. Northern blot analyses of transcripts from the complementary strand from the pKEF-7 cluster using oligonucleotide probes specific for<br />
spacer 6 and the repeat sequence. RNA was isolated 14 h after conjugation in A, B and C. RNA size markers of 100–500 nt (M) are aligned<br />
and approximate fragment sizes are given.<br />
conjugating S. solfataricus P2 for 20 h, when the smaller<br />
transcripts observed for spacer 1 were also seen for the<br />
other spacers, consistent with increased processing<br />
having occurred as stationary phase was approached<br />
(data not shown).<br />
The transcript patterns are complicated by the presence<br />
of sets of weak and strong signals (Fig. 7B) where the<br />
former match those of the S. acidocaldarius clusters<br />
(above) in size and putative processing in repeats<br />
(Table 2). For example, the 95 nt and 104 nt transcripts<br />
observed for the spacer 1 probe are consistent in size with<br />
their extending from the start site, or the processing site 9 nt<br />
downstream (Fig. 7A, Table 1), to a processing site in<br />
repeat 2 (Table 2). For the stronger transcripts, differences<br />
were observed when probing each half of the<br />
spacer transcripts (Fig. 7B). Probing upstream halves (L)<br />
revealed smaller fragments than probing downstream<br />
halves (R), seen most dramatically for probes against<br />
spacer 2 (139–148 nt) and spacer 3 (210 nt) (Fig. 7B).<br />
The strong transcripts are consistent in size with their<br />
extending from the initiation, or downstream processing,<br />
site to the downstream (R) spacer halves (Table 2).<br />
The repeat probe yielded a transcript pattern similar to that<br />
for spacer 2 and differed from that for spacer 1 (Fig. 7B)<br />
probably because of non-annealing of the primer to the<br />
degenerate first repeat. This was reinforced by the lack of<br />
processing in the first repeat sequence, as determined by<br />
the 5′-RLM RACE method (Fig. 7A). Transcripts from the<br />
complementary DNA strand were also detected when probing<br />
for spacer 6 and the repeat sequence (Fig. 7C), and transcripts<br />
in the size range 185–480 nt were observed for the<br />
former and 145–480 nt for the latter, similar in size to<br />
transcripts observed from the leader strand. The absence<br />
of spacer-sized RNAs from either DNA strand could<br />
reflect that the final RNA processing enzymes are activated<br />
mainly in stationary phase (Fig. 3) or incompatibility<br />
[Fig. 7 image, continued: panel B lanes M, 1L, 1R, 2L, 2R, 3L, 3R, 4L, 4R, 5L, 5R, 6L, 6R and Rep; panel C lanes 6, Rep and M, with markers at 100–500 nt.]<br />
Table 2. Summary of transcriptional start sites and estimated sizes<br />
and processing sites for transcripts deriving from pKEF-7 as illustrated<br />
in Fig. 7B.<br />
<table>
<tr><th>Transcript (nt)</th><th>Start</th><th>Stop</th></tr>
<tr><td>Weak (normal)</td><td></td><td>Repeat (position)</td></tr>
<tr><td>95/104</td><td>+23/+32</td><td>2 (9)</td></tr>
<tr><td>165/183</td><td>+23/+32</td><td>3 (22)</td></tr>
<tr><td>245</td><td>+23/+32</td><td>4 (19)</td></tr>
<tr><td>Strong (abnormal)</td><td></td><td>Spacer (position)</td></tr>
<tr><td>139/148</td><td>+23/+32</td><td>2 (29)</td></tr>
<tr><td>210</td><td>+23/+32</td><td>3 (25)</td></tr>
<tr><td>285</td><td>+23/+32</td><td>4 (35)</td></tr>
<tr><td>345</td><td>+23/+32</td><td>5 (23)</td></tr>
<tr><td>410</td><td>+23/+32</td><td>6 (30)</td></tr>
<tr><td>Weak (abnormal)</td><td></td><td></td></tr>
<tr><td>145</td><td>1 (24)</td><td>3 (16)</td></tr>
</table>
Transcripts included in the normal/weak category appear to be<br />
processed in the same manner as the S. acidocaldarius clusters.<br />
Processing sites are localized by the repeat number and the<br />
estimated nucleotide position in brackets. Transcripts in the abnormal<br />
category are processed in the right half of spacers (position denoted<br />
by the spacer number and the estimated nucleotide position in<br />
brackets). +32 denotes the transcriptional initiation site and +23 and<br />
1 (24) indicate processing sites identified by the 5′-RLM RACE<br />
method.<br />
Fig. 8. Patterns of repeat-spacer units in repeat-clusters A–F of S. solfataricus strain P1 are aligned with those from S. solfataricus strain P2<br />
(She et al., 2001), where each arrowhead represents a single spacer-repeat unit, and the number to the right indicates the total number of<br />
units. Grey boxed regions indicate sequences that are identical for a given pair of clusters. Blackened units lie within these conserved regions<br />
but yield no matches to viruses/plasmids. Spacers which yield significant matches to viruses or plasmids are colour-coded as indicated on the<br />
figure. Boxes to the left of the clusters represent leader regions that are coloured according to the leader family: blue, family I; purple,<br />
family II (Fig. 1B). The larger arrowhead in cluster D of strain P1 represents an 899 bp pNOB8-like fragment, and the large arrowhead in cluster<br />
F denotes a 106 bp insert with two atypical repeat sequences and abnormal spacer regions. Preliminary data on clusters B, C and E were<br />
presented earlier (Lillestøl et al., 2006).<br />
between processing enzymes and the plasmid repeat<br />
sequence (Carte et al., 2008).<br />
Functional properties of the CRISPR families<br />
For S. acidocaldarius the 297 spacer sequences yield<br />
only 44 (15%) significant matches to virus/plasmid<br />
sequences, relatively few compared with up to 40% for<br />
other Sulfolobales genomes (Lillestøl et al., 2006; Shah<br />
et al., 2009). Therefore, to gain more insight into the functional<br />
diversity of different CRISPR families, we completed<br />
the sequencing of the six repeat-clusters A–F of<br />
S. solfataricus strain P1 because, although repeat-clusters<br />
B, C and E share regions of perfectly conserved<br />
spacer-repeat sequences with S. solfataricus strain P2,<br />
they also yielded many additional virus/plasmid sequence<br />
matches (Lillestøl et al., 2006). The primary structures of<br />
repeat-clusters A–F of strain P1 are displayed together<br />
with those of strain P2 (She et al., 2001) in Fig. 8, where<br />
the locations and distributions of virus/plasmid matches<br />
are indicated.<br />
Repeat-clusters A and B represent family II CRISPRs,<br />
while C, D, E and F belong to family I (Fig. 1A). Each<br />
repeat-cluster of strains P1 and P2 shares identical<br />
regions of sequence enclosed in grey boxes (Fig. 8).<br />
While cluster pairs E and F are identical, the others all<br />
show evidence of repeat-spacer units having been added<br />
at the leader region, after separation of the strains,<br />
although the repeat-cluster sizes and apparent rates of<br />
extension differ greatly. Repeat-clusters B from strain P1<br />
and D from strain P2 show evidence of putative deletions<br />
of 21 and 45 repeat-spacer units respectively, and there<br />
Fig. 9. Pie plots for the main CRISPR families I, II and III of the Sulfolobales where the percentages of spacer sequence matches are given<br />
for the different crenarchaeal viral families and plasmid classes, which are colour-coded. Spacer matches investigated for each family: family I<br />
(2031 spacers tested, 771 significant matches), family II (710 spacers tested, 230 significant matches) and family III (298 spacers tested, 88<br />
significant matches).<br />
are minor differences within the conserved regions of<br />
cluster A. Moreover, cluster A shares a sequence of four<br />
repeat-spacer units with cluster B of strain P1, suggesting<br />
that homologous recombination has occurred between<br />
different clusters of the same family. Importantly, the<br />
downstream ends of each pair of repeat-clusters are conserved,<br />
which suggests that the clusters lose their repeat-spacer<br />
units primarily by internal deletions (Fig. 8).<br />
Cluster D of strain P1 and cluster F of both strains carry<br />
anomalous inserts. The former is an 899 bp region<br />
showing a significant sequence match to the conjugative<br />
plasmid pNOB8 (She et al., 1998) while the latter region<br />
carries a degenerate repeat-spacer region with a different<br />
repeat sequence and an abnormally sized spacer, possibly<br />
also of plasmid origin.<br />
The absence of newly added repeat-spacer units in<br />
cluster F, and the lack of a leader region (Fig. 8), raised<br />
the question as to whether the cluster was active. Therefore,<br />
we probed for spacer 11 of cluster F of strain P2<br />
using a Northern blot analysis. A fragment pattern similar<br />
to that of the S. acidocaldarius clusters was obtained<br />
(Fig. 5A) except that small spacer RNAs (< 66 nt) were<br />
absent (data not shown). This indicated, as for pKEF-7<br />
(Fig. 7B), a defective final processing stage which could<br />
be caused by the lack of a leader region and/or the<br />
absence of some physically linked cas genes.<br />
The proportion of spacers with significant matches to viruses/<br />
plasmids was 39% and 38% for strains P1 and P2,<br />
which carry totals of 431 and 417 spacers,<br />
respectively. The colour coding of the matches (Fig. 8)<br />
reveals some apparent biases. For example, there is a<br />
high proportion of bicaudaviral matches in the newly<br />
added spacers of cluster D (family I), for both strains,<br />
which contrasts with the high proportion of rudiviral<br />
matches in clusters A and B (family II) and suggests that<br />
individual CRISPR families exhibit a preference for certain<br />
extra-chromosomal elements. To test this hypothesis<br />
further, data for significant spacer sequence matches to<br />
viruses/plasmids for the three main CRISPR families of<br />
the Sulfolobales were summarized in pie plots (Fig. 9).<br />
The overall ratios of viral to plasmid spacer matches,<br />
for families I, II and III, are 3.5, 2.0 and 3.5 respectively,<br />
suggesting that the family II CRISPRs have a relative bias<br />
to plasmids. Although no absolute biases are apparent<br />
(Fig. 9), rudiviral matches dominate for family III and conjugative<br />
plasmid matches are enhanced for family II<br />
CRISPRs. The rudiviruses, lipothrixviruses and conjugative<br />
plasmids, which predominate in the pie plots, are all<br />
abundant environmentally (Greve et al., 2004; Bize et al.,<br />
2008; Vestergaard et al., 2008).<br />
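For orientation, the spacer counts given in the Fig. 9 legend can be converted into match percentages per family. The short sketch below is added arithmetic, not part of the original analysis. It uses only the tested and matched counts from that legend; the variable and function names are illustrative.

```python
# (spacers tested, significant matches) per CRISPR family, from the Fig. 9 legend.
families = {"I": (2031, 771), "II": (710, 230), "III": (298, 88)}


def match_percent(tested, matched):
    """Percentage of tested spacers with a significant virus/plasmid match."""
    return round(100 * matched / tested, 1)


percentages = {name: match_percent(*counts) for name, counts in families.items()}
total_tested = sum(tested for tested, _ in families.values())
total_matched = sum(matched for _, matched in families.values())

print(percentages)                                 # {'I': 38.0, 'II': 32.4, 'III': 29.5}
print(total_tested, total_matched)                 # 3039 1089
print(match_percent(total_tested, total_matched))  # 35.8
```

These percentages describe how many spacers match any known element. They do not reproduce the viral-to-plasmid ratios quoted in the text, which would need the per-class match counts shown only in the figure.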
Discussion<br />
Biogenesis of small archaeal RNAs appears to proceed<br />
from a full-length single-stranded primary transcript that is<br />
cleaved by endoribonucleases, as was recently reported<br />
for the Cas6 protein in P. furiosus (Carte et al., 2008). This<br />
suggests that the mechanism of cleavage in archaea<br />
is distinct from the Dicer endoribonuclease-dependent<br />
mechanism generating si- and miRNAs in eukarya.<br />
However, eukarya also generate small RNAs by Dicer-independent<br />
mechanisms such as seen for piRNA-like<br />
species, and although the mechanism of biogenesis of the<br />
latter in terms of trans-acting factors is unresolved, certain<br />
aspects are reminiscent of the process observed in this<br />
study. In particular, the presence of an independently processed<br />
complementary RNA strand has been reported<br />
(reviewed in Klattenhoff and Theurkauf, 2008). As there is<br />
no evidence for an RNA-dependent RNA polymerase in<br />
Sulfolobus, transcription of the complementary strand is<br />
likely to be dictated by the putative promoter elements<br />
located immediately downstream from the CRISPR loci.<br />
Inspection of downstream elements of all CRISPR clusters<br />
in S. acidocaldarius reveals BRE/TATA promoter<br />
regions that are likely to initiate full-length complementary<br />
strand RNA products, as shown for the Saci-5 cluster<br />
(Fig. 6). Further processing of the complementary transcripts<br />
is likely to proceed by an endoribonuclease distinct<br />
from that generating spacer RNAs from the leader<br />
strand transcript, because the ‘handles’ in the repeats<br />
must be different given their different RNA sizes (about<br />
55 nt versus 40–45 nt). What is the functional significance<br />
of the complementary small RNAs? One possibility is that<br />
they neutralize the leader spacer RNAs in the absence of<br />
invading extra-chromosomal elements, although we failed<br />
to detect dsRNAs in the expected size range. Another<br />
possibility is that loading of leader-spacer RNAs onto<br />
an Argonaute-containing complex has to proceed via a<br />
dsRNA intermediate, as observed for the si- and miRNA<br />
pathways. The presence of Argonautes in archaea may<br />
facilitate a distinct mode of guide RNA presentation from<br />
that seen in bacteria, where there is no evidence of the<br />
participation of a complementary RNA strand in CRISPR<br />
function (Brouns et al., 2008; Marraffini and Sontheimer,<br />
2008).<br />
Cellular activity of <strong>CRISPR</strong>s<br />
The observation that more than one <strong>CRISPR</strong> family is<br />
generally present <strong>in</strong> one organism suggested that <strong>the</strong>y<br />
may provide added versatility <strong>in</strong> regulat<strong>in</strong>g or <strong>in</strong>hibit<strong>in</strong>g<br />
<strong>in</strong>vad<strong>in</strong>g viruses or plasmids, and this supposition<br />
received some support from <strong>the</strong> f<strong>in</strong>d<strong>in</strong>g that putative recognition<br />
signals upstream from predicted proto-spacer<br />
sequences on viruses/plasmids are different for different<br />
<strong>CRISPR</strong> families (Fig. 1C). Analysis of <strong>the</strong> repeat-clusters<br />
of <strong>the</strong> two <strong>CRISPR</strong> families of S. solfataricus stra<strong>in</strong>s P1<br />
and P2 revealed biases to bicaudaviruses for family I, and<br />
to rudiviruses for family II, <strong>CRISPR</strong>s (Fig. 8). A study of<br />
3039 spacers from <strong>the</strong> three ma<strong>in</strong> families of all <strong>the</strong> Sulfolobales<br />
also showed significant biases, <strong>in</strong> particular a<br />
preference of family III spacers for rudiviruses (Fig. 9).<br />
This supports <strong>the</strong> idea that <strong>the</strong> presence of multiple <strong>CRISPR</strong> families<br />
may produce a more versatile response to <strong>in</strong>vad<strong>in</strong>g<br />
genetic elements.<br />
The results also show that <strong>the</strong> <strong>CRISPR</strong> <strong>system</strong> of<br />
S. acidocaldarius is primed to react rapidly to <strong>in</strong>vasion <strong>in</strong><br />
that <strong>the</strong> large cluster transcripts are present despite <strong>the</strong><br />
absence of viruses and plasmids. The <strong>system</strong> only<br />
requires that <strong>the</strong> RNA process<strong>in</strong>g enzymes are rapidly<br />
activated. The observation that process<strong>in</strong>g of <strong>the</strong> leader<br />
transcript strongly <strong>in</strong>creases <strong>in</strong> <strong>the</strong> stationary phase<br />
(Fig. 3) is also consistent with <strong>the</strong>se cells be<strong>in</strong>g more<br />
susceptible to external attack.<br />
Generation of spacer RNAs<br />
Transcripts on <strong>the</strong> leader strand <strong>in</strong>itiate just upstream<br />
from <strong>the</strong> first repeat, <strong>in</strong>dependently of <strong>the</strong> presence of a<br />
© 2009 The Authors<br />
Journal compilation © 2009 Blackwell Publish<strong>in</strong>g Ltd, Molecular Microbiology, 72, 259–272<br />
leader region. Process<strong>in</strong>g occurs primarily from <strong>the</strong> 3′-end<br />
of a s<strong>in</strong>gle transcript of <strong>the</strong> whole repeat-cluster, although<br />
limited process<strong>in</strong>g also occurs at <strong>the</strong> 5′-end (Fig. 4A), and<br />
repeats are targeted to generate a series of fragments.<br />
The small spacer RNAs from exponentially grow<strong>in</strong>g and<br />
stationary phase cells, rang<strong>in</strong>g <strong>in</strong> size from 40 to 52 nt<br />
and 35 to 52 nt respectively, represent a spectrum of<br />
fragments which anneal with spacer-specific, but not<br />
repeat-specific probes (Fig. 5A), consistent with earlier<br />
observations for archaea and bacteria (Lillestøl et al.,<br />
2006; Brouns et al., 2008; Hale et al., 2008). Process<strong>in</strong>g<br />
activity <strong>in</strong>itiates ma<strong>in</strong>ly at stationary phase, at least for<br />
cells lack<strong>in</strong>g extra-chromosomal elements (Fig. 3).<br />
Recently, it was shown that <strong>the</strong> Cas6 endoribonuclease<br />
b<strong>in</strong>ds to <strong>the</strong> 5′-end of a P. furiosus repeat, which can<br />
generate a hairp<strong>in</strong> structure, and cuts near <strong>the</strong> 3′-end<br />
(Carte et al., 2008). This result could expla<strong>in</strong> <strong>the</strong> anomalous<br />
process<strong>in</strong>g of <strong>the</strong> pKEF-7 transcript (Fig. 7B; Table 2)<br />
which exhibits an unusual 3′-term<strong>in</strong>al repeat sequence<br />
(Fig. 2B). Never<strong>the</strong>less, given <strong>the</strong> wide sequence and<br />
secondary structural diversity of repeat RNAs (Peng<br />
et al., 2003; Kun<strong>in</strong> et al., 2007), <strong>the</strong> enzymes must exhibit<br />
a wide range of recognition mechanisms.<br />
Transcripts of <strong>the</strong> complementary strand were <strong>in</strong>variably<br />
produced from each repeat-cluster and <strong>the</strong>y ranged<br />
<strong>in</strong> size from larger fragments to spacer RNAs of about<br />
55 nt, about 16 nt larger than <strong>the</strong> probed spacer, and<br />
consistent with <strong>the</strong> earlier observation for Saci-5 (Lillestøl<br />
et al., 2006). Although no reproducible RNA expression<br />
was observed from <strong>the</strong> complementary spacer strand for<br />
<strong>the</strong> euryarchaeon P. furiosus (Hale et al., 2008), this could<br />
have a technical explanation: for <strong>the</strong> cDNA libraries, only<br />
fragments < 50 nt were screened, and <strong>in</strong> <strong>the</strong> Nor<strong>the</strong>rn<br />
blot analysis, <strong>the</strong> 12% polyacrylamide gels used would<br />
not have resolved <strong>the</strong> large transcripts observed for<br />
Sulfolobus (Fig. 5B).<br />
Regular and irregular development of repeat-clusters<br />
Repeat-clusters E and F are each identical <strong>in</strong> S. solfataricus<br />
stra<strong>in</strong>s P1 and P2 and have not undergone<br />
structural changes s<strong>in</strong>ce <strong>the</strong> stra<strong>in</strong>s diverged (Fig. 8).<br />
Cluster E (Ssol-8) carries a family I leader but a degenerate<br />
first repeat, which may <strong>in</strong>hibit cognate enzyme recognition<br />
of <strong>the</strong> repeat and, <strong>the</strong>reby, subsequent extension<br />
of <strong>the</strong> repeat-cluster. In contrast, cluster F (Ssol-91) lacks<br />
a leader sequence which could provide an assembly site<br />
for DNA enzymes <strong>in</strong>volved <strong>in</strong> cluster extension. In addition,<br />
clusters E and F lack physically l<strong>in</strong>ked cas genes<br />
which could be important for DNA <strong>in</strong>sertion functions<br />
(Cas1) or RNA process<strong>in</strong>g (Cas2 and Cas4) (Makarova<br />
et al., 2006; Beloglazova et al., 2008).<br />
Irregularities <strong>in</strong> archaeal repeat-clusters are extremely<br />
rare (Lillestøl et al., 2006). However, <strong>in</strong> cluster F of both
stra<strong>in</strong>s, a 106 bp region conta<strong>in</strong><strong>in</strong>g a half spacer preceded<br />
by two atypical repeat sequences is followed by a regular<br />
repeat sequence and no spacer (Fig. 8). These structures<br />
ma<strong>in</strong>ta<strong>in</strong> <strong>the</strong> precise size of <strong>the</strong> spacer-repeat units <strong>in</strong> <strong>the</strong><br />
cluster, suggest<strong>in</strong>g that some k<strong>in</strong>d of ruler mechanism<br />
regulates <strong>the</strong> <strong>in</strong>sertion of new spacer-repeat units.<br />
Ano<strong>the</strong>r exceptional irregularity occurs <strong>in</strong> cluster D of<br />
stra<strong>in</strong> P1, where an 899 bp fragment carry<strong>in</strong>g a pNOB8-like<br />
conjugative plasmid sequence (She et al., 1998) is<br />
flanked by repeats. Both examples may reflect a mechanistic<br />
defect whereby large plasmid regions, <strong>the</strong> former<br />
carry<strong>in</strong>g repeats, have been <strong>in</strong>correctly excised and <strong>in</strong>corporated<br />
<strong>in</strong>to <strong>the</strong> repeat-cluster. Fur<strong>the</strong>r exam<strong>in</strong>ation of this<br />
region may yield some <strong>in</strong>sight <strong>in</strong>to how DNA is obta<strong>in</strong>ed<br />
from extra-chromosomal elements.<br />
Mechanism of <strong>CRISPR</strong> transfer<br />
The commonality of <strong>CRISPR</strong> families <strong>in</strong> different Sulfolobus<br />
stra<strong>in</strong>s suggests that <strong>the</strong>y can be transferred horizontally<br />
(Lillestøl et al., 2006; Horvath et al., 2008b) but <strong>the</strong><br />
mechanism by which this could occur is unclear. For some<br />
bacteria it was proposed that large plasmids could carry<br />
and transmit <strong>the</strong> <strong>CRISPR</strong> apparatus (Godde and Bickerton,<br />
2006) but known crenarchaeal cryptic plasmids are<br />
quite small (5–10 kb) and <strong>the</strong> largest conjugative plasmids<br />
are only 40–50 kb (Greve et al., 2004), <strong>in</strong>sufficiently<br />
large to carry complex <strong>CRISPR</strong> <strong>system</strong>s. One possibility<br />
is that <strong>the</strong> <strong>system</strong> is transferred by chromosomal conjugation.<br />
Both S. acidocaldarius and Sulfolobus tokodaii<br />
chromosomes carry encaptured conjugative plasmids<br />
where <strong>the</strong> genes implicated <strong>in</strong> <strong>the</strong> conjugative process are<br />
ma<strong>in</strong>ta<strong>in</strong>ed (Greve et al., 2004) and for S. acidocaldarius,<br />
at least, conjugative transfer of chromosomal DNA has<br />
been demonstrated experimentally (Aagaard et al., 1995;<br />
Grogan, 1996).<br />
Experimental procedures<br />
Growth of Sulfolobus cells and preparation of DNA<br />
Sulfolobus acidocaldarius cells were grown at 78°C <strong>in</strong><br />
complex medium conta<strong>in</strong><strong>in</strong>g 2% tryptone (Schleper et al.,<br />
1995) and harvested at exponential or stationary phase by<br />
centrifug<strong>in</strong>g at 4000 r.p.m. and 4°C for 15 m<strong>in</strong>. Cells of<br />
S. solfataricus stra<strong>in</strong>s P1 and P2 were grown at 80°C <strong>in</strong><br />
complex medium conta<strong>in</strong><strong>in</strong>g 2% tryptone (Schleper et al.,<br />
1995). Total DNA used for repeat-cluster sequenc<strong>in</strong>g was<br />
isolated from S. solfataricus stra<strong>in</strong> P1 us<strong>in</strong>g DNeasy Kit<br />
(Qiagen, Westberg, Germany). Conjugation was <strong>in</strong>itiated by<br />
mix<strong>in</strong>g a culture of S. islandicus stra<strong>in</strong> Hi165 which harbours<br />
<strong>the</strong> conjugative plasmid pKEF9, with S. solfataricus P2 cells<br />
at a ratio of 1:10 000 at A600 = 0.17 (Schleper et al., 1995).<br />
Cells were harvested at 14 h after conjugation and centrifuged<br />
at 4000 r.p.m. for 6 m<strong>in</strong> at 4°C. pKEF9 was isolated<br />
us<strong>in</strong>g <strong>the</strong> Plasmid M<strong>in</strong>i Kit (Qiagen) and digested with EcoRI<br />
to verify its presence (Greve et al., 2004).<br />
RNA preparation, RNase digestion and Nor<strong>the</strong>rn blott<strong>in</strong>g<br />
Total RNA from S. acidocaldarius cells, and S. solfataricus<br />
cells conjugated with pKEF9, was prepared us<strong>in</strong>g Trizol<br />
(Invitrogen, Paisley, UK) accord<strong>in</strong>g to <strong>the</strong> Invitrogen protocol<br />
and treated with DNase I (Applied Bio<strong>system</strong>s/<br />
Ambion, Aust<strong>in</strong>, TX) accord<strong>in</strong>g to <strong>the</strong> protocol from Ambion,<br />
essentially as used for extract<strong>in</strong>g plant si-RNAs<br />
(Sunkar et al., 2005). To detect dsRNA, 20 µg of RNA was<br />
treated with various concentrations of RNase T1 (Ambion) <strong>in</strong><br />
RNase-digestion III buffer (Ambion), and RNase U2<br />
(Sankyo, Japan) <strong>in</strong> digestion buffer 20 mM Na acetate<br />
(pH 4.6), 2 mM MgCl2, 100 mM KCl, at 37°C for 30 m<strong>in</strong>.<br />
RNase was <strong>in</strong>activated and <strong>the</strong> RNA was precipitated<br />
with 225 µl of RNase <strong>in</strong>activation/precipitation solution III<br />
(Ambion) toge<strong>the</strong>r with 150 µl ethanol at −20°C for 1 h or<br />
overnight. For Nor<strong>the</strong>rn blott<strong>in</strong>g of small RNAs, 20 µg RNA<br />
was mixed with 10 µl Gel Load<strong>in</strong>g Buffer II (Applied<br />
Bio<strong>system</strong>s/Ambion) and fractionated <strong>in</strong> a 6–10% polyacrylamide<br />
gel conta<strong>in</strong><strong>in</strong>g 7 M urea, 90 mM Tris, 90 mM boric<br />
acid, 2 mM EDTA, pH 8.3, toge<strong>the</strong>r with a 10–150 nt ladder<br />
(Decade Marker System, Ambion, Huntingdon, UK) or a<br />
0.1–2.0 kb RNA ladder (Invitrogen). RNA was transferred<br />
onto Hybond N+ nylon membranes (GE Healthcare, Amersham,<br />
UK) or GeneScreen plus nylon membranes (Perk<strong>in</strong>Elmer<br />
Life Sciences, Boston, USA) us<strong>in</strong>g <strong>the</strong> Bio-Rad<br />
semidry blott<strong>in</strong>g apparatus (Bio-Rad, Hercules, CA) and<br />
0.5× TBE (45 mM Tris, 45 mM boric acid, 1 mM EDTA,<br />
pH 8.3) as <strong>the</strong> blott<strong>in</strong>g buffer. For Nor<strong>the</strong>rn blott<strong>in</strong>g with<br />
large RNAs, 12 µg RNA was mixed with Nor<strong>the</strong>rn Max-Gly<br />
Sample Load<strong>in</strong>g Dye (Applied Bio<strong>system</strong>s/Ambion) and<br />
fractionated <strong>in</strong> a 1.5% agarose-BPTE (10 mM PIPES,<br />
pH 6.5, 30 mM Bis-Tris, 1 mM EDTA) gel, toge<strong>the</strong>r with a<br />
0.5–9 kb Millennium Marker (Applied Bio<strong>system</strong>s/Ambion).<br />
The RNA was transferred onto Hybond N+ nylon membranes<br />
(GE Healthcare) by capillary blott<strong>in</strong>g with 0.2 M<br />
NaH2PO4, pH 7.4, 3.0 M NaCl, 0.02 M EDTA. After immobiliz<strong>in</strong>g<br />
<strong>the</strong> RNAs us<strong>in</strong>g a UV Crossl<strong>in</strong>ker (Stratagene, La<br />
Jolla, USA), <strong>the</strong> nylon membranes were pre-hybridized for<br />
1 h <strong>in</strong> 6× SSPE buffer (0.9 M NaCl, 60 mM NaH2PO4,<br />
4.6 mM EDTA, pH 7.4), 0.5% SDS and 5× Denhardt’s solution<br />
at 5°C lower than <strong>the</strong> Tm of <strong>the</strong> probe (TH). Oligonucleotide<br />
24–26-mers complementary to a spacer, or <strong>the</strong><br />
repeat, on ei<strong>the</strong>r strand, were end-labelled with [γ-32P]-ATP<br />
and T4 polynucleotide k<strong>in</strong>ase. Hybridization was performed<br />
at <strong>the</strong> TH of <strong>the</strong> probe <strong>in</strong> 6× SSPE, 0.5% SDS, 3× Denhardt’s<br />
solution for 18 h. The samples were washed three<br />
times at room temperature with 6× SSPE buffer and 0.1%<br />
SDS for 15 m<strong>in</strong> each and, subsequently, at <strong>the</strong> TH <strong>in</strong> <strong>the</strong><br />
same buffer. Membranes were exposed to Ultra UV-G X-ray<br />
film (Dupharma, Kastrup, Denmark) for 1 h to 3 days.<br />
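The buffer strengths and hybridization temperature quoted above follow from simple arithmetic.<br />
As a m<strong>in</strong>imal sketch (<strong>the</strong> helper names `dilute` and `hybridization_temp` and <strong>the</strong> example Tm of 60°C<br />
are hypo<strong>the</strong>tical, not part of <strong>the</strong> orig<strong>in</strong>al methods), <strong>the</strong> code below checks that <strong>the</strong> 0.5× TBE blott<strong>in</strong>g<br />
buffer is half <strong>the</strong> 1× composition given for <strong>the</strong> gel, and applies <strong>the</strong> stated rule TH = Tm − 5°C:<br />

```python
# Hypothetical helper: scale a 1x buffer recipe (concentrations in mM) by a strength factor.
def dilute(recipe_mM, strength):
    return {component: conc * strength for component, conc in recipe_mM.items()}

# Hypothetical helper: hybridization temperature TH, defined in the text as 5 degC below Tm.
def hybridization_temp(tm_celsius):
    return tm_celsius - 5.0

# 1x TBE, as stated for the polyacrylamide gel (pH 8.3).
TBE_1X = {"Tris": 90.0, "boric acid": 90.0, "EDTA": 2.0}

half_tbe = dilute(TBE_1X, 0.5)
print(half_tbe)

# Example probe Tm of 60 degC is an assumption chosen only for illustration.
print(hybridization_temp(60.0))
```

How <strong>the</strong> probe Tm itself was estimated is not stated <strong>in</strong> <strong>the</strong> text, so it is left as an <strong>in</strong>put here.<br />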
Determ<strong>in</strong>ation of transcript ends<br />
The RLM-RACE kit (Applied Bio<strong>system</strong>s/Ambion) was used<br />
to determ<strong>in</strong>e <strong>the</strong> ends of transcripts generated from repeat-clusters<br />
<strong>in</strong> S. acidocaldarius and pKEF9, with some modifications<br />
to <strong>the</strong> kit protocol. To identify 5′-ends, 5 µg RNA was<br />
treated with tobacco acid pyrophosphatase (TAP) accord<strong>in</strong>g<br />
to <strong>the</strong> protocol. Both TAP-treated and untreated RNA were<br />
<strong>the</strong>n l<strong>in</strong>ked to a 5′-RLM RACE adapter with RNA ligase,<br />
followed by reverse transcription from a spacer-specific<br />
primer accord<strong>in</strong>g to <strong>the</strong> protocol. Products were <strong>the</strong>n amplified<br />
by PCR with a 5′-RLM RACE adapter-specific primer<br />
conta<strong>in</strong><strong>in</strong>g a BamHI restriction site at <strong>the</strong> 5′ end and a spacer-specific<br />
primer carry<strong>in</strong>g an EcoRI restriction site, <strong>in</strong> order to<br />
facilitate clon<strong>in</strong>g of <strong>the</strong> PCR products <strong>in</strong>to pUC18. The PCR products<br />
were run on a 2% low-melt<strong>in</strong>g agarose gel and<br />
purified with QIAquick Gel Extraction Kit (Qiagen). The fragments<br />
were cloned <strong>in</strong>to BamHI and EcoRI-digested pUC18 at<br />
a molar ratio of 4:1 and sequenced.<br />
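The 4:1 molar ratio used for clon<strong>in</strong>g can be turned <strong>in</strong>to DNA masses with a short calculation.<br />
The sketch below is illustrative only: it assumes <strong>the</strong> ratio is <strong>in</strong>sert:vector (<strong>the</strong> text does not say which way round),<br />
and <strong>the</strong> 50 ng of vector and 150 bp <strong>in</strong>sert are hypo<strong>the</strong>tical example values. The function name `insert_mass_ng` is also hypo<strong>the</strong>tical.<br />

```python
PUC18_BP = 2686  # length of the pUC18 cloning vector in base pairs

def insert_mass_ng(vector_ng, vector_bp, insert_bp, insert_to_vector_ratio):
    """Mass of insert (ng) that gives the requested insert:vector molar ratio."""
    # For a fixed mass, the number of DNA molecules is inversely proportional
    # to their length, so the insert mass scales with insert_bp / vector_bp.
    return insert_to_vector_ratio * vector_ng * insert_bp / vector_bp

# Assumed example values (not from the text): 50 ng vector, 150 bp RACE product.
mass = insert_mass_ng(vector_ng=50, vector_bp=PUC18_BP,
                      insert_bp=150, insert_to_vector_ratio=4)
print(f"{mass:.1f} ng insert for 50 ng pUC18")
```

Because RACE products are much shorter than <strong>the</strong> vector, only a small mass of <strong>in</strong>sert is needed to reach a fourfold molar excess.<br />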
Sequenc<strong>in</strong>g of clusters <strong>in</strong> S. solfataricus P1<br />
Long range PCR products were obta<strong>in</strong>ed across <strong>the</strong> chromosomal<br />
cluster regions of S. solfataricus stra<strong>in</strong> P1 us<strong>in</strong>g <strong>the</strong><br />
Herculase II kit (Stratagene, La Jolla, CA) accord<strong>in</strong>g to <strong>the</strong><br />
protocol, with 300 ng genomic DNA <strong>in</strong> 50 µl reactions. DNA<br />
fragments were purified us<strong>in</strong>g <strong>the</strong> QIAquick PCR Purification Kit<br />
(Qiagen) and sequenced. Sequences were analysed with<br />
Sequencher (Gene Codes, Ann Arbor, MI). BLAST searches<br />
were performed aga<strong>in</strong>st <strong>the</strong> Sulfolobus Database<br />
(http://sulfolobus.org).<br />
Bio<strong>in</strong>formatic analysis of <strong>CRISPR</strong>s of <strong>the</strong> Sulfolobales<br />
Repeat-clusters were identified us<strong>in</strong>g publicly available<br />
software (Edgar, 2007; Bland et al., 2007) <strong>in</strong> all available<br />
Sulfolobales genomes: S. solfataricus P2, S. tokodaii 7,<br />
S. acidocaldarius DSM 639 and Metallosphaera sedula<br />
DSM 5348 from GenBank<br />
(http://www.ncbi.nlm.nih.gov/Genbank/); Sulfolobus islandicus stra<strong>in</strong>s LD85, YG5714,<br />
YN1551, M164 and U328 from JGI<br />
(http://genome.jgi.doe.gov/mic_asmb.html); and S. islandicus stra<strong>in</strong>s HVE10/4 and<br />
REY15A and Acidianus brierleyi (unpublished data). Repeat-cluster<br />
names identify <strong>the</strong> species and number of repeats.<br />
Repeat-cluster orientations were determ<strong>in</strong>ed by locat<strong>in</strong>g <strong>the</strong><br />
upstream leader sequence and/or by exam<strong>in</strong><strong>in</strong>g <strong>the</strong> repeat<br />
sequence. Leader sequences, when present, were limited to<br />
300 bp for <strong>the</strong> multiple alignment analyses (Edgar, 2004) and<br />
motif analyses (Bailey et al., 2006). Representative repeat<br />
sequences from each identified repeat-cluster were aligned<br />
(Edgar, 2004) and a phylogenetic tree was generated<br />
(Higg<strong>in</strong>s et al., 1994). Spacer sequences from each repeat-cluster<br />
were aligned (Sæbø et al., 2005) aga<strong>in</strong>st <strong>the</strong><br />
genomes of extra-chromosomal elements of <strong>the</strong> Sulfolobales<br />
(http://sulfolobus.org/; Brügger, 2007) at a nucleotide level<br />
(Shah et al., 2009). Additionally, spacers were aligned<br />
aga<strong>in</strong>st am<strong>in</strong>o acid sequences of annotated ORFs of <strong>the</strong><br />
extra-chromosomal elements, at an am<strong>in</strong>o acid level (Shah<br />
et al., 2009; Vestergaard et al., 2008). Significance cut-offs<br />
were determ<strong>in</strong>ed for both alignment types by us<strong>in</strong>g <strong>the</strong><br />
genome sequence of Saccharomyces cerevisiae as a negative<br />
control.<br />
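The negative-control step can be illustrated with a toy example. The analysis above used PARALIGN and <strong>the</strong><br />
S. cerevisiae genome; <strong>the</strong> sketch below swaps <strong>in</strong> a naive ungapped identity count on both strands, and it assumes<br />
that <strong>the</strong> cut-off is set just above <strong>the</strong> best score any spacer reaches aga<strong>in</strong>st <strong>the</strong> control (<strong>the</strong> text does not state <strong>the</strong> exact rule).<br />
All function names and sequences are hypo<strong>the</strong>tical.<br />

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def best_ungapped_identity(spacer, target):
    """Most identical positions over all ungapped placements on either strand."""
    target = target.upper()
    best = 0
    for query in (spacer.upper(), revcomp(spacer)):
        n = len(query)
        for start in range(len(target) - n + 1):
            window = target[start:start + n]
            matches = sum(1 for a, b in zip(query, window) if a == b)
            best = max(best, matches)
    return best

def cutoff_from_negative_control(spacers, control_genome):
    """Assumed rule: smallest score strictly above every control score."""
    return max(best_ungapped_identity(s, control_genome)
               for s in spacers.values()) + 1

def significant_matches(spacers, elements, cutoff):
    """Return (spacer, element, score) for every score at or above the cut-off."""
    hits = []
    for sp_name, sp_seq in spacers.items():
        for el_name, el_seq in elements.items():
            score = best_ungapped_identity(sp_seq, el_seq)
            if score >= cutoff:
                hits.append((sp_name, el_name, score))
    return hits

# Toy data, invented for illustration only.
spacers = {"sp1": "ACGTACGTAC", "sp2": "GGGTTTCCCA"}
control = "TTTTTTTTTTTTTTTTTTTT"          # stands in for the negative-control genome
elements = {"virusX": "AAACGTACGTACAAA"}  # stands in for a viral genome

cutoff = cutoff_from_negative_control(spacers, control)
hits = significant_matches(spacers, elements, cutoff)
print(cutoff, hits)
```

A real analysis would use a gapped, statistically scored aligner and would also handle <strong>the</strong> am<strong>in</strong>o acid level comparison described above, which this sketch omits.<br />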
Acknowledgements<br />
The work was supported by grants from <strong>the</strong> Danish Natural<br />
Science Research Council and <strong>the</strong> Danish National<br />
Research Foundation.<br />
References<br />
Aagaard, C., Dalgaard, J., and Garrett, R.A. (1995) Intercellular<br />
mobility and hom<strong>in</strong>g of an archaeal rDNA <strong>in</strong>tron<br />
confers selective advantage over <strong>in</strong>tron- cells of Sulfolobus<br />
acidocaldarius. Proc Natl Acad Sci USA 92: 12285–12289.<br />
Bailey, T.L., Williams, N., Misleh, C., and Li, W.W. (2006)<br />
MEME: discover<strong>in</strong>g and analyz<strong>in</strong>g DNA and prote<strong>in</strong><br />
sequence motifs. Nucleic Acids Res 34: 369–373.<br />
Barrangou, R., Fremaux, C., Deveau, H., Richards, M.,<br />
Boyaval, P., Mo<strong>in</strong>eau, S., et al. (2007) <strong>CRISPR</strong> provides<br />
acquired resistance aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes.<br />
Science 315: 1709–1712.<br />
Beloglazova, N., Brown, G., Zimmerman, M.D., Proudfoot,<br />
M., Makarova, K.S., Kudritska, M., et al. (2008) A novel<br />
family of sequence-specific endoribonucleases associated<br />
with <strong>the</strong> Clustered Regularly Interspaced Short Pal<strong>in</strong>dromic<br />
Repeats. J Biol Chem 283: 20361–20371.<br />
Bize, A., Peng, X., Prokofeva, M., Maclellan, K., Lucas, S.,<br />
Forterre, P., et al. (2008) Viruses <strong>in</strong> acidic geo<strong>the</strong>rmal environments<br />
of <strong>the</strong> Kamchatka Pen<strong>in</strong>sula. Res Microbiol 159:<br />
358–366.<br />
Bland, C., Ramsey, T.L., Sabree, F., Lowe, M., Brown, K.,<br />
Kyrpides, N.C., and Hugenholtz, P. (2007) <strong>CRISPR</strong><br />
Recognition Tool (CRT): a tool for automatic detection of<br />
clustered regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic repeats. BMC<br />
Bio<strong>in</strong>formatics 8: 209.<br />
Bolot<strong>in</strong>, A., Qu<strong>in</strong>quis, B., Sorok<strong>in</strong>, A., and Ehrlich, S.D. (2005)<br />
Clustered regularly <strong>in</strong>terspaced short pal<strong>in</strong>drome repeats<br />
(<strong>CRISPR</strong>s) have spacers of extrachromosomal orig<strong>in</strong>.<br />
Microbiology 151: 2551–2561.<br />
Brouns, S.J., Jore, M.M., Lundgren, M., Westra, E.R.,<br />
Slijkhuis, R.J., Snijders, A.P., et al. (2008) Small <strong>CRISPR</strong><br />
RNAs guide antiviral defense <strong>in</strong> prokaryotes. Science 321:<br />
960–964.<br />
Brügger, K. (2007) The Sulfolobus database. Nucleic Acids<br />
Res 35: D413–D415.<br />
Carte, J., Wang, R., Li, H., Terns, R.M., and Terns, M.P.<br />
(2008) Cas6 is an endoribonuclease that generates guide<br />
RNAs for <strong>in</strong>vader defense <strong>in</strong> prokaryotes. Genes Dev 22:<br />
3489–3496.<br />
Chen, L., Brügger, M., Skovgaard, M., Redder, P., She, Q.,<br />
Torar<strong>in</strong>sson, E., et al. (2005) The genome of Sulfolobus<br />
acidocaldarius, a model organism of <strong>the</strong> Crenarchaeota.<br />
J Bacteriol 187: 4992–4999.<br />
Christiansen, J., Egebjerg, J., Larsen, N., and Garrett, R.A.<br />
(1990) Analysis of rRNA structure: experimental and <strong>the</strong>oretical<br />
considerations. In Ribosomes and Prote<strong>in</strong><br />
Syn<strong>the</strong>sis. Spedd<strong>in</strong>g, G. (ed.). Oxford: Oxford University<br />
Press, pp. 229–252.<br />
Deveau, H., Barrangou, R., Garneau, J.E., Labonté, J.,<br />
Fremaux, C., Boyaval, P., et al. (2008) Phage response to<br />
<strong>CRISPR</strong>-encoded resistance <strong>in</strong> Streptococcus <strong>the</strong>rmophilus.<br />
J Bacteriol 190: 1390–1400.<br />
Edgar, R.C. (2004) MUSCLE: multiple sequence alignment<br />
with high accuracy and high throughput. Nucleic Acids Res<br />
32: 1792–1797.<br />
Edgar, R.C. (2007) PILER-CR: fast and accurate identification<br />
of <strong>CRISPR</strong> repeats. BMC Bio<strong>in</strong>formatics 8: 18.<br />
Forterre, P. (1992) Neutral terms. Nature 355: 305.<br />
Godde, J.S., and Bickerton, A. (2006) The repetitive DNA<br />
elements called <strong>CRISPR</strong>s and <strong>the</strong>ir associated genes: evidence<br />
of horizontal transfer among prokaryotes. J Mol Evol<br />
62: 718–729.<br />
Greve, B., Jensen, S., Brügger, K., Zillig, W., and Garrett,<br />
R.A. (2004) Genomic comparison of archaeal conjugative<br />
plasmids from Sulfolobus. <strong>Archaea</strong> 1: 231–239.<br />
Grissa, I., Vergnaud, G., and Pourcel, C. (2007) The CRISPRdb<br />
database and tools to display <strong>CRISPR</strong>s and to generate<br />
dictionaries of spacers and repeats. BMC Bio<strong>in</strong>formatics 8:<br />
172.<br />
Grogan, D.W. (1996) Exchange of genetic markers at<br />
extremely high temperatures <strong>in</strong> <strong>the</strong> archaeon Sulfolobus<br />
acidocaldarius. J Bacteriol 178: 3207–3211.<br />
Haft, D.H., Selengut, J., Mongod<strong>in</strong>, E.F., and Nelson, K.E.<br />
(2005) A guild of 45 <strong>CRISPR</strong>-associated (Cas) prote<strong>in</strong><br />
families and multiple <strong>CRISPR</strong>/Cas subtypes exist <strong>in</strong><br />
prokaryotic genomes. PLoS Comput Biol 1: 474–483.<br />
Hale, C., Kleppe, K., Terns, R.M., and Terns, M.P. (2008)<br />
Prokaryotic silenc<strong>in</strong>g (psi) RNAs <strong>in</strong> Pyrococcus furiosus.<br />
RNA 14: 1–8.<br />
Higg<strong>in</strong>s, D., Thompson, J., Gibson, T., Thompson, J.D.,<br />
Higg<strong>in</strong>s, D.G., and Gibson, T.J. (1994) CLUSTAL W:<br />
improv<strong>in</strong>g <strong>the</strong> sensitivity of progressive multiple sequence<br />
alignment through sequence weight<strong>in</strong>g, position-specific<br />
gap penalties and weight matrix choice. Nucleic Acids Res<br />
22: 4673–4680.<br />
Horvath, P., Romero, D.A., Coûté-Monvois<strong>in</strong>, A.C., Richards,<br />
M., Deveau, H., Mo<strong>in</strong>eau, S., et al. (2008a) Diversity, activity,<br />
and evolution of <strong>CRISPR</strong> loci <strong>in</strong> Streptococcus<br />
<strong>the</strong>rmophilus. J Bacteriol 190: 1401–1412.<br />
Horvath, P., Coûté-Monvois<strong>in</strong>, A.C., Romero, D.A., Boyaval,<br />
P., Fremaux, C., and Barrangou, R. (2008b) Comparative<br />
analysis of <strong>CRISPR</strong> loci <strong>in</strong> lactic acid bacteria genomes. Int<br />
J Food Microbiol doi:10.1016/j.ijfoodmicro.2008.05.030<br />
Jansen, R., Embden, J.D., Gaastra, W., and Schouls, L.M.<br />
(2002) Identification of genes that are associated with DNA<br />
repeats <strong>in</strong> prokaryotes. Mol Microbiol 43: 1565–1575.<br />
Klattenhoff, C., and Theurkauf, W. (2008) Biogenesis and<br />
germl<strong>in</strong>e functions of piRNAs. Development 135: 3–9.<br />
Kun<strong>in</strong>, V., Sorek, R., and Hugenholtz, P. (2007) Evolutionary<br />
conservation of sequence and secondary structures <strong>in</strong><br />
<strong>CRISPR</strong> repeats. Genome Biol 8: R61.<br />
Lillestøl, R.K., Redder, P., Garrett, R.A., and Brügger, K.<br />
(2006) A putative viral defence mechanism <strong>in</strong> archaeal<br />
cells. <strong>Archaea</strong> 2: 59–72.<br />
Makarova, K.S., Grish<strong>in</strong>, N.V., Shabal<strong>in</strong>a, S.A., Wolf, Y.I.,<br />
and Koon<strong>in</strong>, E.V. (2006) A putative RNA-<strong>in</strong>terferencebased<br />
<strong>immune</strong> <strong>system</strong> <strong>in</strong> prokaryotes: computational<br />
analysis of <strong>the</strong> predicted enzymatic mach<strong>in</strong>ery, functional<br />
analogies with eukaryotic RNAi, and hypo<strong>the</strong>tical mechanisms<br />
of action. Biol Direct 1: 7.<br />
Marraff<strong>in</strong>i, L.A., and Son<strong>the</strong>imer, E.J. (2008) <strong>CRISPR</strong> <strong>in</strong>terference<br />
limits horizontal gene transfer <strong>in</strong> Staphylococci by<br />
target<strong>in</strong>g DNA. Science 322: 1843–1845.<br />
Mojica, F.J., Diez-Villasenor, C., Garcia-Mart<strong>in</strong>ez, J., and<br />
Soria, E. (2005) Interven<strong>in</strong>g sequences of regularly spaced<br />
prokaryotic repeats derive from foreign genetic elements.<br />
J Mol Evol 60: 174–182.<br />
Peng, X., Brügger, K., Shen, B., Chen, L., She, Q., and<br />
Garrett, R.A. (2003) Genus-specific prote<strong>in</strong> b<strong>in</strong>d<strong>in</strong>g to <strong>the</strong><br />
large clusters of DNA repeats (Short Regularly Spaced<br />
Repeats) present <strong>in</strong> Sulfolobus genomes. J Bacteriol 185:<br />
2410–2417.<br />
Pourcel, C., Salvignol, G., and Vergnaud, G. (2005) <strong>CRISPR</strong><br />
elements <strong>in</strong> Yers<strong>in</strong>ia pestis acquire new repeats by preferential<br />
uptake of bacteriophage DNA, and provide additional<br />
tools for evolutionary studies. Microbiology 151: 653–663.<br />
Prangishvili, D., Forterre, P., and Garrett, R.A. (2006) Viruses<br />
of <strong>the</strong> <strong>Archaea</strong>: a unify<strong>in</strong>g view. Nat Rev Microbiol 4:<br />
837–848.<br />
Sæbø, P.E., Andersen, S.M., Myrseth, J., Laerdahl, J.K.,<br />
and Rognes, T. (2005) PARALIGN: rapid and sensitive<br />
sequence similarity searches powered by parallel comput<strong>in</strong>g<br />
technology. Nucleic Acids Res 33: 535–539.<br />
Schleper, C., Holz, I., Janekovic, D., Murphy, J., and Zillig, W.<br />
(1995) A multicopy plasmid of <strong>the</strong> extremely <strong>the</strong>rmophilic<br />
archaeon Sulfolobus effects its transfer to recipients by<br />
mat<strong>in</strong>g. J Bacteriol 177: 4417–4426.<br />
Shah, S.A., Hansen, N.R., and Garrett, R.A. (2009) Distributions<br />
of <strong>CRISPR</strong> spacer matches <strong>in</strong> viruses and plasmids<br />
of crenarchaeal acido<strong>the</strong>rmophiles and implications for<br />
<strong>the</strong>ir <strong>in</strong>hibitory mechanism. Biochem Soc Trans 37: 23–28.<br />
She, Q., Phan, H., Garrett, R.A., Albers, S.-V., Stedman,<br />
K.M., and Zillig, W. (1998) Genetic profile of pNOB8 from<br />
Sulfolobus: <strong>the</strong> first conjugative plasmid from an archaeon.<br />
Extremophiles 2: 417–425.<br />
She, Q., S<strong>in</strong>gh, R.K., Confalonieri, F., Zivanovic, Y., Gordon,<br />
P., Allard, G., et al. (2001) The complete genome of <strong>the</strong><br />
crenarchaeon Sulfolobus solfataricus P2. Proc Natl Acad<br />
Sci USA 98: 7835–7840.<br />
Sorek, R., Kun<strong>in</strong>, V., and Hugenholtz, P. (2008) <strong>CRISPR</strong> – a<br />
widespread <strong>system</strong> that provides acquired resistance<br />
aga<strong>in</strong>st phages <strong>in</strong> bacteria and archaea. Nat Rev<br />
Microbiol 6: 181–186.<br />
Sunkar, R., Girke, T., and Zhu, J.K. (2005) Identification and<br />
characterization of endogenous small <strong>in</strong>terfer<strong>in</strong>g RNAs<br />
from rice. Nucleic Acids Res 33: 4443–4454.<br />
Tang, T.-H., Bachellerie, J.-P., Rozhdestvensky, T., Bortol<strong>in</strong>,<br />
M.-L., Huber, H., Drungowski, M., et al. (2002) Identification<br />
of 86 candidates for small non-messenger RNAs from<br />
<strong>the</strong> archaeon Archaeoglobus fulgidus. Proc Natl Acad Sci<br />
USA 99: 7536–7541.<br />
Tang, T.-H., Polacek, N., Zywicki, M., Huber, H., Brügger, K.,<br />
Garrett, R.A., et al. (2005) Identification of novel noncod<strong>in</strong>g<br />
RNAs as potential antisense regulators <strong>in</strong> <strong>the</strong><br />
archaeon Sulfolobus solfataricus. Mol Microbiol 55: 469–<br />
481.<br />
Torar<strong>in</strong>sson, E., Klenk, H.P., and Garrett, R.A. (2005) Divergent<br />
transcriptional and translational signals <strong>in</strong> <strong>Archaea</strong>.<br />
Environ Microbiol 7: 47–54.<br />
Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter,<br />
M., Phan, H., et al. (2008) SRV, a new rudiviral isolate from<br />
Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal rudiviruses<br />
with <strong>the</strong> host viral-defence <strong>CRISPR</strong> <strong>system</strong>. J Bacteriol<br />
190: 6837–6845.<br />
©2009TheAuthors<br />
Journal compilation © 2009 Blackwell Publish<strong>in</strong>g Ltd, Molecular Microbiology, 72, 259–272
Environmental Microbiology (2009) doi:10.1111/j.1462-2920.2009.02009.x<br />
Four newly isolated fuselloviruses from extreme<br />
geo<strong>the</strong>rmal environments reveal unusual morphologies<br />
and a possible interviral recombination mechanism<br />
Peter Redder, 1 * Xu Peng, 2 Kim Brügger, 2<br />
Shiraz A. Shah, 2 Ferd<strong>in</strong>and Roesch, 1 Bo Greve, 2<br />
Qunx<strong>in</strong> She, 2 Christa Schleper, 3 Patrick Forterre, 1<br />
Roger A. Garrett 2 and David Prangishvili 1<br />
1 Unité de Biologie Moléculaire du Gène chez les<br />
Extrêmophiles, Institut Pasteur, 25, rue du Dr Roux,<br />
F-75015 Paris, France.<br />
2 Danish <strong>Archaea</strong> Centre, Department of Biology,<br />
Biocenter, Ole Maaløesvej 5, Copenhagen University,<br />
DK-2200 Copenhagen N, Denmark.<br />
3 Department of Genetics <strong>in</strong> Ecology, University of<br />
Vienna, Althanstrasse 14, A-1090 Vienna, Austria.<br />
Summary<br />
Sp<strong>in</strong>dle-shaped virus-like particles are abundant <strong>in</strong><br />
extreme geo<strong>the</strong>rmal environments, from which five<br />
sp<strong>in</strong>dle-shaped viral species have been isolated to<br />
date. They <strong>in</strong>fect members of <strong>the</strong> hyper<strong>the</strong>rmophilic<br />
archaeal genus Sulfolobus, and constitute <strong>the</strong> Fuselloviridae,<br />
a family of double-stranded DNA viruses.<br />
Here we present four new members of this family, all<br />
from terrestrial acidic hot spr<strong>in</strong>gs. Two of <strong>the</strong> new<br />
viruses exhibit a novel morphotype for <strong>the</strong>ir proposed<br />
attachment structures, and specific features of <strong>the</strong>ir<br />
genome sequences strongly suggest <strong>the</strong> identity of<br />
<strong>the</strong> host-attachment prote<strong>in</strong>. All fuselloviral genomes<br />
are highly conserved at <strong>the</strong> nucleotide level, although<br />
the regions of conservation differ between virus pairs,<br />
consistent with a high frequency of homologous<br />
recomb<strong>in</strong>ation hav<strong>in</strong>g occurred between <strong>the</strong>m.<br />
We propose a fusellovirus-specific mechanism for<br />
interviral recombination, and show that the spacers of<br />
<strong>the</strong> Sulfolobus <strong>CRISPR</strong> antiviral <strong>system</strong> are not<br />
biased to <strong>the</strong> highly similar regions of <strong>the</strong> fusellovirus<br />
genomes.<br />
Received 2 April, 2009; accepted 18 June, 2009. *For correspondence.<br />
E-mail peterredder@gmail.com; Tel. (+41) 774000253; Fax<br />
(+41) 223795108.<br />
© 2009 Society for Applied Microbiology and Blackwell Publishing Ltd<br />
Introduction<br />
In contrast to <strong>the</strong> ra<strong>the</strong>r uniform landscape of virion<br />
morphotypes <strong>in</strong> aquatic <strong>system</strong>s under moderate environmental<br />
conditions, ma<strong>in</strong>ly represented by tailed bacteriophages<br />
(reviewed by Prangishvili, 2003), virus-like<br />
particles observed <strong>in</strong> ecological niches at high temperatures,<br />
low pH or high sal<strong>in</strong>ity reveal a high diversity of<br />
complex morphotypes (Guixa-Boixareu et al., 1996; Oren<br />
et al., 1997; Rice et al., 2001; Rachel et al., 2002; Här<strong>in</strong>g<br />
et al., 2005; Porter et al., 2007; Bize et al., 2008). About<br />
40 virus species isolated from such environments, all<br />
carry<strong>in</strong>g double-stranded (ds) DNA genomes, have been<br />
described, which <strong>in</strong>fect members of <strong>the</strong> third doma<strong>in</strong> of<br />
life, <strong>the</strong> <strong>Archaea</strong> (reviewed <strong>in</strong> Prangishvili et al., 2006a).<br />
Most common are viruses with an overall sp<strong>in</strong>dle-shaped<br />
morphology, ei<strong>the</strong>r tail-less, tailed or even two-tailed,<br />
which taxonomically have been assigned to <strong>the</strong> viral<br />
families Fuselloviridae (SSV1, SSV2, SSV4, SSVrh and<br />
SSVk1, s<strong>in</strong>gle-tailed), Bicaudaviridae (ATV, two-tailed)<br />
and <strong>the</strong> genus Salterprovirus (His 1 and His 2) while some<br />
still require classification (STSV1 and PAV1) (Schleper<br />
et al., 1992; Bath and Dyall-Smith, 1998; Arnold et al.,<br />
1999; Gesl<strong>in</strong> et al., 2003; Wiedenheft et al., 2004; Xiang<br />
et al., 2005; Bath et al., 2006; Prangishvili et al., 2006b;<br />
Peng, 2008).<br />
Five fuselloviruses have so far been isolated from<br />
acidic geo<strong>the</strong>rmal environments <strong>in</strong> different locations <strong>in</strong><br />
Asia, Europe and North America, and <strong>the</strong>y replicate <strong>in</strong><br />
species of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal genus Sulfolobus,<br />
which represents a significant percentage of <strong>the</strong><br />
microbial population <strong>in</strong> most acidic terrestrial hot spr<strong>in</strong>gs.<br />
Ano<strong>the</strong>r major player <strong>in</strong> <strong>the</strong>se environments is <strong>the</strong> genus<br />
Acidianus, from which several viruses have been isolated,<br />
<strong>in</strong>clud<strong>in</strong>g <strong>the</strong> l<strong>in</strong>ear filamentous and rod-shaped viruses<br />
AFV1 and ARV1, respectively, which have close viral relatives<br />
that also <strong>in</strong>fect Sulfolobus (Prangishvili et al., 2006a;<br />
Snyder et al., 2007). Although <strong>the</strong> two genera coexist, no<br />
fusellovirus has yet been isolated from Acidianus, even<br />
though fuselloviruses appear to be the predominant Sulfolobus<br />
viral type.<br />
The circular dsDNA genomes of five known fuselloviruses<br />
are highly similar at both nucleotide and am<strong>in</strong>o acid<br />
sequence levels, with <strong>the</strong> majority of gene products be<strong>in</strong>g
Fig. 1. A. Representative electron microscopy images of SSV6, SSV7 and ASV1. The end-filaments of SSV7 are very sticky, and <strong>the</strong> virus is<br />
almost always observed <strong>in</strong> ‘rosettes’ or attached to vesicles (white arrows). A rare s<strong>in</strong>gle SSV7 is also shown (dotted white arrow). SSV6 and<br />
ASV1 do not have sticky ends and are always s<strong>in</strong>gle, even when ly<strong>in</strong>g close toge<strong>the</strong>r. Fur<strong>the</strong>rmore, SSV6 and ASV1 exhibit a wide range of<br />
morphotypes, vary<strong>in</strong>g from <strong>the</strong> standard sp<strong>in</strong>dle shape to an elongated sausage shape (<strong>in</strong>dicated by dotted black arrows for SSV6).<br />
B. Magnification of the end-filaments of the three viruses. The filaments of SSV6 and ASV1 are thick, and seem to form a crown around the<br />
virus tips (black arrows), whereas SSV7 carries thinner filaments that protrude directly from the virus tips. All samples were negatively stained<br />
with 2% uranyl acetate and the scale bars are all 100 nm.<br />
of unknown function and lack<strong>in</strong>g homologues <strong>in</strong> public<br />
sequence databases o<strong>the</strong>r than <strong>in</strong> o<strong>the</strong>r archaeal viruses<br />
(Wiedenheft et al., 2004). The viral DNA is protected<br />
aga<strong>in</strong>st <strong>the</strong> harsh environment, at temperatures above<br />
80°C and pH values below 2, with<strong>in</strong> a sp<strong>in</strong>dle-shaped<br />
virion about 100 nm long and 60 nm wide, with a bunch of<br />
short, th<strong>in</strong> fibres at one of <strong>the</strong> po<strong>in</strong>ted ends (Mart<strong>in</strong> et al.,<br />
1984; Stedman et al., 2003; Wiedenheft et al., 2004;<br />
Peng, 2008). In electron micrographs, the body is<br />
sometimes observed to be slightly elongated and more<br />
‘cigar-shaped’, and <strong>the</strong> tail fibres appear to be quite sticky,<br />
readily attach<strong>in</strong>g to cellular fragments, as well as l<strong>in</strong>k<strong>in</strong>g<br />
virions to produce rosette-like aggregates (Fig. 1A –<br />
SSV7).<br />
SSV1 is <strong>the</strong> best studied fusellovirus, and <strong>the</strong> virion has<br />
been shown to conta<strong>in</strong> prote<strong>in</strong>s VP1, VP2, VP3 and small<br />
amounts of SSV1_D244 and SSV1_C792 (Reiter et al.,<br />
1987a; Menon et al., 2008). VP1 and VP3 are thought to<br />
be capsid prote<strong>in</strong>s, whereas VP2 has been assigned a<br />
DNA-b<strong>in</strong>d<strong>in</strong>g role, organiz<strong>in</strong>g DNA, but it is not encoded<br />
by o<strong>the</strong>r fuselloviruses (Stedman et al., 2003; Wiedenheft<br />
et al., 2004). Four non-structural SSV1 prote<strong>in</strong>s have<br />
been characterized. SSV1_D63 is considered to l<strong>in</strong>k<br />
two different prote<strong>in</strong> complexes, while SSV1_F93 and<br />
Fig. 2. A. Graphical alignment of <strong>the</strong> n<strong>in</strong>e circular fuselloviral genomes, l<strong>in</strong>earized at <strong>the</strong> first nucleotide after <strong>the</strong> VP3 stop codon (follow<strong>in</strong>g<br />
the convention of Wiedenheft et al., 2004). All ORFs larger than 50 amino acids are indicated by arrows. Shades of blue and green: 13 ‘core’<br />
genes. Dark grey: ORFs found <strong>in</strong> two or more fuselloviruses. Light grey: ORFs only found <strong>in</strong> one fusellovirus. Black: VP2. Yellow:<br />
SSV1_C792 homologues, both full length and partial. Red: SSV6_B1232 homologues. Orange: SSV1_B78 homologues. Light p<strong>in</strong>k:<br />
SSV1_D244 homologues associated with <strong>the</strong> Integrase operon <strong>in</strong> all but ASV1 and SSVk1. Dark violet and light violet: Rad3-like helicase and<br />
Msed_2283 homologues substitut<strong>in</strong>g for a large part of <strong>the</strong> Integrase operon <strong>in</strong> ASV1, SSV7 and SSVk1. Dark p<strong>in</strong>k: SSV1_F93 homologues.<br />
Brown: Highly conserved SSV1_C84 homologue overlapping with some of the other ‘core’ genes. Magenta: SSV1_C80 homologues and<br />
ASV1_A59. The transcripts identified by Fröls and colleagues (2007) are indicated below SSV1.<br />
B. The two different putative end-filament modules, exemplified by SSV1 and SSV6.<br />
SSV1_F112 are DNA b<strong>in</strong>d<strong>in</strong>g prote<strong>in</strong>s implicated <strong>in</strong> transcriptional<br />
regulation (Kraft et al., 2004a,b; Menon et al.,<br />
2008). The fourth prote<strong>in</strong> is an <strong>in</strong>tegrase of <strong>the</strong> tyros<strong>in</strong>e<br />
recomb<strong>in</strong>ase family, which catalyses site-specific <strong>in</strong>tegration<br />
of <strong>the</strong> viral genome <strong>in</strong>to <strong>the</strong> host chromosome. As <strong>the</strong><br />
viral recomb<strong>in</strong>ation site (attP) is located with<strong>in</strong> <strong>the</strong> <strong>in</strong>tegrase<br />
gene, <strong>in</strong>tegration leads to gene partition (Palm<br />
et al., 1991; Muskhelishvili et al., 1993). Despite this<br />
highly specialized adaptation, <strong>the</strong> <strong>in</strong>tegrase was recently<br />
shown to be non-essential for virus replication and basic<br />
viral functions (Clore and Stedman, 2007).<br />
Replication of SSV1 and SSV2 can be <strong>in</strong>duced by UV<br />
irradiation (Yeats et al., 1982; Stedman et al., 2003). The<br />
SSV1 transcription cycle, follow<strong>in</strong>g UV <strong>in</strong>duction, has also<br />
been elucidated by Nor<strong>the</strong>rn analysis, physical mapp<strong>in</strong>g<br />
and DNA microarrays, and transcripts were classified as<br />
early (T5, T6 and T9), late (T3, Tx and T8) and UV <strong>in</strong>ducible<br />
(T<strong>in</strong>d) (Fig. 2) (Reiter et al., 1987b; Fröls et al., 2007).
The prote<strong>in</strong>s encoded <strong>in</strong> <strong>the</strong> early transcripts of SSV1,<br />
and <strong>the</strong>ir homologues <strong>in</strong> o<strong>the</strong>r fuselloviruses, are often<br />
cyste<strong>in</strong>e-rich compared with prote<strong>in</strong>s encoded <strong>in</strong> <strong>the</strong> late<br />
transcripts (Palm et al., 1991; Stedman et al., 2003;<br />
Wiedenheft et al., 2004). This has recently been proposed<br />
to be due to <strong>in</strong>tra- and extra-cellular localization of <strong>the</strong><br />
early and late prote<strong>in</strong>s respectively (Menon et al., 2008).<br />
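The cysteine comparison described in this paragraph can be illustrated with a minimal Python sketch. It assumes protein sequences are available as one-letter amino acid strings; the function names and the toy sequences are hypothetical and are not taken from the original analysis.

```python
def cysteine_fraction(protein: str) -> float:
    """Return the fraction of residues in a protein that are cysteine (C)."""
    seq = protein.upper().rstrip("*")  # drop a trailing stop symbol, if present
    if not seq:
        return 0.0
    return seq.count("C") / len(seq)


def mean_cysteine_fraction(proteins: list[str]) -> float:
    """Average the per-protein cysteine fraction over a group of proteins."""
    if not proteins:
        return 0.0
    return sum(cysteine_fraction(p) for p in proteins) / len(proteins)


# Hypothetical toy sequences, for illustration only (not real SSV1 proteins).
early_like = ["MCKCAC", "MACCKL"]
late_like = ["MAKLAV", "MSKLCV"]

early_mean = mean_cysteine_fraction(early_like)
late_mean = mean_cysteine_fraction(late_like)
print(f"early-like: {early_mean:.3f}, late-like: {late_mean:.3f}")
```

Averaging per-protein fractions, rather than pooling all residues, keeps one long protein from dominating a group; either choice would be defensible for a rough comparison.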
Here we report on <strong>the</strong> isolation and properties of four<br />
novel members of <strong>the</strong> Fuselloviridae, <strong>in</strong>fect<strong>in</strong>g species of<br />
<strong>the</strong> hyper<strong>the</strong>rmophilic archaeal genera Sulfolobus and<br />
Acidianus, almost doubl<strong>in</strong>g <strong>the</strong> number of known fuselloviruses<br />
and extend<strong>in</strong>g <strong>the</strong>ir host-range to a new genus,<br />
Acidianus. This merited a revised comparative genomic<br />
analysis of fuselloviruses, which provided <strong>in</strong>sights <strong>in</strong>to<br />
functions of some viral prote<strong>in</strong>s and addressed general<br />
questions concern<strong>in</strong>g <strong>the</strong> evolution of <strong>the</strong> viruses and<br />
<strong>in</strong>teractions with <strong>the</strong>ir hosts.<br />
Results<br />
Isolation of virus–host <strong>system</strong>s<br />
Three different methods were used to acquire <strong>the</strong> new<br />
viruses reported <strong>in</strong> this communication. SSV5 was discovered<br />
as an extrachromosomal element with<strong>in</strong> cells of<br />
S. solfataricus P2 (DSM1617), <strong>in</strong>fected as a result of<br />
mixing the cells with an Icelandic HVE14 enrichment<br />
culture (see Experimental procedures). This traditional<br />
method of isolat<strong>in</strong>g new viruses allows a large number of<br />
viruses to be screened, but it restricts the search to<br />
specific virus–host systems.<br />
A different approach was used for SSV6 and SSV7,<br />
where transmission electron microscopy analysis of <strong>the</strong><br />
supernatant from an enrichment of <strong>the</strong> G4 site at<br />
Hveragerdi, Iceland, revealed a large number of fuselloviruses.<br />
Attempts to isolate s<strong>in</strong>gle virus–host <strong>system</strong>s by<br />
colony purification resulted <strong>in</strong> two pure stra<strong>in</strong>s, each harbour<strong>in</strong>g<br />
a different fusellovirus. Stra<strong>in</strong> G4T-1 was a host<br />
for Sulfolobus sp<strong>in</strong>dle-shaped virus 7 (SSV7) while stra<strong>in</strong><br />
G4ST-T-11 was <strong>the</strong> natural producer of a pleiomorphic<br />
virus named Sulfolobus sp<strong>in</strong>dle-shaped virus 6, SSV6.<br />
SSV7 was found to be produced in very low amounts<br />
under normal growth conditions, but it was possible to<br />
<strong>in</strong>crease SSV7 production about 10-fold (as estimated by<br />
count<strong>in</strong>g viral particles <strong>in</strong> <strong>the</strong> electron microscope), ei<strong>the</strong>r<br />
by shift<strong>in</strong>g <strong>the</strong> culture to a medium with lower tryptone<br />
concentration, or by inducing with UV light. Strains G4T-1<br />
and G4ST-T-11 may in fact be the same species, as their<br />
partial 16S rRNA sequences were identical to that of S. islandicus<br />
strain I7 (AY247894.1), with a single base substitution<br />
distinguishing them from S. solfataricus P2. This virus<br />
isolation approach did not impose any bias on <strong>the</strong> choice<br />
of viral host (except for choos<strong>in</strong>g <strong>the</strong> growth conditions),<br />
and it provided a ‘natural’ virus–host system. However, a<br />
bias is imposed on the virus, because single colonies<br />
of the host cannot be isolated if the virus is<br />
highly lytic under the chosen conditions.<br />
F<strong>in</strong>ally, <strong>the</strong> fourth virus described here, Acidianus<br />
sp<strong>in</strong>dle-shaped virus 1, ASV1, was discovered as an<br />
extrachromosomal and <strong>in</strong>tegrated element <strong>in</strong> <strong>the</strong> course<br />
of sequenc<strong>in</strong>g <strong>the</strong> genome of Acidianus brierleyi<br />
DSM1651, and <strong>the</strong> production of virions was subsequently<br />
confirmed by electron microscopy (Fig. 1). While<br />
this method for isolat<strong>in</strong>g new viruses is not generally<br />
applicable, it is likely to become more common for extrachromosomal<br />
elements to be detected while sequencing<br />
genomes from strain collections.<br />
Morphology<br />
The sp<strong>in</strong>dle-shaped virion of SSV7, ~90 nm long and<br />
~50 nm wide, resembles virions of all previously known<br />
fuselloviruses morphologically, as well as by its tendency<br />
to form ‘rosettes’ by stick<strong>in</strong>g to neighbour<strong>in</strong>g viral particles<br />
(Fig. 1). In contrast, negatively stained virions of SSV6<br />
and ASV1 both appear much more pleiomorphic than the<br />
o<strong>the</strong>r fuselloviruses, and assume shapes rang<strong>in</strong>g from<br />
th<strong>in</strong> cigar-like to pear-like, with tail fibres at <strong>the</strong> end correspond<strong>in</strong>g<br />
to where <strong>the</strong> pear ‘stalk’ would be (Fig. 1).<br />
The tail fibres of ASV1 and SSV6, like the virion bodies,<br />
seem to differ from those of the other fuselloviruses.<br />
Instead of multiple thin fibres, these virions carry three or four<br />
thicker, slightly curved fibres that appear to protrude<br />
sideways, not from <strong>the</strong> particle apex but from a po<strong>in</strong>t<br />
slightly more towards <strong>the</strong> body (Fig. 1B). Fur<strong>the</strong>rmore, <strong>the</strong><br />
ASV1 and SSV6 fibres seem to be less ‘sticky’ than <strong>the</strong>ir<br />
th<strong>in</strong> counterparts, and <strong>the</strong> characteristic ‘rosettes’ were<br />
never observed for ASV1 and SSV6.<br />
To exclude the possibility that the observed pleiomorphicity of SSV6<br />
virions was an artifact caused by <strong>the</strong> purification process,<br />
or by uranyl-acetate sta<strong>in</strong><strong>in</strong>g, two control experiments<br />
were carried out. (i) The virions were analysed by EM<br />
directly after removal of host cells by mild centrifugation at<br />
4000 r.p.m. (Jouan S40 rotor), and although omitt<strong>in</strong>g <strong>the</strong><br />
concentration step yielded few virions, <strong>the</strong>y exhibited <strong>the</strong><br />
normal pleiomorphicity. (ii) The virion pleiomorphicity was<br />
also observed when we used phosphotungstate as an<br />
alternative contrast<strong>in</strong>g agent (not shown), confirm<strong>in</strong>g that<br />
<strong>the</strong> heterogeneity of <strong>the</strong> shape was an <strong>in</strong>tegral property of<br />
<strong>the</strong> virions ra<strong>the</strong>r than a result of <strong>the</strong> experimental treatment.<br />
Moreover, SSV7 virions, for which little to no pleiomorphicity<br />
was observed, were rout<strong>in</strong>ely treated <strong>in</strong> exactly<br />
<strong>the</strong> same manner as SSV6 and ASV1 virions (Fig. 1A).<br />
Genomic organization and comparison<br />
Ow<strong>in</strong>g to <strong>the</strong>ir special structural properties, we orig<strong>in</strong>ally<br />
suspected that ASV1 and SSV6 were representatives of<br />
a new sp<strong>in</strong>dle-shaped viral family. However, genome<br />
analyses revealed that <strong>the</strong>y, and <strong>the</strong> SSV5 and SSV7<br />
isolates, are all closely related to known members of <strong>the</strong><br />
family Fuselloviridae, and we <strong>the</strong>refore assign <strong>the</strong> four<br />
newly isolated viruses to this family. The similarities are<br />
evident, both <strong>in</strong> terms of overall gene synteny and<br />
sequence similarity (Table 1, Figs 2 and 3), and also<br />
extend to the distribution of the cysteine codons in a<br />
manner that supports <strong>the</strong> f<strong>in</strong>d<strong>in</strong>gs of Menon and<br />
colleagues (2008).<br />
Sequence similarity among <strong>the</strong> fuselloviruses<br />
The genome of ASV1 comprises 24 186 bp and is by far the<br />
largest of <strong>the</strong> fuselloviruses, and one or two gene duplications<br />
appear to have occurred (ASV1_B91 and<br />
ASV1_C137), as well as <strong>the</strong> acquisition of new genes.<br />
Most of <strong>the</strong> ASV1 genome is closely related to <strong>the</strong> o<strong>the</strong>r<br />
fuselloviruses, with several regions of more than 75%<br />
identity at the nucleotide level (Fig. 3). One 5.6 kb region<br />
that is similar to SSV6 starts in the middle of ASV1_C213<br />
and ends in ASV1_B90 (Fig. 3D).<br />
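Regions exceeding a nucleotide identity threshold, such as the 75% figure quoted here, can be located with a sliding window over a pairwise alignment. The following Python sketch is only an illustration under stated assumptions: the two sequences are already aligned to equal length with "-" for gaps, gap columns are skipped, and the window size, step and function names are hypothetical. The text does not state how the identities in Fig. 3 were computed.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns, ignoring columns with a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def windows_above(a: str, b: str, window: int = 100, step: int = 50,
                  threshold: float = 75.0) -> list[tuple[int, int, float]]:
    """Return (start, end, identity) for windows whose identity exceeds threshold."""
    hits = []
    for start in range(0, len(a) - window + 1, step):
        end = start + window
        pid = percent_identity(a[start:end], b[start:end])
        if pid > threshold:
            hits.append((start, end, pid))
    return hits


# Toy aligned sequences (hypothetical), with a short window for readability.
print(windows_above("ACGTACGTAAAA", "ACGTACGTCCCC", window=4, step=4))
```

Coordinates are 0-based and half-open, as is usual for Python slices; overlapping hits would need to be merged to obtain contiguous regions.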
An extreme example of how closely related some fuselloviruses<br />
are can be seen by comparing SSV4 and SSV5,<br />
where a 7.9 kb region is almost 100% identical (Fig. 3B),<br />
consistent with a recent recomb<strong>in</strong>ation event hav<strong>in</strong>g<br />
occurred between <strong>the</strong> viruses. Moreover, <strong>the</strong> junctions of<br />
nucleotide similarity regions are generally <strong>in</strong>tragenic, such<br />
that sections of high sequence similarity are mostly short,<br />
distributed all over <strong>the</strong> genomes, and often start and stop<br />
<strong>in</strong> <strong>the</strong> middle of open read<strong>in</strong>g frames (ORFs) (Fig. 3).<br />
These patterns of similarity raise <strong>in</strong>terest<strong>in</strong>g questions<br />
concern<strong>in</strong>g <strong>in</strong>terplay and recomb<strong>in</strong>ation between fuselloviral<br />
genomes.<br />
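Whether a junction between similarity regions is intragenic can be checked against ORF coordinates. The sketch below is a hypothetical illustration in Python: the ORF names and coordinates are invented, coordinates are taken as 1-based and inclusive, and a junction counts as intragenic only when it lies strictly inside an ORF.

```python
def orfs_containing(position: int, orfs: dict[str, tuple[int, int]]) -> list[str]:
    """Return names of ORFs whose interior contains the given position.

    Coordinates are 1-based and inclusive; a position equal to an ORF
    start or end is treated as a boundary, not as intragenic.
    """
    return [name for name, (start, end) in orfs.items() if start < position < end]


# Hypothetical ORF map, for illustration only.
toy_orfs = {"orfA": (100, 400), "orfB": (450, 900)}

for junction in (250, 400, 425):
    inside = orfs_containing(junction, toy_orfs)
    label = "intragenic in " + ", ".join(inside) if inside else "intergenic or at a boundary"
    print(junction, label)
```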
The presence of regions of nucleotide identity between<br />
<strong>the</strong> fuselloviruses raises <strong>the</strong> question as to how <strong>the</strong>y avoid<br />
<strong>the</strong> extensive antiviral <strong>CRISPR</strong> <strong>system</strong>s present <strong>in</strong> all<br />
sequenced Sulfolobus genomes. Therefore, we analysed<br />
the correlation between sequence matching of CRISPR spacers<br />
and fuselloviral genomes. A total of 3420<br />
<strong>CRISPR</strong> spacer sequences were obta<strong>in</strong>ed from four complete<br />
and n<strong>in</strong>e <strong>in</strong>complete Sulfolobales genomes (after<br />
subtract<strong>in</strong>g <strong>the</strong> 278 spacers which S. solfataricus P1 and<br />
P2 have <strong>in</strong> common). N<strong>in</strong>ety-one of <strong>the</strong>se spacers match<br />
to one or more of <strong>the</strong> fuselloviruses on a nucleotide<br />
sequence level. An additional 101 spacers were found<br />
match<strong>in</strong>g to one or more fuselloviruses when extend<strong>in</strong>g<br />
<strong>the</strong> search to <strong>the</strong> am<strong>in</strong>o acid sequence level. Thus out of<br />
<strong>the</strong> 3420 Sulfolobales spacers, <strong>in</strong> total 192 spacers yield<br />
436 significant matches to fuselloviral genomes. The latter<br />
number exceeds <strong>the</strong> former because many spacers,<br />
especially on <strong>the</strong> am<strong>in</strong>o acid sequence level, yield<br />
matches to more than one virus, and because some<br />
spacers match to repeats with<strong>in</strong> <strong>the</strong> same viral genome.<br />
We found no bias of spacer matches towards the conserved<br />
regions, and it is possible that fuselloviruses<br />
recombine frequently enough to reduce the<br />
effectiveness of the CRISPR system. The results are<br />
summarized <strong>in</strong> Fig. 4, exemplified by SSV2 which has <strong>the</strong><br />
highest number of spacer matches, and by <strong>the</strong> most distantly<br />
related fusellovirus, ASV1. The spacer matches<br />
occur on both strands of <strong>the</strong> viruses, consistent with DNA<br />
recognition by <strong>the</strong> spacer transcripts, as recently proposed<br />
(Marraff<strong>in</strong>i and Son<strong>the</strong>imer, 2008; Shah et al.,<br />
2009).<br />
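The spacer counts reported in this section, and the idea of matching spacers to both strands of a viral genome, can be sketched in Python as follows. This is a simplified illustration, not the procedure used in the study: it finds exact nucleotide matches only, treats the genome as a linear string (so a match spanning the linearization point of a circular genome would be missed), and does not attempt the amino acid level search. The function names and toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_spacer_hits(spacer: str, genome: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for exact spacer matches on both strands."""
    genome = genome.upper()
    hits = []
    for strand, query in (("+", spacer.upper()), ("-", reverse_complement(spacer))):
        pos = genome.find(query)
        while pos != -1:
            hits.append((pos, strand))
            pos = genome.find(query, pos + 1)
    return hits


# Toy example (hypothetical sequences): one hit on each strand.
toy_hits = find_spacer_hits("ACGGA", "TTACGGATTTCCGTAA")
print(toy_hits)

# Tallies taken from the text.
total_spacers = 3420
nucleotide_level = 91
amino_acid_level = 101
matching_spacers = nucleotide_level + amino_acid_level
total_matches = 436
print(matching_spacers, matching_spacers / total_spacers, total_matches / matching_spacers)
```

With the figures from the text, the matching spacers sum to 192, which is about 5.6% of the 3420 spacers, and each matching spacer yields on average roughly 2.3 matches. A spacer that equals its own reverse complement would be reported on both strands by this sketch.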
Encoded prote<strong>in</strong>s<br />
Many of <strong>the</strong> ORFs encoded on ASV1 yield no, or very<br />
weak, matches <strong>in</strong> public sequence databases, especially<br />
ORFs found <strong>in</strong> <strong>the</strong> ‘extra’ ~6 kb that are not present <strong>in</strong><br />
o<strong>the</strong>r fuselloviruses. Exceptions are ASV1_B276 and<br />
C106, which are homologous to genes from an <strong>in</strong>tegrated<br />
virus <strong>in</strong> <strong>the</strong> Sulfolobus tokodaii chromosome (ST1724 and<br />
ST1725), and ASV1_A59, which exhibits sequence similarity<br />
to CopG transcriptional regulators <strong>in</strong> M. sedula and<br />
S. acidocaldarius. Both SSV6 and ASV1 encode a homologue<br />
of <strong>the</strong> structural prote<strong>in</strong> SSV1_VP2, which is absent<br />
from <strong>the</strong> o<strong>the</strong>r six fuselloviruses. Fur<strong>the</strong>rmore, SSV6 and<br />
ASV1 do not encode a full-length SSV1_C792 homologue<br />
or an SSV1_B78 homologue, unlike all other fuselloviruses<br />
(Fig. 2B). Instead, <strong>the</strong>y carry two o<strong>the</strong>r genes: a<br />
small gene (SSV6_C213 and ASV1_B208) homologous<br />
only to <strong>the</strong> C-term<strong>in</strong>al 170 aa of SSV1_C792, and follow<strong>in</strong>g<br />
this gene, a large ORF (ASV1_A1231 and SSV6_B1232),<br />
which is similar to Saci_1002 from Sulfolobus<br />
acidocaldarius (49% identity, 65% similarity, for<br />
SSV6_B1232). No o<strong>the</strong>r sequence similarity is found <strong>in</strong><br />
databases, but a clue to <strong>the</strong> function of both <strong>the</strong><br />
SSV1_C792 and SSV6_B1232 homologues is given by<br />
the Phyre fold-prediction server (Kelley and Sternberg,<br />
2009), which suggests they both have a fold similar to<br />
the adsorption protein P2 from bacteriophage PRD1<br />
(E-value < 0.5, estimated precision 85%).<br />
ASV1, SSV7 and SSVk1 differ from <strong>the</strong> o<strong>the</strong>r fuselloviruses<br />
by lack<strong>in</strong>g all genes of <strong>the</strong> SSV1_T5 operon except<br />
<strong>the</strong> <strong>in</strong>tegrase and, for ASV1 and SSVk1, a predicted<br />
helix–turn–helix transcriptional regulator (Fig. 2). Instead,<br />
<strong>the</strong> three viruses carry a set of ORFs on <strong>the</strong> plus-strand,<br />
which encode a putative Rad3-like helicase, an Msed_2283<br />
homologue (hypothetical protein) and a few small<br />
prote<strong>in</strong>s (Fig. 2).<br />
Besides these peculiarities of the individual genomes,<br />
analyses have revealed 13 genes that are conserved <strong>in</strong> all<br />
n<strong>in</strong>e fuselloviruses. These ‘core’ genes <strong>in</strong>clude VP1 and<br />
VP3, <strong>the</strong> <strong>in</strong>tegrase and three putative transcriptional regulators,<br />
<strong>in</strong>clud<strong>in</strong>g one helix–turn–helix and two z<strong>in</strong>c-f<strong>in</strong>ger<br />
prote<strong>in</strong>s (Fig. 2). The attP sites with<strong>in</strong> <strong>the</strong> <strong>in</strong>tegrase genes
Table 1. Genes <strong>in</strong> SSV5, SSV6, SSV7 and ASV1, as well as <strong>the</strong> homologues from o<strong>the</strong>r fuselloviruses.<br />
Size-range<br />
(aa) Comments<br />
ASV1<br />
(24 186 bp)<br />
SSV7<br />
(17 602 bp)<br />
SSV6<br />
(15 684 bp)<br />
SSVrh<br />
(16 473 bp)<br />
SSVk1<br />
(17 385 bp)<br />
SSV5<br />
(15 330 bp)<br />
SSV4<br />
(15 135 bp)<br />
SSV2<br />
(14 796 bp)<br />
SSV1<br />
(15 465 bp)<br />
Japan Iceland Iceland Iceland Kamchatka USA Iceland Iceland USA Isolated from<br />
Russia<br />
Arg (CCG) Gly (CCC) Glu (TTC) Gln (CTG) Asp (GTC), Leu (GAG) Gln (CTG) Gly (CCC) Lys (TTT) Match<strong>in</strong>g S. solfataricus P2 tRNA of <strong>the</strong> attP site <strong>in</strong><br />
Glu (CTC),<br />
<strong>the</strong> <strong>in</strong>tegrase gene (anticodon)<br />
Glu (TTC)<br />
VP2 C76 A82a 74–82 VP2 prote<strong>in</strong> detected <strong>in</strong> <strong>the</strong> SSV1 virion and thought<br />
to be <strong>the</strong> DNA b<strong>in</strong>d<strong>in</strong>g prote<strong>in</strong> (Reiter et al., 1987a)<br />
A82 ORF83 ORF82 gp07 B83 A83 C83 C82 A83 82–83 Putative membrane protein<sup>a</sup><br />
C84 ORF88c ORF81 gp08 B90 C78 B81 B83 C97 81–104<br />
A92 ORF90 ORF89 gp09 A82 A93 C90 C90 A94 89–94 Overlaps o<strong>the</strong>r genes<br />
B277 ORF276 ORF280 gp10 C279 C277 A269 A281 C263 269–281 Putative membrane protein<sup>a</sup><br />
A154 ORF153 ORF152 gp11 C157 C154 B149 C150 C155 149–157 Also found in pSSVx<sup>a</sup><br />
B251 ORF233 ORF233 gp12 A231 A247 C234 A255 A232 231–255 DnaA-like (Koonin, 1992). Also in pSSVx, ATV and A. pernix<sup>a</sup><br />
D335 ORF328 ORF330 Integrase F340 D355 F354 D336 D347 328–355 Integrase<sup>a</sup><br />
E79 79<br />
C176 176<br />
A66* C72 A58a 58–72 Also found <strong>in</strong> AFV2 (gp06)<br />
B204 A171 171–204<br />
C74 B80 74–80<br />
B494 A583 C559 494–583 Rad3-like helicase<br />
A460 B471 C674 460–674 Similar to Metallosphaera sedula protein Msed_2283<br />
B192 192 Similar to C-term<strong>in</strong>al of ASV1_C674<br />
A136 B119 119–154<br />
B64 B102 64–102<br />
D244 ORF211 ORF209 gp15 D212 F215 209–244 Similar to Saci_0475<br />
D108 108 Similar to SIRV2gp12<br />
F90 90 Similar to ORFs from pARN3 and pSOG1<br />
E94 94<br />
F93 E81 F110 D95 81–110 Putative HTH transcriptional regulator (Kraft et al.,<br />
2004b)<br />
D63 ORF57 ORF63 gp16 F61 E60 57–63 3D X-ray structure from SSV1 (Kraft et al., 2004a)<br />
ORF159b gp18 E152 F185 152–185<br />
ORF61 ORF61 gp21 F62 E61 61–62<br />
ORF79a ORF73 gp23 E73 D77 73–79<br />
A49 49 C-term<strong>in</strong>al similar to SSV7_B76<br />
A100 ORF96 ORF96 gp24 C96 C93 C106 C96 93–106 Weak hit to ARV1<br />
C48 C49 48–49<br />
ORF88a B87 87–88<br />
B92 92<br />
A59 59 Similar to CopG from M. sedula and Saci_0942.<br />
Possible functional homologue of SSV1_C80<br />
© 2009 Society for Applied Microbiology and Blackwell Publishing Ltd, Environmental Microbiology
C80 ORF82A ORF79 gp26 C82 B64 A78 C80 64–82 RHH protein, CopG-like a<br />
A109 109 Paralogue of ASV1_B91<br />
A79 ORF82B ORF80B gp27 A80 B79 B82 B82 B91 79–91 Zinc finger motif. Similar to ATV_gp28 and pHVE14–51. ASV1_B91 is a paralogue of ASV1_A109 a<br />
C54 54<br />
C102a ORF100 ORF100 gp29 B98 A102b C100 A101 98–102 B-block_TFIIC-doma<strong>in</strong>, Z<strong>in</strong>c f<strong>in</strong>ger<br />
ORF205 ORF206 gp30 A204 C287 B206 204–287 Similar to <strong>CRISPR</strong> associated gene Cas4 <strong>in</strong><br />
Staphylo<strong>the</strong>rmus mar<strong>in</strong>us.<br />
B129 ORF155 ORF124 gp31 B158 C150 B123 C128 C137 124–173 Two Z<strong>in</strong>c f<strong>in</strong>ger motifs. ASV1_C137 is a paralogue<br />
of ASV1_C125<br />
B99 99<br />
ORF107b gp32 B111 C113 C113 107–113 Similar to ST1721 from S. tokodaii b<br />
ORF311 gp33 B252 252–311 Similar to ST1722 from S. tokodaii<br />
ORF111 gp35 C108 108–111 Similar to ST1723 from S. tokodaii<br />
B85 C62 62–85<br />
C247 A298 247–298<br />
B74 C67 67–74<br />
B276 276 Similar to ST1724 from S. tokodaii<br />
C106 106 Similar to ST1725 from S. tokodaii<br />
C125 125 Paralogue of ASV1_C137<br />
A367 367<br />
A137 137 Similar to STS262 from S. tokodaii<br />
C806 806 558–785 similar to APE_0858 from Aeropyrum pernix<br />
A96 96<br />
C792 ORF809 ORF808 gp01 B793 B812 C213 C811 B208 208–812 ASV1_B208 and SSV6_C213 are similar to the<br />
C-term<strong>in</strong>al of <strong>the</strong> C792 homologues<br />
B78 ORF79 ORF80a gp02 A79 A79 B79 79–80 Part of <strong>the</strong> SSV1_C792 module<br />
B68 A58b 58–68<br />
B1232 A1231 1231–1232 Similar to Saci1002 from S. acidocaldarius<br />
C166 ORF176 ORF167 gp03 B169 B170 C134 C170 B130 130–176 Gapped <strong>in</strong> ASV1 and SSV6. Putative membrane<br />
prote<strong>in</strong><br />
B115 ORF112 ORF107a gp04 A123 A113 A88 B112 A82b 82–123 Putative HTH transcriptional regulator. Shorter in ASV1 and SSV6 a<br />
VP1 ORF88b ORF136 VP1 B137 A89 A143 C88 A140 88–143 VP1 structural protein in SSV1 (Reiter et al., 1987a) a<br />
VP3 ORF92 ORF92 VP3 A93 C96 B94 C97 B90 92–96 VP3 structural prote<strong>in</strong> <strong>in</strong> SSV1 (Reiter et al., 1987a)<br />
Fuselloviral diversity 7<br />
A, B and C <strong>in</strong>dicate genes on <strong>the</strong> three read<strong>in</strong>g frames of <strong>the</strong> plus-strand, and D, E and F <strong>in</strong>dicate genes on <strong>the</strong> m<strong>in</strong>us-strand. The number follow<strong>in</strong>g <strong>the</strong> letter is <strong>the</strong> number of encoded am<strong>in</strong>o<br />
acids. The 13 ‘core’ genes are <strong>in</strong> boldface, and prote<strong>in</strong>s for which experimental data are available are underl<strong>in</strong>ed. The asterisk <strong>in</strong>dicates an ad hoc ORF name for a gene which is not present <strong>in</strong><br />
<strong>the</strong> NCBI annotation.<br />
a. Core gene <strong>in</strong> Held and Whitaker (2009).<br />
b. The upstream 40 bp of <strong>the</strong> SSV7_C113 homologues are highly conserved <strong>in</strong> all fuselloviruses, with two copies <strong>in</strong> ASV1. In SSV1, this motif is immediately next to <strong>the</strong> BRE+TATA-box of <strong>the</strong> T3<br />
transcript.
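The gene-naming convention described in the Table 1 legend (A, B, C for the three plus-strand reading frames, D, E, F for the minus strand, followed by the number of encoded amino acids) can be decoded mechanically. The following is a minimal sketch, not part of the original analysis. The function name `parse_orf_name`, the `OrfName` record and the reading of a trailing lowercase letter as a disambiguating suffix are assumptions made for illustration. Names that follow other schemes (for example ORF83, gp07, VP1) are returned as `None`.

```python
import re
from typing import NamedTuple, Optional


class OrfName(NamedTuple):
    virus: Optional[str]  # e.g. "SSV1"; None when the name carries no prefix
    frame: str            # reading-frame letter, A-F
    strand: str           # "+" for A/B/C, "-" for D/E/F
    length_aa: int        # number of encoded amino acids
    suffix: str           # trailing letter on names such as "A82a" (assumed disambiguator)


# Optional "<virus>_" prefix, one frame letter, the length, an optional suffix letter.
_PATTERN = re.compile(r"^(?:([A-Za-z0-9]+)_)?([A-F])(\d+)([a-z]?)$")


def parse_orf_name(name: str) -> Optional[OrfName]:
    """Decode names such as 'SSV1_C792' or 'D335'; return None for other schemes."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    virus, frame, length, suffix = match.groups()
    strand = "+" if frame in "ABC" else "-"
    return OrfName(virus, frame, strand, int(length), suffix)


if __name__ == "__main__":
    for example in ("SSV1_C792", "D335", "A82a", "ORF83"):
        print(example, parse_orf_name(example))
```

For instance, `SSV6_B1232` decodes to a plus-strand gene of 1232 amino acids, and `D335` (the SSV1 integrase) to a minus-strand gene of 335 amino acids.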
Fig. 3. Similarity at the nucleotide level between selected representative pairs of fuselloviral genomes.<br />
A. Comparison between SSV1 and SSV5.<br />
B. Between SSV5 and SSV4.<br />
C. Between SSV4 and SSV6.<br />
D. Between SSV6 and ASV1.<br />
E. Between ASV1 and SSVk1.<br />
F. Between SSVk1 and SSV7.<br />
Regions of high (> 70%) pairwise identity at the nucleotide level (light grey boxes) are interspersed with regions with no detectable similarity<br />
(white boxes). The dark grey box indicates an exceptional example of similarity between SSV4 and SSV5, where a 7.9 kb region is almost<br />
100% identical between the two genomes. The junctions between similar and dissimilar regions (indicated by dotted lines) often<br />
occur in the middle of genes, and are not confined to intergenic regions. Short regions (< 100 bp) of similarity or dissimilarity are not shown.<br />
Black arrows denote ‘core’ genes, dark grey arrows denote ORFs that are found <strong>in</strong> more than one fusellovirus, and light grey arrows denote<br />
ORFs that have no homologues <strong>in</strong> <strong>the</strong> database, some of which may not be prote<strong>in</strong>-cod<strong>in</strong>g.<br />
all have <strong>the</strong>ir best hits to tRNAs from S. solfataricus, with<br />
Gln, Gln, Gly and Lys for SSV5, SSV6, SSV7 and ASV1<br />
respectively. Table 1 shows an overview of <strong>the</strong> genes <strong>in</strong><br />
SSV5, SSV6, SSV7 and ASV1, as well as <strong>the</strong> correspond<strong>in</strong>g<br />
homologues <strong>in</strong> <strong>the</strong> o<strong>the</strong>r fuselloviruses.<br />
Discussion<br />
In this paper we describe four new members of <strong>the</strong> family<br />
Fuselloviridae, SSV5, SSV6, SSV7 and ASV1, isolated<br />
from acidic hot springs of Iceland and the USA, which infect<br />
members of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal genera<br />
Sulfolobus and Acidianus.<br />
Until now, fuselloviruses had only been found to replicate<br />
<strong>in</strong> Sulfolobus species. Our discovery of ASV1 <strong>in</strong><br />
Acidianus brierleyi shows that fuselloviruses can propagate<br />
<strong>in</strong> both <strong>the</strong> major culturable genera from aerobic,<br />
acidic hot spr<strong>in</strong>gs. Therefore, it is likely that fuselloviruses<br />
also <strong>in</strong>fect o<strong>the</strong>r host species from <strong>the</strong>se environments,<br />
such as Caldococcus, Vulcanisaeta and<br />
Stygiolobus (Snyder et al., 2007). Fur<strong>the</strong>rmore, <strong>the</strong><br />
family Fuselloviridae presumably also extends its host<br />
range <strong>in</strong>to <strong>the</strong> vast number of currently uncultured<br />
species found <strong>in</strong> o<strong>the</strong>r extreme environments, such<br />
as the acid mine drainage ecosystem, where a VP2<br />
homologue, recently found by community genomics<br />
(Andersson and Banfield, 2008), indicates the presence<br />
of fuselloviruses.<br />
Fig. 4. CRISPR spacer sequence matches for ASV1 and SSV2 are superimposed on linearized genome maps of ASV1 and SSV2<br />
respectively. ORFs are shown as arrows above and below the line. Sequence matches to spacers are shown as vertical lines. The black<br />
vertical lines denote the nucleotide sequence matches, and the grey vertical lines show matching amino acid sequences, after translation of<br />
the spacer sequences from both DNA strands. The dark boxes below the genome maps indicate areas > 50 bp with nucleotide level sequence<br />
similarity to other fuselloviruses (the relevant fusellovirus is indicated to the left of the dark boxes). In total, there are 12 spacer matches to<br />
ASV1 and 22 matches to SSV2 at a nucleotide level. At an amino acid sequence level, there are 42 spacer matches to ASV1 and 28 matches<br />
to SSV2.<br />
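The two kinds of spacer match reported in the Fig. 4 legend (nucleotide matches, and amino acid matches after translating the spacer from both DNA strands) can be illustrated with a small sketch. This is not the procedure used in the study, whose search tool and thresholds are not given here. It assumes exact matching, uppercase A/C/G/T input, the standard genetic code, and a hypothetical minimum peptide length `min_len`. All function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Translations of the three frames on each strand."""
    return [translate(strand[offset:])
            for strand in (seq, reverse_complement(seq))
            for offset in range(3)]


def nucleotide_match(spacer, genome):
    """Exact match of the spacer on either strand (a simplification)."""
    return spacer in genome or reverse_complement(spacer) in genome


def protein_match(spacer, genome, min_len=5):
    """True if any translated spacer frame occurs in any translated genome frame."""
    genome_frames = six_frames(genome)
    return any(peptide in frame
               for peptide in six_frames(spacer) if len(peptide) >= min_len
               for frame in genome_frames)


if __name__ == "__main__":
    genome = "ATGGCTAAACGTGAACTGGGTTCTTAA"   # toy sequence, encodes MAKRELGS*
    synonymous = "GCCAAGCGCGAGCTCGGC"         # encodes AKRELG with different codons
    print(nucleotide_match(synonymous, genome), protein_match(synonymous, genome))
```

A spacer that differs from the target only at synonymous codon positions is missed by the nucleotide comparison but found by the translated comparison. This is one plausible reading of why the amino acid level yields more matches than the nucleotide level in the legend above.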
‘Core’ genes<br />
By almost doubling the number of described fuselloviruses,<br />
we are refining the definition of ‘core’ genes of the<br />
family. The 18 conserved, or ‘core’, genes that were<br />
defined for SSV1, SSV2, SSVk1 and SSVrh (Wiedenheft<br />
et al., 2004) can now be reduced to 13 (Table 1). This set<br />
may have to be revised further as more fuselloviruses<br />
are sequenced, but our findings correlate well with a<br />
recent analysis of fuselloviral proviruses in S. islandicus<br />
strains (Held and Whitaker, 2009). We exclude the<br />
SSV1_C792 homologues from <strong>the</strong> list of ‘core’ genes,<br />
because we do not consider SSV6_C213 and<br />
ASV1_B208 to be able to fully complement <strong>the</strong> prote<strong>in</strong>s<br />
found <strong>in</strong> o<strong>the</strong>r fuselloviruses, which are about four times<br />
larger (Table 1).<br />
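The shrinking of the ‘core’ set as genomes are added is a set intersection over per-genome gene content. The sketch below uses only a toy subset of gene families whose distribution is stated in the text (VP1, VP3 and the integrase in all nine viruses, VP2 in SSV1, SSV6 and ASV1, the SSV1_C80 homologue in all SSVs but not ASV1, the Rad3-like helicase in ASV1, SSV7 and SSVk1). It is not the full Table 1, the family labels are illustrative, and it treats presence strictly by sequence homology, which is an assumption. A functional homologue such as ASV1_A59 would not count under this criterion.

```python
VIRUSES = ["SSV1", "SSV2", "SSV4", "SSV5", "SSVk1", "SSVrh", "SSV6", "SSV7", "ASV1"]

# Toy presence table built from statements in the text (illustrative subset only).
presence = {virus: {"VP1", "VP3", "integrase"} for virus in VIRUSES}
for virus in VIRUSES:
    if virus != "ASV1":
        presence[virus].add("C80_CopG")        # conserved in all SSVs, absent from ASV1
for virus in ("SSV1", "SSV6", "ASV1"):
    presence[virus].add("VP2")
for virus in ("ASV1", "SSV7", "SSVk1"):
    presence[virus].add("Rad3_helicase")


def core_genes(presence, viruses):
    """Gene families present in every listed virus."""
    return set.intersection(*(presence[v] for v in viruses))


if __name__ == "__main__":
    first_four = ["SSV1", "SSV2", "SSVk1", "SSVrh"]
    print(sorted(core_genes(presence, first_four)))  # core of the four earlier genomes
    print(sorted(core_genes(presence, VIRUSES)))     # core after adding the other five
```

In this toy table the C80 family is shared by the four earlier genomes but drops out once ASV1 is included, which mirrors how a core set can only stay the same or shrink as genomes are added.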
Six of <strong>the</strong> ‘core’ genes have no discernible function<br />
based on <strong>the</strong>ir primary sequence, except for some of<br />
<strong>the</strong>m carry<strong>in</strong>g predicted transmembrane segments<br />
(Table 1), and experimental data will be needed to determ<strong>in</strong>e<br />
<strong>the</strong>ir functional roles. Of <strong>the</strong> rema<strong>in</strong><strong>in</strong>g seven, <strong>the</strong><br />
<strong>in</strong>tegrase function was characterized experimentally<br />
(Muskhelishvili et al., 1993; Muskhelishvili, 1994; Serre<br />
et al., 2002; Letzelter et al., 2004; Clore and Stedman,<br />
2007). Moreover, VP1 and VP3 are components of<br />
SSV1 virions, and VP1 is processed from the N-terminus<br />
in SSV1, to a length of 73 aa (Reiter et al., 1987a), which<br />
may explain the significant size difference we observe<br />
among the VP1 genes (Table 1). The remaining<br />
C-terminus of VP1 is similar in both length and sequence<br />
to the VP3 protein, and their roles might be partially interchangeable<br />
in the virion matrix. Bioinformatic analyses<br />
predict DnaA-like activity for SSV1_B251 homologues<br />
(Koonin, 1992) and transcriptional regulation activity for<br />
three other ‘core’ genes: SSV1_A79 and SSV1_B129,<br />
which are transcribed early during infection and are probably<br />
involved in controlling the host’s transcriptional apparatus,<br />
and SSV1_B115, which is co-transcribed with<br />
VP1, VP2, VP3 and SSV1_C792 later in infection<br />
and may be involved in controlling the assembly and/or<br />
packaging of virions.<br />
‘Non-core’ genes<br />
Genes that are highly conserved but present <strong>in</strong> a subset<br />
of <strong>the</strong> fuselloviruses could provide a possible way of classify<strong>in</strong>g<br />
<strong>the</strong> fuselloviruses <strong>in</strong>to subgroups, albeit subgroups<br />
that overlap.<br />
Thus, ASV1, SSV6 and SSV1 encode a VP2 homologue,<br />
indicating that they all share a DNA packaging<br />
system. However, the difference from the SSV1 protein in<br />
the C-terminus may indicate an alternative mode of interaction<br />
of the protein and viral DNA with the major virion<br />
proteins, VP1 and VP3.<br />
Ano<strong>the</strong>r subgroup would be <strong>the</strong> SSVs, which all encode<br />
a highly conserved homologue of SSV1_C80, a prote<strong>in</strong><br />
conta<strong>in</strong><strong>in</strong>g <strong>the</strong> RHH 1 CopG doma<strong>in</strong>. ASV1 does not<br />
encode any gene with obvious sequence similarity to<br />
SSV1_C80. However, ASV1_A59 also has an RHH 1<br />
CopG domain, although it groups with other RHH1-containing<br />
genes, <strong>in</strong>clud<strong>in</strong>g a few Sulfolobus chromosomal<br />
genes (e.g. Saci_0942). Fur<strong>the</strong>rmore, ASV1_A59<br />
occupies <strong>the</strong> same genomic position as <strong>the</strong> SSV1_C80
homologues do <strong>in</strong> <strong>the</strong> SSV genomes, and it very likely<br />
acts as a functional homologue of SSV1_C80.<br />
A third subgroup consists of ASV1, SSV7 and SSVk1,<br />
which all encode <strong>the</strong> Rad3-like helicase prote<strong>in</strong> and <strong>the</strong><br />
neighbour<strong>in</strong>g Msed_2283 homologue (Fig. 2). The presence<br />
of <strong>the</strong> helicase strongly suggests that <strong>the</strong>se two<br />
prote<strong>in</strong>s are <strong>in</strong>volved <strong>in</strong> DNA replication or recomb<strong>in</strong>ation,<br />
and it is possible that <strong>the</strong> o<strong>the</strong>r fuselloviruses recruit host<br />
prote<strong>in</strong>s to fulfill <strong>the</strong> same function.<br />
A possible filament prote<strong>in</strong><br />
The most strik<strong>in</strong>g genomic difference among <strong>the</strong> fuselloviruses<br />
is <strong>the</strong> ‘replacement’ of <strong>the</strong> SSV1_C792 module<br />
with <strong>the</strong> SSV6_B1232 module (Fig. 2B). It seems <strong>the</strong><br />
C-term<strong>in</strong>al 170 aa from SSV1_C792 are essential, s<strong>in</strong>ce<br />
<strong>the</strong>y are reta<strong>in</strong>ed as a small separate gene <strong>in</strong> both <strong>the</strong><br />
ASV1 and SSV6 genomes; however, <strong>the</strong> rema<strong>in</strong><strong>in</strong>g<br />
~620 aa of SSV1_C792 and <strong>the</strong> whole of SSV1_B78 are<br />
substituted by SSV6_B1232. The presence of <strong>the</strong><br />
SSV6_B1232 module correlates with a difference <strong>in</strong> <strong>the</strong><br />
number and structure of <strong>the</strong> sticky term<strong>in</strong>al filaments of<br />
<strong>the</strong> SSV6 and ASV1 virions, when compared with <strong>the</strong><br />
SSV1_C792 module viruses (Fig. 1B). Possibly, <strong>the</strong>re is a<br />
phenotype–genotype l<strong>in</strong>k, with <strong>the</strong> SSV1_C792 module<br />
be<strong>in</strong>g responsible for <strong>the</strong> multiple, th<strong>in</strong>, sticky filaments<br />
and <strong>the</strong> SSV6_B1232 module for <strong>the</strong> few, thick, less sticky<br />
filaments. In support of this hypothesis, small amounts of<br />
SSV1_C792 were recently found by mass spectrometry<br />
in SSV1 virions (Menon et al., 2008). Moreover, the Phyre<br />
prediction tool suggested that both SSV1_C792 and<br />
SSV6_B1232 have a fold similar to that of the receptor-binding<br />
protein P2 of bacteriophage PRD1, and it was recently shown that a large<br />
protein is responsible for the sticky end-fibres in the rudivirus<br />
SIRV2 (Steinmetz et al., 2008). Nevertheless,<br />
fur<strong>the</strong>r studies will be needed to determ<strong>in</strong>e <strong>the</strong> exact<br />
functions of <strong>the</strong> SSV1_C792 and SSV6_B1232 modules<br />
<strong>in</strong> fuselloviruses.<br />
Fuselloviral nucleotide similarity and a putative<br />
mechanism for <strong>in</strong>terviral recomb<strong>in</strong>ation<br />
The multiple regions of high nucleotide similarity, or even<br />
identity, between <strong>the</strong> fuselloviral genomes do not represent<br />
a ‘core’ fusello-genome, s<strong>in</strong>ce <strong>the</strong> regions of similarity<br />
differ between <strong>the</strong> various pairs of viruses, and often do<br />
not <strong>in</strong>clude <strong>the</strong> ‘core’ genes (Fig. 3). Instead, <strong>the</strong> pattern<br />
of similar and non-similar sections of DNA <strong>in</strong>dicates<br />
frequent recomb<strong>in</strong>ation events between fuselloviruses,<br />
similar to that observed for some bacteriophages (Hendrix<br />
et al., 1999). Possibly this occurs between pairs of fuselloviruses<br />
present in the same host; however, we do not<br />
see a similar pattern of sequence similarity for <strong>the</strong> l<strong>in</strong>ear<br />
non-<strong>in</strong>tegrat<strong>in</strong>g archaeal viruses (Vestergaard et al.,<br />
2008a). Therefore, we suggest that a different mechanism<br />
is more likely.<br />
Integrated fusellovirus genomes have been found in the<br />
Sulfolobus solfataricus P2 chromosome and in four S. islandicus chromosomes,<br />
where no trace of <strong>the</strong> covalently closed circular<br />
DNA (cccDNA) form was detected (Stedman et al., 2003;<br />
Held and Whitaker, 2009). Once a virus has been<br />
‘caught’, a second, slightly different, fusellovirus might<br />
<strong>in</strong>fect <strong>the</strong> same host, and <strong>in</strong>sert itself <strong>in</strong>to <strong>the</strong> same tRNA<br />
gene, result<strong>in</strong>g <strong>in</strong> a concatamer of <strong>the</strong> two fuselloviruses<br />
<strong>in</strong> <strong>the</strong> host chromosome (Fig. 5). This structure might be<br />
ma<strong>in</strong>ta<strong>in</strong>ed for a couple of generations, but it would be<br />
<strong>in</strong>herently unstable if <strong>the</strong> two viral genomes are reasonably<br />
similar, as <strong>the</strong>re would be a high chance of homologous<br />
recomb<strong>in</strong>ation between <strong>the</strong> two <strong>in</strong>tegrated viruses.<br />
Such a recomb<strong>in</strong>ation event would lead to <strong>the</strong> formation of<br />
one cccDNA virus and one <strong>in</strong>serted virus, both of which<br />
would consist of a part of each of <strong>the</strong> orig<strong>in</strong>al two viruses<br />
(Fig. 5). Ow<strong>in</strong>g to <strong>the</strong> very short sequence similarity<br />
required for homologous recomb<strong>in</strong>ation <strong>in</strong> Sulfolobus<br />
(Grogan, 2009), <strong>the</strong> cross-over po<strong>in</strong>t could potentially be<br />
<strong>in</strong> many different places, and each of <strong>the</strong>se recomb<strong>in</strong>ation<br />
events would form a unique mixture of <strong>the</strong> two viruses,<br />
similar to meiosis <strong>in</strong> eukaryotes. Thus, this offers a<br />
mechanism for rapidly generat<strong>in</strong>g a large number of<br />
diverse viral offspr<strong>in</strong>g. Our model does not exclude direct<br />
recomb<strong>in</strong>ation between <strong>the</strong> cccDNA forms of fuselloviruses,<br />
but we propose that this type of ‘tandem <strong>in</strong>sertion’<br />
event happens frequently (on an evolutionary scale) <strong>in</strong><br />
nature, and that repeated events, each <strong>in</strong>volv<strong>in</strong>g a different<br />
pair of ‘parent’ fuselloviruses, would eventually<br />
produce <strong>the</strong> patchwork viral genomes we see today<br />
(Fig. 3).<br />
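The outcome of the proposed ‘tandem insertion’ event can be sketched in a few lines. The sketch assumes, purely for illustration, that the two integrated genomes are collinear and of equal length, so that position x in the first genome is homologous to position x in the second. The function names and the `min_homology` parameter are hypothetical, and no value for the minimum homology length required in Sulfolobus is implied. With the genomes arranged as A followed by B, a cross-over at x loops out the segment A[x:] + B[:x] as a circle and leaves A[:x] + B[x:] in the chromosome. The circle is written here starting from the same origin as the parents.

```python
def crossover_windows(a, b, min_homology):
    """Start positions where a and b are identical over min_homology bases."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes collinear genomes of equal length")
    return [start for start in range(len(a) - min_homology + 1)
            if a[start:start + min_homology] == b[start:start + min_homology]]


def recombine_tandem(a, b, x):
    """Products of a cross-over at position x between tandemly integrated a and b.

    Returns (integrated, excised): the chimera left in the chromosome and the
    chimera released as circular DNA, both written from the shared origin.
    """
    integrated = a[:x] + b[x:]
    excised = b[:x] + a[x:]
    return integrated, excised


if __name__ == "__main__":
    ssv_a = "AAAAGGGGAAAA"   # toy 'parent' genomes sharing one identical stretch
    ssv_b = "CCCCGGGGCCCC"
    for start in crossover_windows(ssv_a, ssv_b, min_homology=4):
        print(start, recombine_tandem(ssv_a, ssv_b, start))
```

Both products carry part of each parent, and every additional stretch of shared sequence adds another possible cross-over point and hence another distinct pair of chimeric offspring.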
Our model also serves to expla<strong>in</strong> why fuselloviruses<br />
have developed an <strong>in</strong>tegrase that is <strong>in</strong>activated upon <strong>in</strong>tegration.<br />
The <strong>in</strong>tegrase is not essential for viral propagation<br />
(Clore and Stedman, 2007) but if <strong>the</strong> proposed recomb<strong>in</strong>ation<br />
mechanism is correct, <strong>the</strong>n <strong>the</strong> unique SSV-type<br />
<strong>in</strong>tegrase will help <strong>the</strong> virus <strong>in</strong> <strong>the</strong> long term, by promot<strong>in</strong>g<br />
recomb<strong>in</strong>ation with closely related viruses, s<strong>in</strong>ce <strong>the</strong> <strong>in</strong>activation<br />
provides a high chance of <strong>the</strong> viral genome be<strong>in</strong>g<br />
‘caught’ <strong>in</strong> an <strong>in</strong>tegrated form <strong>in</strong> <strong>the</strong> chromosome. Never<strong>the</strong>less,<br />
<strong>in</strong>activation of <strong>the</strong> <strong>in</strong>tegrase is not required for<br />
recomb<strong>in</strong>ation between tandem <strong>in</strong>sertions. Studies of <strong>the</strong><br />
Sulfolobus plasmids pARN3 and pARN4 reveal stretches<br />
of nucleotide identity, which might have been generated<br />
by tandem <strong>in</strong>sertions, even though <strong>the</strong>se plasmids carry<br />
non-<strong>in</strong>activatable <strong>in</strong>tegrases (Greve et al., 2004).<br />
The <strong>in</strong>herent <strong>in</strong>stability of a tandem <strong>in</strong>sertion makes it<br />
difficult, if not impossible, to detect <strong>in</strong> nature. However, a<br />
concatamer of <strong>in</strong>serted viral genomes, similar to <strong>the</strong> one<br />
proposed <strong>in</strong> our model, was recently discovered <strong>in</strong> <strong>the</strong><br />
chromosome of Methanococcus voltae A3. There, <strong>the</strong> two<br />
viral genomes <strong>in</strong>tegrated <strong>in</strong>to <strong>the</strong> same tRNA gene are<br />
very different, prevent<strong>in</strong>g homologous recomb<strong>in</strong>ation,<br />
thus ‘trapp<strong>in</strong>g’ <strong>the</strong> viral concatamer <strong>in</strong> <strong>the</strong> host chromosome<br />
(Krupovic and Bamford, 2008). The attP sites of<br />
SSV2 and SSV7 as well as SSV5 and SSV6 match <strong>the</strong><br />
same tRNA <strong>in</strong> S. solfataricus P2 (Table 1), mak<strong>in</strong>g it likely<br />
that fuselloviruses are also able to <strong>in</strong>tegrate <strong>in</strong>to <strong>the</strong> same<br />
tRNA, form<strong>in</strong>g concatamers, which are unstable due to<br />
<strong>the</strong> similarity between <strong>the</strong> fuselloviruses. Moreover, it was<br />
shown that SSVk1 is able to <strong>in</strong>tegrate <strong>in</strong>to several different<br />
sites <strong>in</strong> <strong>the</strong> host genome (Wiedenheft et al., 2004),<br />
<strong>in</strong>creas<strong>in</strong>g <strong>the</strong> likelihood of f<strong>in</strong>d<strong>in</strong>g a ‘partner’ for recomb<strong>in</strong>ation.<br />
F<strong>in</strong>ally, examples of related viruses <strong>in</strong>fect<strong>in</strong>g <strong>the</strong><br />
same host at <strong>the</strong> same time are known for Sulfolobales,<br />
such as AFV6, AFV7 and AFV8 <strong>in</strong> Acidianus convivator<br />
(Vestergaard et al., 2008b).<br />
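Which fuselloviruses could in principle meet in the same tRNA gene can be read off the attP row of Table 1 by grouping viruses on their matching tRNA. The sketch below does this. The virus-to-tRNA assignments are transcribed from the flattened table and should be treated as an assumption, apart from those stated explicitly in the text (Gln, Gln, Gly and Lys for SSV5, SSV6, SSV7 and ASV1). A shared target is at most a necessary condition for tandem insertion: the viruses must also infect the same host, and the matches are to S. solfataricus P2 tRNAs.

```python
from collections import defaultdict
from itertools import combinations

# attP-matching tRNAs (anticodon) per virus, as read from Table 1 (assumed mapping).
ATTP_TRNA = {
    "SSV1": ["Arg (CCG)"],
    "SSV2": ["Gly (CCC)"],
    "SSV4": ["Glu (TTC)"],
    "SSV5": ["Gln (CTG)"],
    "SSVk1": ["Asp (GTC)", "Glu (CTC)", "Glu (TTC)"],
    "SSVrh": ["Leu (GAG)"],
    "SSV6": ["Gln (CTG)"],
    "SSV7": ["Gly (CCC)"],
    "ASV1": ["Lys (TTT)"],
}


def shared_targets(attp):
    """Map each tRNA targeted by more than one virus to the virus pairs sharing it."""
    by_trna = defaultdict(list)
    for virus, trnas in attp.items():
        for trna in trnas:
            by_trna[trna].append(virus)
    return {trna: sorted(combinations(sorted(viruses), 2))
            for trna, viruses in by_trna.items() if len(viruses) > 1}


if __name__ == "__main__":
    for trna, pairs in shared_targets(ATTP_TRNA).items():
        print(trna, pairs)
```

Under this reading of the table, the grouping recovers the two pairs named in the text (SSV2 with SSV7, SSV5 with SSV6) and also pairs SSV4 with SSVk1 through one of the several sites that SSVk1 can use.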
If the ‘tandem insertion’ model is correct, then an evolutionary<br />
tree of an entire viral genome has no meaning,<br />
nor does a tree of an individual ‘core’ gene (since two<br />
halves of the same gene might originate from different<br />
‘parent’ viruses). One might instead analyse the genes<br />
described in the previous section that are not shared<br />
by all fuselloviruses, since these genes cannot serve<br />
as cross-over points for homologous recombination.<br />
Although the data set is currently too small for a<br />
phylogenetic analysis based on these genes, the presence<br />
or absence of certain genes in a subset of the<br />
viruses has provided important clues to understanding<br />
protein functions in the fuselloviruses, including the putative<br />
filament proteins SSV1_C792 and SSV6_B1232.<br />
With <strong>the</strong> current understand<strong>in</strong>g of <strong>the</strong> <strong>CRISPR</strong> antiviral<br />
<strong>system</strong>, high nucleotide similarity between viruses<br />
should be disadvantageous, s<strong>in</strong>ce a s<strong>in</strong>gle spacer,<br />
match<strong>in</strong>g a conserved region, will provide a host with<br />
immunity to several virus stra<strong>in</strong>s (Lillestøl et al., 2009).<br />
Never<strong>the</strong>less, <strong>the</strong> puzzl<strong>in</strong>g fact rema<strong>in</strong>s that fuselloviruses<br />
do possess highly similar, sometimes identical,<br />
nucleotide regions, and it is possible that <strong>the</strong> <strong>in</strong>tegration<br />
and/or <strong>the</strong> frequent recomb<strong>in</strong>ation somehow provide <strong>the</strong><br />
fuselloviruses with <strong>the</strong> means to evade <strong>the</strong> <strong>CRISPR</strong><br />
<strong>system</strong> <strong>in</strong> <strong>the</strong>ir hosts.<br />
It has been proposed that <strong>the</strong>rmoacidophilic archaeal<br />
viruses are highly mobile, even between distant hot<br />
spr<strong>in</strong>gs <strong>in</strong> <strong>the</strong> same geo<strong>the</strong>rmal area, and that different<br />
fuselloviruses cont<strong>in</strong>uously <strong>in</strong>fect a more-or-less stable<br />
population of host species (Snyder et al., 2007). The high<br />
nucleotide similarity we have found, even between<br />
fuselloviruses isolated on different cont<strong>in</strong>ents, seems to<br />
confirm that <strong>the</strong>y do manage to exchange genetic material<br />
over <strong>the</strong> <strong>in</strong>tercont<strong>in</strong>ental distances that separate some of<br />
<strong>the</strong> geo<strong>the</strong>rmal ‘islands’ <strong>in</strong> <strong>the</strong> cold ‘ocean’.<br />
Experimental procedures<br />
Sulfolobus and Acidianus medium<br />
Fig. 5. Proposed model for recomb<strong>in</strong>ation<br />
between <strong>in</strong>tegrated fuselloviruses.<br />
A. The first fusellovirus (SSVa) <strong>in</strong>fects <strong>the</strong><br />
host, and <strong>in</strong>tegrates <strong>in</strong>to <strong>the</strong> chromosome.<br />
B. The second fusellovirus (SSVb) <strong>in</strong>fects <strong>the</strong><br />
host, and <strong>in</strong>tegrates <strong>in</strong>to <strong>the</strong> same tRNA as<br />
SSVa.<br />
C. The ‘tandem <strong>in</strong>tegration’ of SSVa and<br />
SSVb. The dashed arrows <strong>in</strong>dicate examples<br />
of homologous recomb<strong>in</strong>ation sites.<br />
D. Examples of ‘offspr<strong>in</strong>g’ cccDNA<br />
fuselloviruses from <strong>the</strong> recomb<strong>in</strong>ation of SSVa<br />
and SSVb.<br />
Z medium: 25 mM (NH4)2SO4, 3 mM K2SO4, 1.5 mM KCl,<br />
20 mM glyc<strong>in</strong>e, 4.0 mM MnCl2, 10.4 mM Na2B4O7, 0.38 mM<br />
ZnSO4, 0.13 mM CuSO4, 62 nM Na2MoO4, 59 nM VOSO4,<br />
18 nM CoSO4, 19 nM NiSO4, 0.1 mM HCl, 1 mM MgCl2,<br />
0.3 mM Ca(NO3)2 adjusted to pH 3.5 with H2SO4. T medium:<br />
Identical to Z medium, but with 0.2% Tryptone added. ST<br />
medium: Identical to T medium, but with small amounts of<br />
elemental sulphur added.
Isolation and purification of hosts and viruses<br />
Samples were collected from <strong>the</strong> Hveragerdi hot-spr<strong>in</strong>g area<br />
<strong>in</strong> south-western Iceland and 1 ml was used to establish an<br />
enrichment culture, by <strong>in</strong>cubat<strong>in</strong>g <strong>in</strong> 50 ml ST medium for<br />
9 days at 80°C, after which 1 ml of the enrichment was<br />
transferred to fresh ST medium and <strong>in</strong>cubated for a fur<strong>the</strong>r<br />
4 days. Four millilitres of <strong>the</strong> enrichment (designated G4ST)<br />
was <strong>the</strong>n centrifuged at 4000 r.p.m. for 20 m<strong>in</strong> (Jouan S40<br />
rotor) to remove cells, whereupon <strong>the</strong> supernatant was spun<br />
fur<strong>the</strong>r at 38 000 r.p.m. for 3 h to pellet virions (Beckman<br />
SW60 rotor). F<strong>in</strong>ally, <strong>the</strong> pellet was resuspended <strong>in</strong> 50 ml of<br />
<strong>the</strong> supernatant. The resuspension was <strong>the</strong>n exam<strong>in</strong>ed by<br />
electron microscopy, and several different morphotypes of<br />
virus-like particles were <strong>in</strong> evidence. Among <strong>the</strong>se was a<br />
group of fusellovirus-like particles, but with different filament<br />
structures at <strong>the</strong> end, and a large diversity <strong>in</strong> <strong>the</strong>ir morphotypes,<br />
ranging from sausage-shaped to an almost spindle-like<br />
pear-shape (Fig. 1).<br />
To isolate s<strong>in</strong>gle host–virus <strong>system</strong>s, G4ST was spread on<br />
a plate of ST medium solidified with Gel-rite<br />
(Sigma-Aldrich, St Louis, USA). After 10 days of <strong>in</strong>cubation<br />
at 80°C, 30 colonies of representative sizes, shapes and<br />
colours were transferred to 5 ml liquid ST medium and <strong>in</strong>cubated<br />
with vigorous shak<strong>in</strong>g for 4 days. Each of <strong>the</strong> grow<strong>in</strong>g<br />
stra<strong>in</strong>s was exam<strong>in</strong>ed for virions by electron microscopy,<br />
and SSV6 and SSV7 were detected <strong>in</strong> <strong>the</strong> supernatants of<br />
stra<strong>in</strong>s G4ST-T-11 and G4T-1 respectively. The 16S rRNA<br />
genes of G4T-1 and G4ST-T-11 were amplified us<strong>in</strong>g <strong>the</strong><br />
primers 8aF: TCYGGTTGATCCTGCC and 1512uR:<br />
ACGGHTACCTTGTTACGACTT (accession numbers FJ870913<br />
for G4ST-T-11 and FJ870914 for G4T-1).<br />
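The two 16S rRNA primers above contain IUPAC degenerate bases (Y = C or T; H = A, C or T), so each name denotes a small pool of oligonucleotides.<br />
As a minimal illustrative sketch, not part of the original protocol, the Python snippet below expands each primer into its unambiguous sequences. The function name expand_primer is hypothetical; only the standard IUPAC code table is assumed.<br />

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer.

    Hypothetical helper written for illustration only.
    """
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "8aF": "TCYGGTTGATCCTGCC",
    "1512uR": "ACGGHTACCTTGTTACGACTT",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        variants = expand_primer(seq)
        print(name, len(variants), variants)
```

Under these assumptions 8aF expands to two sequences and 1512uR to three.<br />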
SSV5 was present <strong>in</strong> HVE14, an enrichment culture, established<br />
from a natural sample collected near <strong>the</strong> G4 site, but<br />
10 years previously (Zillig et al., 1996). It was propagated <strong>in</strong><br />
S. solfataricus P2, by mix<strong>in</strong>g a small amount of HVE14 with a<br />
well-grown S. solfataricus P2 culture (1:1000), which was<br />
<strong>the</strong>n harvested and used to isolate extrachromosomal<br />
elements with a plasmid m<strong>in</strong>iprep kit (Qiagen).<br />
Acidianus brierleyi was cultured at 70°C <strong>in</strong> ST medium and<br />
ASV1 was recovered from <strong>the</strong> supernatant by ultracentrifugation<br />
(38 000 r.p.m. for 3 h <strong>in</strong> a Beckman SW41 rotor).<br />
Electron microscopy<br />
Ten microlitres of each sample was deposited on a carbon-<br />
and formvar-coated grid (Ted Pella, Redd<strong>in</strong>g, CA, USA) and<br />
left for 2 m<strong>in</strong> before remov<strong>in</strong>g excess fluid. Ten microlitres of<br />
2% uranyl acetate or phosphotungstate (Sigma-Aldrich)<br />
was allowed to sta<strong>in</strong> <strong>the</strong> samples negatively for 10 s. Images<br />
were taken on a JEOL1200EXII microscope with an 80 kV<br />
beam, us<strong>in</strong>g a CCD camera.<br />
DNA isolation and sequenc<strong>in</strong>g<br />
Six litres of G4ST-T-11 was grown <strong>in</strong> a fermentor, and after<br />
remov<strong>in</strong>g cells by centrifug<strong>in</strong>g twice at 4000 r.p.m. for 20 m<strong>in</strong><br />
(Sorvall GS-3 rotor), <strong>the</strong> virions <strong>in</strong> <strong>the</strong> supernatant were concentrated<br />
us<strong>in</strong>g a Sartorius Vivaflow 200 filter cartridge (Sartorius,<br />
Goett<strong>in</strong>gen, Germany). The result<strong>in</strong>g 15 ml was fur<strong>the</strong>r<br />
concentrated by sp<strong>in</strong>n<strong>in</strong>g at 38 000 r.p.m. for 3 h us<strong>in</strong>g a<br />
SW41 Beckman rotor, and f<strong>in</strong>ally <strong>the</strong> virions were treated with<br />
prote<strong>in</strong>ase K and <strong>the</strong> DNA was extracted by successive phenol,<br />
phenol/chloroform and chloroform extractions. The SSV6 DNA was<br />
<strong>the</strong>n treated as described below for SSV7.<br />
In order to sequence SSV7, 5 ml of an exponential G4T-1 culture<br />
was pelleted by centrifugation and resuspended <strong>in</strong> Z<br />
medium. SSV7 production was <strong>in</strong>duced by 50 J cm⁻² UV<br />
radiation (254 nm) under constant mild agitation, and <strong>the</strong> cells<br />
were <strong>the</strong>n transferred to 45 ml T medium for overnight <strong>in</strong>cubation.<br />
Five millilitres was used for a m<strong>in</strong>iprep (QIAprep Sp<strong>in</strong><br />
M<strong>in</strong>iprep Kit, QIAGEN SA, Courtaboeuf, France), which was<br />
used for amplification and subsequent library construction<br />
based on <strong>the</strong> L<strong>in</strong>ker Amplified Shotgun Library method described<br />
at http://www.sci.sdsu.edu/PHAGE/LASL/<strong>in</strong>dex.htm.<br />
Shot-gun library construction of SSV5 and SSV6, as well<br />
as ASV1-conta<strong>in</strong><strong>in</strong>g A. brierleyi total DNA, was performed<br />
as described previously us<strong>in</strong>g SmaI digested pUC18 as<br />
clon<strong>in</strong>g vector (Peng, 2008). Plasmid DNA of clones from<br />
all four libraries was purified us<strong>in</strong>g a Model 8000 Biorobot<br />
(Qiagen, Westburg, Germany) and sequenced <strong>in</strong><br />
MegaBACE 1000 Sequenators (Amersham Biotech, Amersham,<br />
UK). Sequences were assembled us<strong>in</strong>g Sequencher<br />
4.5 (http://www.genecodes.com). Genome annotations and<br />
comparisons were done us<strong>in</strong>g <strong>the</strong> MUTAGEN software<br />
(Brügger et al., 2003) with a m<strong>in</strong>imum ORF-length set to<br />
50 aa and allow<strong>in</strong>g AUG, GUG and UUG as possible start<br />
codons. Accession numbers are EU030939, FJ870915,<br />
FJ870916 and FJ870917 for SSV5, SSV6, SSV7 and ASV1<br />
respectively.<br />
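The ORF criteria stated above (a m<strong>in</strong>imum length of 50 aa, with AUG, GUG or UUG as possible start codons) can be illustrated with a short sketch.<br />
This is not <strong>the</strong> MUTAGEN implementation: the function names, the use of the three standard stop codons, the choice of the first start codon after each stop, and the treatment of the sequence as linear are all assumptions made for illustration.<br />

```python
# Illustrative ORF scan; all names here are hypothetical.
START_CODONS = {"ATG", "GTG", "TTG"}   # DNA equivalents of AUG, GUG, UUG
STOP_CODONS = {"TAA", "TAG", "TGA"}    # assumed standard stop codons
MIN_ORF_AA = 50                        # minimum ORF length from the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_aa=MIN_ORF_AA):
    """Yield (start, end) of ORFs on one strand.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    Each ORF runs from the first start codon after the previous stop
    to the next in-frame stop. ORFs lacking a stop are not reported.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i
            elif codon in STOP_CODONS:
                if start is not None and (i - start) // 3 >= min_aa:
                    yield start, i + 3
                start = None

def find_orfs(seq, min_aa=MIN_ORF_AA):
    """Return sorted (start, end, strand) tuples in forward coordinates."""
    seq = seq.upper()
    n = len(seq)
    orfs = [(s, e, "+") for s, e in orfs_in_strand(seq, min_aa)]
    rc = reverse_complement(seq)
    orfs += [(n - e, n - s, "-") for s, e in orfs_in_strand(rc, min_aa)]
    return sorted(orfs)

if __name__ == "__main__":
    # Toy sequence: a GTG start, 60 alanine codons, then a stop codon.
    toy = "CC" + "GTG" + "GCA" * 60 + "TAA" + "GG"
    print(find_orfs(toy))
```

A circular genome would additionally need ORFs that span <strong>the</strong> arbitrary sequence start, which this linear sketch does not handle.<br />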
<strong>CRISPR</strong> spacer analysis<br />
To obta<strong>in</strong> a list of spacer sequences from Sulfolobales, <strong>the</strong><br />
follow<strong>in</strong>g partial or full genomes were used: S. solfataricus<br />
P2, S. tokodaii 7, S. acidocaldarius DSM 639, Metallosphaera<br />
sedula DSM5348 from GenBank (http://<br />
www.ncbi.nlm.nih.gov/Genbank/), Sulfolobus islandicus<br />
stra<strong>in</strong>s LD85, YG5714, YN1551, M164 and U328 from JGI<br />
(http://www.jgi.doe.gov/genome-projects/), and S. islandicus<br />
stra<strong>in</strong>s HVE10/4 and REY15A and Acidianus brierleyi (K.<br />
Brügger and Q. She, unpubl. data). <strong>CRISPR</strong>s were identified<br />
us<strong>in</strong>g publicly available software (Edgar, 2007; Bland et al.,<br />
2007). Spacer sequences from each repeat-cluster were<br />
aligned (Sæbø et al., 2005) aga<strong>in</strong>st <strong>the</strong> fuselloviral genomes<br />
at <strong>the</strong> nucleotide level (Shah et al., 2009). Additionally, spacers<br />
were aligned at <strong>the</strong> am<strong>in</strong>o acid level aga<strong>in</strong>st <strong>the</strong> annotated<br />
ORFs of <strong>the</strong> fuselloviruses (Vestergaard<br />
et al., 2008a; Shah et al., 2009). Significance cut-offs<br />
were determ<strong>in</strong>ed for both alignment types by us<strong>in</strong>g <strong>the</strong><br />
genome sequence of Saccharomyces cerevisiae as a negative<br />
control.<br />
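The idea of calibrating a significance cut-off against a negative-control genome can be sketched as follows.<br />
The example is a toy stand-in, not the PARALIGN-based procedure cited above: it scores only ungapped matches, and the function names and short sequences are hypothetical. The cut-off is taken as the best score that any spacer reaches against the control sequence, and only hits scoring above it are reported.<br />

```python
# Toy negative-control calibration; names and data are hypothetical.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def best_identity(spacer, genome):
    """Best fraction of identical bases over all ungapped placements
    of the spacer on either strand of the genome."""
    best = 0
    for target in (genome, reverse_complement(genome)):
        for i in range(len(target) - len(spacer) + 1):
            window = target[i:i + len(spacer)]
            matches = sum(a == b for a, b in zip(spacer, window))
            best = max(best, matches)
    return best / len(spacer)

def cutoff_from_negative_control(spacers, control_genome):
    """Highest score any spacer attains against an unrelated genome."""
    return max(best_identity(s, control_genome) for s in spacers)

def significant_hits(spacers, virus_genome, cutoff):
    """Spacers whose best score against the virus exceeds the cut-off."""
    hits = []
    for spacer in spacers:
        score = best_identity(spacer, virus_genome)
        if score > cutoff:
            hits.append((spacer, score))
    return hits

if __name__ == "__main__":
    spacers = ["ACGTACGTAC", "TTGACCGGTA"]       # made-up spacers
    control = "G" * 10 + "C" * 10                # stands in for the control genome
    virus = "TTT" + "ACGTACGTAC" + "TTTT"        # contains the first spacer
    cutoff = cutoff_from_negative_control(spacers, control)
    print(cutoff, significant_hits(spacers, virus, cutoff))
```

A real analysis would use a gapped aligner and a control genome far larger than <strong>the</strong> toy string used here.<br />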
Acknowledgements<br />
P.R. was funded by grant VIRAR (NT05-2_41674) from <strong>the</strong><br />
Agence Nationale de la Recherche, France. The research <strong>in</strong><br />
Copenhagen was supported by grants from <strong>the</strong> Grundforskn<strong>in</strong>gsfond<br />
and <strong>the</strong> Research Council for Natural Sciences. We<br />
would also like to thank <strong>the</strong> Electron Microscopy Platform<br />
at Institut Pasteur for helpful advice and use of <strong>the</strong>ir<br />
JEOL1200EXII microscope.<br />
© 2009 Society for Applied Microbiology and Blackwell Publish<strong>in</strong>g Ltd, Environmental Microbiology
References<br />
Andersson, A.F., and Banfield, J.F. (2008) Virus population<br />
dynamics and acquired virus resistance <strong>in</strong> natural microbial<br />
communities. Science 320: 1047–1050.<br />
Arnold, H.P., She, Q., Phan, H., Stedman, K., Prangishvili,<br />
D., Holz, I., et al. (1999) The genetic element pSSVx of<br />
<strong>the</strong> extremely <strong>the</strong>rmophilic crenarchaeon Sulfolobus is a<br />
hybrid between a plasmid and a virus. Mol Microbiol 34:<br />
217–226.<br />
Bath, C., and Dyall-Smith, M.L. (1998) His1, an archaeal<br />
virus of <strong>the</strong> Fuselloviridae family that <strong>in</strong>fects Haloarcula<br />
hispanica. J Virol 72: 9392–9395.<br />
Bath, C., Cukalac, T., Porter, K., and Dyall-Smith, M.L. (2006)<br />
His1 and His2 are distantly related, sp<strong>in</strong>dle-shaped haloviruses<br />
belong<strong>in</strong>g to <strong>the</strong> novel virus group, Salterprovirus.<br />
Virology 350: 228–239.<br />
Bize, A., Peng, X., Prokofeva, M., Maclellan, K., Lucas, S.,<br />
Forterre, P., et al. (2008) Viruses <strong>in</strong> acidic geo<strong>the</strong>rmal environments<br />
of <strong>the</strong> Kamchatka Pen<strong>in</strong>sula. Res Microbiol 159:<br />
358–366.<br />
Bland, C., Ramsey, T.L., Sabree, F., Lowe, M., Brown, K.,<br />
Kyrpides, N.C., and Hugenholtz, P. (2007) <strong>CRISPR</strong> Recognition<br />
Tool (CRT): a tool for automatic detection of<br />
clustered regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic repeats. BMC<br />
Bio<strong>in</strong>formatics 8: 209–217.<br />
Brügger, K., Redder, P., and Skovgaard, M. (2003)<br />
MUTAGEN: multi-user tool for annotat<strong>in</strong>g genomes. Bio<strong>in</strong>formatics<br />
19: 2480–2481.<br />
Clore, A.J., and Stedman, K.M. (2007) The SSV1 viral <strong>in</strong>tegrase<br />
is not essential. Virology 361: 103–111.<br />
Edgar, R.C. (2007) PILER-CR: fast and accurate identification<br />
of <strong>CRISPR</strong> repeats. BMC Bio<strong>in</strong>formatics 8: 18–24.<br />
Fröls, S., Gordon, P.M., Panlilio, M.A., Schleper, C., and<br />
Sensen, C.W. (2007) Elucidat<strong>in</strong>g <strong>the</strong> transcription cycle of<br />
<strong>the</strong> UV-<strong>in</strong>ducible hyper<strong>the</strong>rmophilic archaeal virus SSV1 by<br />
DNA microarrays. Virology 365: 48–59.<br />
Gesl<strong>in</strong>, C., Le Romancer, M., Erauso, G., Gaillard, M., Perrot,<br />
G., and Prieur, D. (2003) PAV1, <strong>the</strong> first virus-like particle<br />
isolated from a hyper<strong>the</strong>rmophilic euryarchaeote, ‘Pyrococcus<br />
abyssi’. J Bacteriol 185: 3888–3894.<br />
Greve, B., Jensen, S., Brügger, K., Zillig, W., and Garrett,<br />
R.A. (2004) Genomic comparison of archaeal conjugative<br />
plasmids from Sulfolobus. <strong>Archaea</strong> 1: 231–239.<br />
Grogan, D.W. (2009) Homologous recomb<strong>in</strong>ation <strong>in</strong> Sulfolobus<br />
acidocaldarius: genetic assays and functional properties.<br />
Biochem Soc Trans 37 (Pt 1): 88–91.<br />
Guixa-Boixareu, N., Calderon-Paz, J.I., Heldal, M., Bratbak,<br />
G., and Pedros-Alio, C. (1996) Viral lysis and bacterivory<br />
as prokaryotic loss factors along a sal<strong>in</strong>ity gradient. Aquat<br />
Microb Ecol 11: 215–227.<br />
Här<strong>in</strong>g, M., Rachel, R., Peng, X., Garrett, R.A., and Prangishvili,<br />
D. (2005) Viral diversity <strong>in</strong> hot spr<strong>in</strong>gs of Pozzuoli,<br />
Italy, and characterization of a unique archaeal virus, Acidianus<br />
bottle-shaped virus, from a new family, <strong>the</strong> Ampullaviridae.<br />
J Virol 79: 9904–9911.<br />
Held, N.L., and Whitaker, R.J. (2009) Viral biogeography<br />
revealed by signatures <strong>in</strong> Sulfolobus islandicus genomes.<br />
Environ Microbiol 11: 457–466.<br />
Hendrix, R.W., Smith, M.C.M., Burns, R.N., Ford, M.E., and<br />
Hatfull, G.F. (1999) Evolutionary relationships among<br />
diverse bacteriophages and prophages: all <strong>the</strong> world’s a<br />
phage. Proc Natl Acad Sci USA 96: 2192–2197.<br />
Kelley, L.A., and Sternberg, M.J.E. (2009) Prote<strong>in</strong> structure<br />
prediction on <strong>the</strong> web: a case study us<strong>in</strong>g <strong>the</strong> Phyre server.<br />
Nature Protocols 4: 363–371.<br />
Koon<strong>in</strong>, E.V. (1992) Archaebacterial virus SSV1 encodes a<br />
putative DnaA-like prote<strong>in</strong>. Nucleic Acids Res 20: 1143.<br />
Kraft, P., Kümmel, D., Oeck<strong>in</strong>ghaus, A., Gauss, G.H.,<br />
Wiedenheft, B., Young, M., and Lawrence, C.M. (2004a)<br />
Structure of D-63 from Sulfolobus sp<strong>in</strong>dle-shaped virus 1:<br />
surface properties of <strong>the</strong> dimeric four-helix bundle suggest<br />
an adaptor prote<strong>in</strong> function. J Virol 78: 7438–7442.<br />
Kraft, P., Oeck<strong>in</strong>ghaus, A., Kümmel, D., Gauss, G.H.,<br />
Gilmore, J., Wiedenheft, B., et al. (2004b) Crystal structure<br />
of F-93 from Sulfolobus sp<strong>in</strong>dle-shaped virus 1, a w<strong>in</strong>ged-helix<br />
DNA b<strong>in</strong>d<strong>in</strong>g prote<strong>in</strong>. J Virol 78: 11544–11550.<br />
Krupovic, M., and Bamford, D.H. (2008) <strong>Archaea</strong>l proviruses<br />
TKV4 and MVV extend <strong>the</strong> PRD1-adenovirus l<strong>in</strong>eage to<br />
<strong>the</strong> phylum Euryarchaeota. Virology 375: 292–300.<br />
Letzelter, C., Duguet, M., and Serre, M.C. (2004) Mutational<br />
analysis of <strong>the</strong> archaeal tyros<strong>in</strong>e recomb<strong>in</strong>ase SSV1 <strong>in</strong>tegrase<br />
suggests a mechanism of DNA cleavage <strong>in</strong> trans.<br />
J Biol Chem 279: 28936–28944.<br />
Lillestøl, R.K., Shah, S.A., Brügger, K., Redder, P., Phan, H.,<br />
Christiansen, J., and Garrett, R.A. (2009) <strong>CRISPR</strong> families<br />
of <strong>the</strong> crenarchaeal genus Sulfolobus: bidirectional transcription<br />
and dynamic properties. Mol Microbiol 72: 259–272.<br />
Marraff<strong>in</strong>i, L.A., and Son<strong>the</strong>imer, E.J. (2008) <strong>CRISPR</strong> <strong>in</strong>terference<br />
limits horizontal gene transfer <strong>in</strong> Staphylococci by<br />
target<strong>in</strong>g DNA. Science 322: 1843–1845.<br />
Mart<strong>in</strong>, A., Yeats, S., Janekovic, D., Reiter, W.-D., Aicher, W.,<br />
and Zillig, W. (1984) SAV1, a temperate u.v.-<strong>in</strong>ducible DNA<br />
virus-like particle from archaebacterium Sulfolobus acidocaldarius<br />
isolate B12. EMBO J 3: 2165–2168.<br />
Menon, S.K., Maaty, W.S., Corn, G.J., Kwok, S.C., Eilers,<br />
B.J., Kraft, P., et al. (2008) Cyste<strong>in</strong>e usage <strong>in</strong> Sulfolobus<br />
sp<strong>in</strong>dle-shaped virus 1 and extension to hyper<strong>the</strong>rmophilic<br />
viruses <strong>in</strong> general. Virology 376: 270–278.<br />
Muskhelishvili, G. (1994) The archaeal SSV <strong>in</strong>tegrase promotes<br />
<strong>in</strong>termolecular excisive recomb<strong>in</strong>ation <strong>in</strong> vitro. Syst<br />
Appl Microbiol 16: 605–608.<br />
Muskhelishvili, G., Palm, P., and Zillig, W. (1993) SSV1-encoded<br />
site-specific recomb<strong>in</strong>ation <strong>system</strong> <strong>in</strong> Sulfolobus<br />
shibatae. Mol Gen Genet 237: 334–342.<br />
Oren, A., Bratbak, G., and Heldal, M. (1997) Occurrence of<br />
virus-like particles <strong>in</strong> <strong>the</strong> Dead Sea. Extremophiles 1: 143–<br />
149.<br />
Palm, P., Schleper, C., Grampp, B., Yeats, S., McWilliam, P.,<br />
Reiter, W.D., and Zillig, W. (1991) Complete nucleotide<br />
sequence of <strong>the</strong> virus SSV1 of <strong>the</strong> archaebacterium Sulfolobus<br />
shibatae. Virology 185: 242–250.<br />
Peng, X. (2008) Evidence for <strong>the</strong> horizontal transfer of an<br />
<strong>in</strong>tegrase gene from a fusellovirus to a pRN-like plasmid<br />
with<strong>in</strong> a s<strong>in</strong>gle stra<strong>in</strong> of Sulfolobus and <strong>the</strong> implications for<br />
plasmid survival. Microbiol 154 (Pt 2): 383–391.<br />
Porter, K., Russ, B.E., and Dyall-Smith, M.L. (2007) Virus–<br />
host <strong>in</strong>teractions <strong>in</strong> salt lakes. Curr Op<strong>in</strong> Microbiol 10:<br />
418–424.<br />
Prangishvili, D. (2003) Evolutionary <strong>in</strong>sights from studies on<br />
viruses of hyper<strong>the</strong>rmophilic archaea. Res Microbiol 154:<br />
289–294.
Prangishvili, D., Garrett, R.A., and Koon<strong>in</strong>, E.V. (2006a)<br />
Evolutionary genomics of archaeal viruses: unique viral<br />
genomes <strong>in</strong> <strong>the</strong> third doma<strong>in</strong> of life. Virus Res 117: 52–67.<br />
Prangishvili, D., Vestergaard, G., Här<strong>in</strong>g, M., Aramayo, R.,<br />
Basta, T., Rachel, R., and Garrett, R.A. (2006b) Structural<br />
and genomic properties of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal<br />
virus ATV with an extracellular stage of <strong>the</strong> reproductive<br />
cycle. J Mol Biol 359: 1203–1216.<br />
Rachel, R., Bettstetter, M., Hedlund, B.P., Här<strong>in</strong>g, M.,<br />
Kessler, A., Stetter, K.O., and Prangishvili, D. (2002)<br />
Remarkable morphological diversity of viruses and virus-like<br />
particles <strong>in</strong> hot terrestrial environments. Arch Virol 147:<br />
2419–2429.<br />
Reiter, W.-D., Palm, P., Henschen, A., Lottspeich, F., Zillig,<br />
W., and Grampp, B. (1987a) Identification and characterization<br />
of <strong>the</strong> genes encod<strong>in</strong>g three structural prote<strong>in</strong>s of<br />
<strong>the</strong> Sulfolobus virus-like particle SSV1. Mol Gen Genet<br />
206: 144–153.<br />
Reiter, W.D., Palm, P., Yeats, S., and Zillig, W. (1987b)<br />
Gene expression <strong>in</strong> archaebacteria: physical mapp<strong>in</strong>g<br />
of constitutive and UV-<strong>in</strong>ducible transcripts from <strong>the</strong><br />
Sulfolobus virus-like particle SSV1. Mol Gen Genet 209:<br />
270–275.<br />
Rice, G., Stedman, K., Snyder, J., Wiedenheft, B., Willits, D.,<br />
et al. (2001) Viruses from extreme <strong>the</strong>rmal environments.<br />
Proc Natl Acad Sci USA 98: 13341–13345.<br />
Sæbø, P.E., Andersen, S.M., Myrseth, J., Laerdahl, J.K.,<br />
and Rognes, T. (2005) PARALIGN: rapid and sensitive<br />
sequence similarity searches powered by parallel comput<strong>in</strong>g<br />
technology. Nucleic Acids Res 33: 535–539.<br />
Schleper, C., Kubo, K., and Zillig, W. (1992) The particle<br />
SSV1 from <strong>the</strong> extremely <strong>the</strong>rmophilic archaeon<br />
Sulfolobus is a virus: demonstration of <strong>in</strong>fectivity and of<br />
transfection with viral DNA. Proc Natl Acad Sci USA 89:<br />
7645–7649.<br />
Serre, M.-C., Letzelter, C., Garel, J.-R., and Duguet, M.<br />
(2002) Cleavage properties of an archaeal site-specific<br />
recomb<strong>in</strong>ase, <strong>the</strong> SSV1 <strong>in</strong>tegrase. J Biol Chem 277:<br />
16758–16767.<br />
Shah, S.A., Hansen, N.R., and Garrett, R.A. (2009) Distributions<br />
of <strong>CRISPR</strong> spacer matches <strong>in</strong> viruses and plasmids<br />
of crenarchaeal acido<strong>the</strong>rmophiles and implications for<br />
<strong>the</strong>ir <strong>in</strong>hibitory mechanism. Biochem Soc Trans 37: 23–<br />
28.<br />
Snyder, J.C., Wiedenheft, B., Lav<strong>in</strong>, M., Roberto, F.F.,<br />
Spuhler, J., Ortmann, A.C., et al. (2007) Virus movement<br />
ma<strong>in</strong>ta<strong>in</strong>s local virus population diversity. Proc Natl Acad<br />
Sci USA 104: 19102–19107.<br />
Stedman, K.M., She, Q., Phan, H., Arnold, H.P., Holz, I.,<br />
Garrett, R.A., and Zillig, W. (2003) Relationships between<br />
fuselloviruses <strong>in</strong>fect<strong>in</strong>g <strong>the</strong> extremely <strong>the</strong>rmophilic<br />
archaeon Sulfolobus: SSV1 and SSV2. Res Microbiol 154:<br />
295–302.<br />
Ste<strong>in</strong>metz, N.F., Bize, A., F<strong>in</strong>dlay, K.C., Lomonossoff, G.P.,<br />
Manchester, M., Evans, D.J., and Prangishvili, D. (2008)<br />
Site-specific and spatially controlled addressability of a<br />
new viral nanobuild<strong>in</strong>g block: Sulfolobus islandicus rod-shaped<br />
virus 2. Adv Funct Mater 18: 1–9.<br />
Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter,<br />
M., Phan, H., et al. (2008a) SRV, a new rudiviral isolate<br />
from Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal rudiviruses<br />
with <strong>the</strong> host viral-defence <strong>CRISPR</strong> <strong>system</strong>.<br />
J Bacteriol 190: 6837–6845.<br />
Vestergaard, G., Aramayo, R., Basta, T., Här<strong>in</strong>g, M., Peng,<br />
X., Brügger, K., et al. (2008b) Structure of <strong>the</strong> acidianus<br />
filamentous virus 3 and comparative genomics of related<br />
archaeal lipothrixviruses. J Virol 82: 371–381.<br />
Wiedenheft, B., Stedman, K., Roberto, F., Willits, D., Gleske,<br />
A.K., Zoeller, L., et al. (2004) Comparative genomic analysis<br />
of hyper<strong>the</strong>rmophilic archaeal Fuselloviridae viruses.<br />
J Virol 78: 1954–1961.<br />
Xiang, X., Chen, L., Huang, X., Luo, Y., She, Q., and Huang,<br />
L. (2005) Sulfolobus tengchongensis sp<strong>in</strong>dle-shaped virus<br />
STSV1: virus–host <strong>in</strong>teractions and genomic features.<br />
J Virol 79: 8677–8686.<br />
Yeats, S., McWilliam, P., and Zillig, W. (1982) A plasmid <strong>in</strong> <strong>the</strong><br />
archaebacterium Sulfolobus acidocaldarius. EMBO J 1:<br />
1035–1038.<br />
Zillig, W., Prangishvili, D., Schleper, C., Elfer<strong>in</strong>k, M., Holz, I.,<br />
Albers, S., et al. (1996) Viruses, plasmids and o<strong>the</strong>r<br />
genetic elements of <strong>the</strong>rmophilic and hyper<strong>the</strong>rmophilic<br />
<strong>Archaea</strong>. FEMS Microbiol Rev 18: 225–236.<br />
Environmental Microbiology (2010) 12(11), 2918–2930 doi:10.1111/j.1462-2920.2010.02266.x<br />
Metagenomic analyses of novel viruses and plasmids<br />
from a cultured environmental sample of<br />
hyper<strong>the</strong>rmophilic neutrophiles
Roger A. Garrett, 1 * David Prangishvili, 2<br />
Shiraz A. Shah, 1 Monika Reuter, 2,3 Karl O. Stetter 3<br />
and Xu Peng 1<br />
1 <strong>Archaea</strong> Centre, Department of Biology, Copenhagen<br />
University, Ole Maaløes Vej 5, DK-2200 Copenhagen N,<br />
Denmark.<br />
2 Institut Pasteur, Molecular Biology of <strong>the</strong> Gene <strong>in</strong><br />
Extremophiles Unit, rue Dr. Roux 25, 75724 Paris<br />
Cedex 15, France.<br />
3 Department of Microbiology, <strong>Archaea</strong> Centre, University<br />
of Regensburg, D-93053 Regensburg, Germany.<br />
Summary<br />
Two novel viral genomes and four plasmids were<br />
assembled from an environmental sample collected<br />
from a hot spr<strong>in</strong>g at Yellowstone National Park, USA,<br />
and ma<strong>in</strong>ta<strong>in</strong>ed anaerobically <strong>in</strong> a bioreactor at 85°C<br />
and pH 6. The double-stranded DNA viral genomes<br />
are l<strong>in</strong>ear (22.7 kb) and circular (17.7 kb), and apparently<br />
derive from <strong>the</strong> archaeal viruses HAV1 and HAV2.<br />
Genomic DNA was obta<strong>in</strong>ed from samples enriched <strong>in</strong><br />
filamentous and tadpole-shaped virus-like particles<br />
respectively. They yielded few significant matches <strong>in</strong><br />
public sequence databases, fur<strong>the</strong>r re<strong>in</strong>forc<strong>in</strong>g <strong>the</strong><br />
wide diversity of archaeal viruses. Several variants of<br />
HAV1 exhibit major genomic alterations, presumed to<br />
arise from viral adaptation to different hosts. They<br />
<strong>in</strong>clude <strong>in</strong>sertions up to 350 bp, deletions up to 1.5 kb,<br />
and genes with extensively altered sequences. Some<br />
result from recomb<strong>in</strong>ation events occurr<strong>in</strong>g at low<br />
complexity direct repeats distributed along <strong>the</strong><br />
genome. In addition, a 33.8 kb archaeal plasmid pHA1<br />
was characterized, encod<strong>in</strong>g a possible conjugative<br />
apparatus, as well as three cryptic plasmids of <strong>the</strong>rmophilic<br />
bacterial orig<strong>in</strong>, pHB1 of 2.1 kb and two<br />
closely related variants pHB2a and pHB2b, of 5.2 and<br />
4.8 kb respectively. Strategies are considered for<br />
assembl<strong>in</strong>g genomes of smaller genetic elements<br />
from complex environmental samples, and for establish<strong>in</strong>g<br />
possible host identities on <strong>the</strong> basis of<br />
sequence similarity to host <strong>CRISPR</strong> <strong>immune</strong> <strong>system</strong>s.<br />
Received 10 February, 2010; accepted 20 April, 2010. *For correspondence:<br />
E-mail garrett@bio.ku.dk; Tel. (+45) 35322010; Fax (+45)<br />
35322128.<br />
© 2010 Society for Applied Microbiology and Blackwell Publish<strong>in</strong>g Ltd<br />
Introduction<br />
<strong>Archaea</strong>l viruses exhibit a wide variety of morphotypes<br />
and genomic properties. They have been isolated and<br />
characterized primarily from terrestrial acidic hot spr<strong>in</strong>gs or<br />
hypersal<strong>in</strong>e lakes, <strong>in</strong> many different geographical locations.<br />
Several viruses from terrestrial acidic hot spr<strong>in</strong>gs<br />
have now been classified <strong>in</strong>to new viral families while<br />
o<strong>the</strong>rs, toge<strong>the</strong>r with a few haloarchaeal viruses from <strong>the</strong><br />
euryarchaeal k<strong>in</strong>gdom, rema<strong>in</strong> unclassified (Prangishvili<br />
et al., 2006a; Porter et al., 2007; Lawrence et al., 2009).<br />
Although some crenarchaeal and euryarchaeal virions<br />
share similar morphotypes, <strong>the</strong>ir genomic properties have<br />
little <strong>in</strong> common (Ortmann et al., 2006; Porter et al., 2007)<br />
nor, with <strong>the</strong> exception of a few head-tail euryarchaeal<br />
viruses, do <strong>the</strong>y share many homologous genes with<br />
ei<strong>the</strong>r bacterial or eukaryal viruses (Prangishvili et al.,<br />
2006b). Despite <strong>the</strong> broad diversity of characterized<br />
archaeal viruses, as a group <strong>the</strong>y probably constitute a<br />
biased sample because most of <strong>the</strong>m exclusively <strong>in</strong>fect<br />
<strong>the</strong>rmoacidophilic members of <strong>the</strong> order Sulfolobales or a<br />
few haloarchaeal stra<strong>in</strong>s.<br />
Few studies, to date, have addressed <strong>the</strong> relative abundance<br />
of different viral morphotypes <strong>in</strong> archaea-rich environments.<br />
Electron microscopy studies of samples from<br />
terrestrial hot spr<strong>in</strong>gs suggest that sp<strong>in</strong>dles, filaments,<br />
rods and spheres predom<strong>in</strong>ate (Rachel et al., 2002; Bize<br />
et al., 2008), while o<strong>the</strong>r morphotypes are much less<br />
common. In hypersal<strong>in</strong>e environments sp<strong>in</strong>dle-shaped<br />
and spherical forms predom<strong>in</strong>ate (Oren et al., 1997; Diez<br />
et al., 2000; Porter et al., 2007) while head-tail virus-like<br />
particles (VLPs) are quite common and <strong>the</strong>ir proviruses<br />
have been detected <strong>in</strong> some sequenced genomes of halo-<br />
and methanoarchaea (Porter et al., 2007; Krupovič and<br />
Bamford, 2008; Krupovič et al., 2010).<br />
Only four crenarchaeal viruses from extreme geo<strong>the</strong>rmal<br />
environments at neutral pH values have been fully<br />
characterized to date: <strong>the</strong> rod-shaped Thermoproteus<br />
tenax lipothrixvirus, TTV1 (Janekovic et al., 1983), Pyrobaculum<br />
spherical virus, PSV (Här<strong>in</strong>g et al., 2004), <strong>the</strong><br />
closely related T. tenax spherical virus 1, TTSV1 (Ahn
et al., 2006), and <strong>the</strong> Aeropyrum pernix bacilliform virus 1,<br />
APBV1 (Mochizuki et al., 2010). However, electron<br />
microscopy studies of an enrichment culture from a<br />
sample collected from Obsidian Pool, Yellowstone<br />
National Park, USA, ma<strong>in</strong>ta<strong>in</strong>ed at 85°C and pH 6 under<br />
anaerobic conditions, revealed five morphologically<br />
diverse VLPs (fig. 1 <strong>in</strong> Rachel et al., 2002), <strong>in</strong>clud<strong>in</strong>g<br />
spherical virions of <strong>the</strong> virus PSV which was characterized<br />
earlier (Här<strong>in</strong>g et al., 2004). The enrichment culture<br />
also carried a variety of genera, <strong>in</strong>clud<strong>in</strong>g crenarchaeal<br />
Thermofilum, Thermoproteus and Thermosphaera, euryarchaeal<br />
Archaeoglobus and <strong>the</strong> bacterial genera<br />
Thermus, Geo<strong>the</strong>rmobacterium and Thermodesulfobacterium<br />
(Rachel et al., 2002). The enrichment culture was<br />
ma<strong>in</strong>ta<strong>in</strong>ed <strong>in</strong> a bioreactor over a 2-year period, and aliquots<br />
were extracted at regular <strong>in</strong>tervals dur<strong>in</strong>g this time<br />
and screened for VLPs by electron microscopy. The different<br />
VLP morphotypes observed, <strong>in</strong>clud<strong>in</strong>g <strong>the</strong> spherical<br />
PSV, varied considerably <strong>in</strong> <strong>the</strong>ir relative yields over time<br />
(Fig. 1).<br />
In this study, we attempted to obta<strong>in</strong> genome sequences<br />
associated with <strong>the</strong> rema<strong>in</strong><strong>in</strong>g unidentified VLPs. To this<br />
end, <strong>the</strong> samples extracted from <strong>the</strong> bioreactor at different<br />
time <strong>in</strong>tervals were <strong>in</strong>vestigated, as well as mixtures of<br />
samples. A variety of approaches were used to generate<br />
clone libraries and to dist<strong>in</strong>guish viral from plasmid DNA,<br />
and circular from l<strong>in</strong>ear DNA genomes and, for <strong>the</strong> VLPs, to<br />
correlate genome-type with morphotype.<br />
S<strong>in</strong>ce attempts to f<strong>in</strong>d hosts for <strong>the</strong> VLPs were unsuccessful,<br />
we <strong>in</strong>vestigated potential hosts for <strong>the</strong> archaeal<br />
viruses and plasmids by match<strong>in</strong>g <strong>the</strong>ir genome<br />
sequences to spacer sequences of <strong>the</strong> chromosomal<br />
<strong>immune</strong> <strong>CRISPR</strong>/Cas <strong>system</strong> (Van der Oost et al., 2009).<br />
These chromosomal spacers derive from <strong>in</strong>fect<strong>in</strong>g viruses<br />
or plasmids (Barrangou et al., 2007) and are present<br />
with<strong>in</strong> all <strong>the</strong> available sequenced genomes of <strong>the</strong>rmophilic<br />
neutrophiles. The spacers represent a history of<br />
<strong>in</strong>vad<strong>in</strong>g viruses and plasmids and a close sequence<br />
match implies that <strong>the</strong> host has been <strong>in</strong>fected by a similar<br />
virus or plasmid (Lillestøl et al., 2006; Andersson and<br />
Banfield, 2008; Shah et al., 2009).<br />
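The host-inference reasoning above can be sketched as a simple tally: for each candidate host, count the spacers that closely match the viral or plasmid genome, then rank the hosts by that count.<br />
This is an illustrative sketch under stated assumptions rather than the method used in this study; the function names, the toy sequences and the mismatch tolerance of 2 are hypothetical.<br />

```python
from collections import Counter

# Hypothetical helpers for illustration only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def min_mismatches(spacer, genome):
    """Fewest mismatches for any ungapped placement of the spacer
    on either strand of the genome."""
    best = len(spacer)
    for target in (genome, genome.translate(_COMPLEMENT)[::-1]):
        for i in range(len(target) - len(spacer) + 1):
            window = target[i:i + len(spacer)]
            mismatches = sum(a != b for a, b in zip(spacer, window))
            best = min(best, mismatches)
    return best

def rank_hosts(host_spacers, element_genome, max_mismatches=2):
    """Rank candidate hosts by the number of close spacer matches.

    host_spacers maps a host name to its list of spacer sequences.
    max_mismatches is an assumed tolerance, not a value from the text.
    """
    counts = Counter()
    for host, spacers in host_spacers.items():
        counts[host] = sum(
            min_mismatches(s, element_genome) <= max_mismatches
            for s in spacers
        )
    return counts.most_common()

if __name__ == "__main__":
    element = "GGATCC" + "TTAGCATGCAAG" + "TCCGATTACG"   # toy genome
    hosts = {
        "host_A": ["TTAGCATGCAAG", "CCCCCCCCCCCC"],
        "host_B": ["AAAAAAAAAAAA"],
    }
    print(rank_hosts(hosts, element))
```

A host with more close matches is a more plausible past target of a similar virus or plasmid, although a match alone does not prove that <strong>the</strong> element can replicate <strong>in</strong> that host.<br />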
Results<br />
An enrichment culture established from a sample collected<br />
from Obsidian Pool at Yellowstone National Park,<br />
USA, was ma<strong>in</strong>ta<strong>in</strong>ed <strong>in</strong> a bioreactor at 85°C and pH 6.<br />
Virus-like particles were concentrated from supernatant<br />
aliquots taken from <strong>the</strong> bioreactor and subjected to CsCl<br />
density gradient ultracentrifugation. Initially, shot-gun<br />
clone libraries were prepared from a mixture of bioreactor<br />
samples (bioreactor-mix) (Fig. 1A), which were deprote<strong>in</strong>ized<br />
after density gradient centrifugation without any pretreatment,<br />
but most clones were found to derive from<br />
Novel viruses and plasmids from hyper<strong>the</strong>rmoneutrophiles 2919<br />
contam<strong>in</strong>at<strong>in</strong>g chromosomal DNA fragments. Therefore,<br />
DNase I treatment was <strong>in</strong>troduced to remove chromosomal<br />
contam<strong>in</strong>ation before deprote<strong>in</strong>ization. Moreover,<br />
we exam<strong>in</strong>ed samples collected at different times over a<br />
2-year period, for which a given VLP type was dom<strong>in</strong>ant <strong>in</strong><br />
electron micrographs (Fig. 1B and C), <strong>in</strong> order to correlate<br />
viral genome types with morphotypes. Fur<strong>the</strong>rmore, strategies<br />
were developed for dist<strong>in</strong>guish<strong>in</strong>g viral from plasmid<br />
DNA, and l<strong>in</strong>ear from circular DNA genomes (Table 1; see<br />
also Experimental procedures).<br />
Five ma<strong>in</strong> libraries were prepared to generate viral DNA<br />
and plasmid clones (Table 1). These <strong>in</strong>clude <strong>the</strong> larger<br />
library of supernatant DNA from <strong>the</strong> mix of bioreactor<br />
samples collected at different time <strong>in</strong>tervals (4000<br />
sequences) (Fig. 1A), and an earlier library that was used<br />
to sequence <strong>the</strong> partially purified Pyrobaculum spherical<br />
virus PSV, isolated from <strong>the</strong> same bioreactor (Här<strong>in</strong>g<br />
et al., 2004). Moreover, samples enriched <strong>in</strong> two of <strong>the</strong><br />
novel VLPs were obta<strong>in</strong>ed (Fig. 1B and C) and used to<br />
generate clone libraries. Thus, a shot-gun filament library<br />
was prepared from two samples rich <strong>in</strong> short filamentous<br />
VLPs (Fig. 1B), and fur<strong>the</strong>r clone libraries were prepared<br />
from samples rich <strong>in</strong> tadpole-shaped particles (Fig. 1C)<br />
after select<strong>in</strong>g for (i) circular plasmids which were preferentially<br />
amplified (tadpole-1) and (ii) circular viral<br />
genomes after degrad<strong>in</strong>g chromosomal and plasmid DNA<br />
and <strong>the</strong>n deprote<strong>in</strong>iz<strong>in</strong>g virions and treat<strong>in</strong>g with circular<br />
DNA-safe nucleases (tadpole-2). We also screened,<br />
unsuccessfully, for RNA viral genomes by generat<strong>in</strong>g<br />
cDNA libraries (data not shown).<br />
Complete genomes from two putative archaeal viruses,<br />
HAV1 and HAV2, were assembled, the former 22.7 kb and<br />
linear and the latter 17.7 kb and circular. In addition,<br />
four plasmids were sequenced: pHA1 (33.8 kb), pHB1<br />
(2.1 kb), and two variants, pHB2a and pHB2b, of 4.8 kb<br />
and 5.4 kb respectively. The approximate percentages of<br />
clones from <strong>the</strong> five ma<strong>in</strong> libraries that were <strong>in</strong>corporated<br />
<strong>in</strong>to each assembled genetic element are given (Table 1),<br />
and <strong>the</strong> numbers are consistent with <strong>the</strong> strategy<br />
employed for dist<strong>in</strong>guish<strong>in</strong>g viral from plasmid genomes,<br />
except that <strong>the</strong> relatively high percentage of clones of<br />
plasmids pHB2a and pHB2b (20%), obta<strong>in</strong>ed from <strong>the</strong><br />
tadpole-2 library, probably reflects incomplete DNase I<br />
digestion of non-viral circular DNA (Table 1). The average<br />
genome coverage was about fivefold for each element,<br />
unless o<strong>the</strong>rwise stated, and all sequence ambiguities<br />
were resolved by primer walk<strong>in</strong>g on clones. The identities<br />
and general properties of <strong>the</strong> sequenced genetic elements<br />
are summarized <strong>in</strong> Table 2.<br />
Filamentous VLPs<br />
Two bioreactor samples rich <strong>in</strong> short filamentous VLPs<br />
(Fig. 1B) were pooled and treated with DNase I at 37°C<br />
©2010SocietyforAppliedMicrobiologyandBlackwellPublish<strong>in</strong>gLtd,Environmental Microbiology, 12, 2918–2930
2920 R. A. Garrett et al.<br />
Fig. 1. Electron micrographs show<strong>in</strong>g VLP morphotypes observed <strong>in</strong> <strong>the</strong> analysed bioreactor culture.<br />
A. A mixture of all preparations of VLPs collected from <strong>the</strong> bioreactor.<br />
B. Preparation enriched <strong>in</strong> filamentous VLPs.<br />
C. Preparation enriched <strong>in</strong> tadpole-shaped VLPs.<br />
The size marker corresponds to 500 nm.<br />
Table 1. Pre-treatment of viral samples before library construction and <strong>the</strong> approximate percentage of clone sequences assembled from each<br />
library for each genetic element.<br />
Clone libraries<br />
Treatment Bioreactor mix PSV Filament Tadpole-1 Tadpole-2<br />
Element Size (bp) i, iii i, iii i, iii, v i, iii, v i, ii, iii, iv, v<br />
HAV1 22 743 5 95<br />
HAV2 17 666 5 95<br />
pHA1 33 795 40 57 3<br />
pHB1 2 099 100<br />
pHB2a/2b 4 780/5 370 80 20<br />
Bioreactor supernatant extracts were subjected to <strong>the</strong> follow<strong>in</strong>g treatments: i. CsCl gradient centrifugation of virions. ii. DNase I treatment of <strong>the</strong><br />
virion band from CsCl gradients. iii. Deprote<strong>in</strong>ization with SDS and prote<strong>in</strong>ase K followed by phenol extraction of DNA. iv. Plasmid-safe DNase<br />
treatment of DNA. v. In vitro amplification of DNA.<br />
The total numbers of sequenced clones that were assembled into the virus and plasmid genomes (prior to sequence polishing) were HAV1 – 956,<br />
HAV2 – 195, pHA1 – 188, pHB1 – 49 and pHB2a/2b (co-assembled) – 55.<br />
for 15 m<strong>in</strong>, to remove extraneous chromosomal and<br />
plasmid DNA before extract<strong>in</strong>g DNA from VLPs by phenol<br />
treatment. A clone library was generated and DNA<br />
sequenc<strong>in</strong>g yielded a non-circular contig of about 20 kb,<br />
consistent with a l<strong>in</strong>ear genome. S<strong>in</strong>ce term<strong>in</strong>al<br />
sequences are <strong>in</strong>variably absent from shot-gun clone<br />
libraries of l<strong>in</strong>ear genomes (e.g. Vestergaard et al.,<br />
2008a), libraries were produced us<strong>in</strong>g <strong>the</strong> L<strong>in</strong>ker Amplified<br />
Shotgun Library method (see Experimental procedures)<br />
which yielded a high sequence coverage of <strong>the</strong><br />
DNA term<strong>in</strong>i. The complete l<strong>in</strong>ear DNA genome consists<br />
of 22 743 bp with a 21 bp <strong>in</strong>verted term<strong>in</strong>al repeat (ITR) of<br />
sequence 5′-CGTCTCTCTGTGTGTATGGGA-3′. We<br />
<strong>in</strong>fer that both term<strong>in</strong>i are free, blunt and unmodified,<br />
because <strong>the</strong>y were efficiently ligated with <strong>the</strong> blunt end of<br />
<strong>the</strong> adaptor dur<strong>in</strong>g library construction (see Experimental<br />
procedures). S<strong>in</strong>ce only one major contig was assembled<br />
from <strong>the</strong> filament-library sequences, we <strong>in</strong>ferred that it<br />
derived from <strong>the</strong> filamentous virus (Fig. 1B). Genome<br />
analyses <strong>in</strong>dicated that <strong>the</strong> virus was of archaeal orig<strong>in</strong><br />
(Torar<strong>in</strong>sson et al., 2005), and all of <strong>the</strong> predicted genes<br />
lie on one strand of <strong>the</strong> genome, similar to <strong>the</strong> highly<br />
biased strand usage of <strong>the</strong> crenarchaeal <strong>the</strong>rmoneutrophilic<br />
viruses PSV and TTSV1 (Här<strong>in</strong>g et al., 2004; Ahn<br />
et al., 2006). A few clone sequences from <strong>the</strong> filament<br />
library assembled <strong>in</strong>to <strong>the</strong> PSV genome, <strong>in</strong>dicat<strong>in</strong>g that<br />
small amounts of that virus had co-purified with HAV1,<br />
Table 2. Genomic properties of <strong>the</strong> <strong>the</strong>rmoneutrophilic viruses and plasmids.<br />
consistent with <strong>the</strong> low levels of spherical particles<br />
observed <strong>in</strong> electron micrographs (Fig. 1B).<br />
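The inverted terminal repeat reported above for HAV1 lends itself to a quick computational check.<br />
The sketch below is illustrative only: the function names and the toy genome are assumptions introduced here,<br />
and only the 21 bp ITR sequence is taken from the text. It tests whether the 5′ end of a linear sequence<br />
reappears, reverse-complemented, at the 3′ end.<br />

```python
# Hypothetical helper names; only the 21 bp ITR sequence comes from the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_terminal_repeat_length(genome: str) -> int:
    """Length of the longest 5' prefix that recurs, reverse-complemented, at the 3' end."""
    rc = reverse_complement(genome)
    n = 0
    # An ITR of length n means genome[:n] == reverse_complement(genome)[:n].
    while n < len(genome) // 2 and genome[n] == rc[n]:
        n += 1
    return n


HAV1_ITR = "CGTCTCTCTGTGTGTATGGGA"  # 21 bp ITR quoted in the text

# Toy linear "genome" (invented): ITR + arbitrary filler + reverse-complemented ITR.
toy_genome = HAV1_ITR + "ACGTTGCAGA" * 3 + reverse_complement(HAV1_ITR)
print(inverted_terminal_repeat_length(toy_genome))  # prints 21
```

Applied to the real 22 743 bp sequence, the same comparison would be expected to return the reported repeat length,<br />
assuming the deposited sequence includes both termini.<br />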
No genes yielded highly significant matches <strong>in</strong> public<br />
sequence databases; only weak but persistent matches<br />
were observed to a Cas4-like prote<strong>in</strong> (DUF83) (ORF218),<br />
possibly a nuclease, for which matches were also<br />
observed <strong>in</strong> several crenarchaeal fuselloviruses (Redder<br />
et al., 2009) and the filamentous lipothrixvirus AFV1<br />
(Bettstetter et al., 2003), a parB-like partition prote<strong>in</strong><br />
(ORF253) and a transcriptional regulator (ORF146)<br />
(Table 3). Several ORFs carry putative transmembrane<br />
motifs, some with predicted signal peptides, as illustrated<br />
<strong>in</strong> Fig. 2A. The very low level of gene matches to public<br />
sequence databases is a characteristic of <strong>the</strong> o<strong>the</strong>r<br />
sequenced <strong>the</strong>rmoneutrophilic viruses PSV, TTSV1 and<br />
TTV1 (Janekovic et al., 1983; Bettstetter et al., 2003; Ahn<br />
et al., 2006), and appears to be a general feature of many<br />
crenarchaeal viral genomes (Prangishvili et al., 2006b).<br />
HAV1 genomic variants<br />
Although <strong>the</strong>re is a low level of sequence heterogeneity<br />
throughout <strong>the</strong> genome, <strong>the</strong>re are numerous local heterogeneity<br />
‘hot-spots’, present <strong>in</strong> almost half of <strong>the</strong> predicted<br />
genes (17 out of 40) as <strong>in</strong>dicated <strong>in</strong> Fig. 2A. In addition,<br />
several genomic variants of HAV1 were assembled with<br />
major alterations <strong>in</strong>clud<strong>in</strong>g gene <strong>in</strong>sertions of up to 350 bp<br />
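<table>
<tr><th>Element</th><th>dsDNA (bp)</th><th>Form</th><th>G+C content (%)</th><th>Domain</th><th>GenBank accession number</th></tr>
<tr><td>HAV1</td><td>22 743</td><td>Linear</td><td>46.2</td><td>Archaea</td><td>GU722196</td></tr>
<tr><td>HAV2</td><td>17 666</td><td>Circular</td><td>52.1</td><td>Archaea</td><td>GU722197</td></tr>
<tr><td>pHA1</td><td>33 795</td><td>Circular</td><td>45.4</td><td>Archaea</td><td>GU722198</td></tr>
<tr><td>pHB1</td><td>2 099</td><td>Circular</td><td>54.7</td><td>Bacteria</td><td>GU722199</td></tr>
<tr><td>pHB2a</td><td>4 780</td><td>Circular</td><td>61.6</td><td>Bacteria</td><td>GU722200</td></tr>
<tr><td>pHB2b</td><td>5 370</td><td>Circular</td><td>60.2</td><td>Bacteria</td><td>GU722201</td></tr>
</table>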
Table 3. Significant ORF matches with<strong>in</strong> public sequence databases.<br />
ORF e-value Match Orig<strong>in</strong><br />
HAV1<br />
ORF253 4e-06 parB-like partition Dethiobacter alkaliphilus AHT 1<br />
ORF218 9e-05 Cas4-like (DUF83) Sulfolobus fusellovirus SSV2<br />
ORF170 1e-06 Hypo<strong>the</strong>tical – Tpen_1879 Thermofilum pendens Hrk 5<br />
ORF146 5e-05 CopG/Arc/MetJ family transcriptional regulator Pyrobaculum aerophilum str. IM2<br />
HAV2<br />
ORF1767 2e-11 Hypo<strong>the</strong>tical – ATV_gp60 (ORF710 – C-term<strong>in</strong>al 375 aa) Acidianus bicaudavirus ATV<br />
ORF909 1e-125 Primase/DNA polymerase Sulfolobus neozealandicus pORA1<br />
ORF506 2e-15 AAA-ATPase, CDC48-type (ORF618 N-term<strong>in</strong>al 210 aa) Acidianus bicaudavirus ATV<br />
ORF263 2e-04 ORF731 N-term<strong>in</strong>al 100 aa Sulfolobus bicaudavirus STSV1<br />
ORF122 3e-45 IS element Dka2 OrfA Desulfurococcus kamchatkensis<br />
ORF420 2e-160 IS element Dka2 OrfB Desulfurococcus kamchatkensis<br />
pHA1<br />
ORF575 2e-08 Phage/plasmid primase COG3378 (C-term<strong>in</strong>al 300 aa) P4 family<br />
ORF396 2e-25 C5-cytos<strong>in</strong>e-specific methylase Thermus phage P23-45<br />
ORF375 8e-42 Type III restriction enzyme, res subunit Thermofilum pendens Hrk 5<br />
ORF337 2e-07 DEAD/DEAH box helicase Thermofilum pendens Hrk 5<br />
ORF320 5e-18 Abortive <strong>in</strong>fection prote<strong>in</strong> Thermofilum pendens Hrk 5<br />
ORF282 4e-74 Integrase Thermofilum pendens Hrk 5<br />
ORF93 6e-06 Holliday junction resolvase Methanocaldococcus vulcanius M7<br />
pHB1<br />
ORF477 8e-14 Rep prote<strong>in</strong>-roll<strong>in</strong>g circle Bacterial plasmid pAB49<br />
pHB2a+b<br />
ORF399/557 8e-75 TraA-like, conjugal transfer Polaromonas naphthalenivorans CJ2<br />
ORF269 2e-47 RepB prote<strong>in</strong> Ac<strong>in</strong>etobacter baumannii ACICU<br />
pHB2a<br />
ORF116 3e-18 Hypo<strong>the</strong>tical – Veis_1406 Verm<strong>in</strong>ephrobacter eiseniae EF01-2<br />
pHB2b<br />
ORF115 2e-19 Hypo<strong>the</strong>tical – StreC_09508 Streptomyces sp. C<br />
and deletions of up to 1.5 kb, genes with altered<br />
sequences, and duplications. The number of clone<br />
sequences that assembled <strong>in</strong>to each of <strong>the</strong> variant<br />
regions (Table 4), relative to <strong>the</strong> number of clones <strong>in</strong> <strong>the</strong><br />
dom<strong>in</strong>ant genome, <strong>in</strong>dicated that <strong>the</strong> orig<strong>in</strong>al viral population<br />
was very heterogeneous, and this was reinforced by the<br />
preparative gel electrophoresis pattern of the viral DNA,<br />
which revealed a broad heterogeneous band between<br />
DNA size markers of 19.4 and 24 kb (Fig. 3).<br />
Ten assembled contigs of HAV1 variants showed major<br />
genomic changes with some carry<strong>in</strong>g two to three <strong>in</strong>dependent<br />
alterations (Table 4). Most of <strong>the</strong> deletions and<br />
o<strong>the</strong>r major genomic changes occur at one or more of <strong>the</strong><br />
11 adjo<strong>in</strong><strong>in</strong>g pyrimid<strong>in</strong>e-rich and pur<strong>in</strong>e-rich sequences,<br />
most of which are <strong>in</strong>tergenic (Fig. 2A; Table 5). These<br />
sites constitute partially conserved, low-complexity direct<br />
repeats along <strong>the</strong> genome, and some carry <strong>in</strong>verted<br />
repeats (Table 5). Only a quarter of <strong>the</strong> viral genes are<br />
affected by <strong>the</strong>se genomic changes. Of <strong>the</strong>se, ORFs<br />
123a, 156, 284, 102, 78a and 170 appear dispensable for<br />
<strong>the</strong> virion, while ORFs 140, 174, 276 and 352 can<br />
undergo large sequence variations, and ORF585, which<br />
conta<strong>in</strong>s two putative recomb<strong>in</strong>ation sites (Fig. 2A;<br />
Table 5), has undergone <strong>in</strong>sertions, partial deletions<br />
and/or extensive sequence changes, and exhibits altered<br />
start codon positions.<br />
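Most of the major genomic changes are reported at adjoining pyrimidine-rich and purine-rich sequences.<br />
As a rough illustration of that pattern (not the procedure used in the study), the sketch below finds the position<br />
that best divides a site into a pyrimidine-rich 5′ part and a purine-rich 3′ part. The function name and the<br />
scoring rule are assumptions; the example sequence is the site listed for variant 2c in Table 5.<br />

```python
PYRIMIDINES = set("CT")
PURINES = set("AG")


def best_pyrimidine_purine_split(site: str) -> tuple[int, float]:
    """Return (split index, fraction of bases fitting a pyrimidine|purine layout).

    Hypothetical scoring: count pyrimidines left of the split plus purines
    right of it, and keep the first split with the highest count.
    """
    if not site:
        raise ValueError("site must be a non-empty DNA string")
    best_index, best_score = 0, -1
    for i in range(len(site) + 1):
        score = (sum(base in PYRIMIDINES for base in site[:i])
                 + sum(base in PURINES for base in site[i:]))
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score / len(site)


# Site listed for variant 2c in Table 5.
print(best_pyrimidine_purine_split("TCTGACCCTTCGGAGAAAAA"))  # prints (11, 0.9)
```

A fraction close to 1 indicates a clean pyrimidine-to-purine transition; a value near 0.5 would suggest no such bias.<br />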
Tadpole-shaped VLPs<br />
DNA was extracted from a purified viral preparation that<br />
was rich <strong>in</strong> tadpole-shaped VLPs (Fig. 1C), and was<br />
amplified using the φ29 polymerase, before preparing a<br />
shot-gun clone library (Table 1; see Experimental procedures).<br />
Sequences were assembled, toge<strong>the</strong>r with some<br />
sequences from <strong>the</strong> bioreactor mix library (Table 1), <strong>in</strong>to<br />
a circular double-stranded (ds) DNA genome of 17 666<br />
bp (Fig. 2B), where the predicted genes are preceded by<br />
archaea-specific motifs (Torar<strong>in</strong>sson et al., 2005). There<br />
was little sequence heterogeneity <strong>in</strong> <strong>the</strong> HAV2 genome<br />
which almost certa<strong>in</strong>ly reflects <strong>the</strong> DNA amplification<br />
step prior to clon<strong>in</strong>g, such that an <strong>in</strong>itial dom<strong>in</strong>at<strong>in</strong>g component<br />
was preferentially amplified. As for HAV1, only<br />
one major contig was assembled and we <strong>in</strong>ferred <strong>the</strong>refore<br />
that it derived from <strong>the</strong> tadpole-shaped VLPs<br />
(Fig. 1C).<br />
In contrast to HAV1, a few significant matches to public<br />
sequence databases were found (Table 3). Highly significant<br />
matches occurred for an archaea-specific bifunctional<br />
DNA primase-polymerase encoded on two plasmids<br />
of Sulfolobus neozealandicus (Lipps et al., 2004; Greve<br />
et al., 2005), and for an IS element of the IS200/IS605<br />
family present <strong>in</strong> Desulfurococcus kamchatkensis.<br />
Moreover, two matches occurred to a crenarchaeal<br />
bicaudavirus ATV which exhibits a similar spindle-shaped<br />
morphology but with two tails (Prangishvili et al., 2006c).<br />
Thus, <strong>the</strong> C-term<strong>in</strong>al 350 am<strong>in</strong>o acids of HAV2-ORF1767<br />
showed significant sequence similarity to <strong>the</strong> correspond<strong>in</strong>g<br />
region of ATV-ORF710, and HAV2-ORF506 also<br />
carries an AAA-ATPase doma<strong>in</strong> of <strong>the</strong> CDC48 type,<br />
similar to that present <strong>in</strong> ATV-ORF618 (Fig. 2B).<br />
<strong>Archaea</strong>l and bacterial plasmids<br />
Plasmid sequences were assembled ma<strong>in</strong>ly from <strong>the</strong><br />
clone libraries of samples lack<strong>in</strong>g DNase I treatment<br />
Fig. 2. Genome maps of <strong>the</strong> HAV1 (A) and HAV2 (B) viruses where predicted genes are <strong>in</strong>dicated by arrows and denoted by <strong>the</strong>ir am<strong>in</strong>o acid<br />
lengths. Significant predictions of gene product functions are <strong>in</strong>dicated. Striated genes encode predicted transmembrane motifs. In (A) red<br />
sections <strong>in</strong>dicate gene regions carry<strong>in</strong>g hot-spots for s<strong>in</strong>gle-site mutations. Putative recomb<strong>in</strong>ation sites are <strong>in</strong>dicated (•).<br />
before viral DNA extraction (Table 1). The 33 795 bp<br />
pHA1 (Hyper<strong>the</strong>rmophilic Archaeon) is of archaeal orig<strong>in</strong><br />
and was assembled from different libraries of nonamplified<br />
DNA, <strong>in</strong>clud<strong>in</strong>g that of <strong>the</strong> bioreactor mix and<br />
<strong>the</strong> PSV virus (Här<strong>in</strong>g et al., 2004) (Table 1). M<strong>in</strong>or<br />
sequence heterogeneities occurred throughout <strong>the</strong><br />
genome but no larger genomic changes were observed.<br />
About one-third of <strong>the</strong> 59 predicted genes are homologous<br />
to genes <strong>in</strong> <strong>the</strong> 31 504 bp plasmid TPEN01 from<br />
Thermofilum pendens Hrk5 (Anderson et al., 2008), and<br />
<strong>the</strong>y are clustered <strong>in</strong> <strong>the</strong> pHA1 genome (Fig. 4A). Several<br />
genes encod<strong>in</strong>g hypo<strong>the</strong>tical prote<strong>in</strong>s carry putative<br />
Table 4. Properties of <strong>the</strong> HAV1 genomic variants.<br />
HAV1 variant Number of clones Viral position Genome change ORF changes<br />
1 2 492–1827 Deleted 1337 bp, 80–143 replaced Deleted ORFs123a/156<br />
2 14 1032 65 bp partial duplication, T-C-rich region No ORF<br />
1659–3188 Deleted 1529 bp Deleted ORFs284/102<br />
3 6 3761 Altered gene ORF174 (ORF183, 36% identity/59% similarity)<br />
4 9 8042 Insert 92 bp – 60 bp repeat <strong>in</strong> start ORF141 end extends 10 aa, ORF94 start extends 36 aa<br />
5 41 8050 Insert 49 bp <strong>in</strong> C-rich region ORF141 end extends 9 aa, ORF94 start extends 10 aa<br />
8868–8755 Altered gene ORF218 (ORF218, 88% identity/92% similarity)<br />
9946–10757 Deleted 813 bp Deleted ORFs78a/170<br />
6 19 14430–14699 270 bp duplication No ORF<br />
15000 Insert 350 bp Variant ORF585, C-term<strong>in</strong>al half (ORF653)<br />
7 7 14360–14498,14737–14885 Altered gene Heterogeneities ORF585<br />
8 2 14995–15460 Deleted 375 bp Truncated ORF585<br />
15717–16062 Altered gene ORF585, altered 345 bp centrally<br />
9 8 17352–17899 Altered gene ORF276, altered central 170 aa (ORF259, 61% identity/71% similarity)<br />
10 16 19439–20029 Altered gene ORF325 (ORF315, 72% identity/80% similarity)<br />
Fig. 3. Characterization of DNA isolated from the purified,<br />
filamentous VLP-enriched preparation<br />
(HAV1), after removal of plasmid and chromosomal DNA, and prior<br />
to generating the filament library (Table 1). M – DNA size markers.<br />
transmembrane motifs, some also exhibit<strong>in</strong>g predicted<br />
signal peptides, and <strong>the</strong>se <strong>in</strong>clude a cluster of 10 tightly<br />
l<strong>in</strong>ked genes some of which are probably co-transcribed<br />
(Fig. 4A). Although <strong>the</strong>re is no significant sequence similarity,<br />
<strong>the</strong>se prote<strong>in</strong>s may generate a novel conjugative<br />
apparatus, by analogy with a group of conjugative membrane<br />
prote<strong>in</strong>s encoded by a conserved gene cluster of<br />
conjugative plasmids of <strong>the</strong> crenarchaeal <strong>the</strong>rmoacidophiles<br />
(Greve et al., 2004).<br />
The three smaller plasmids were assembled exclusively<br />
from <strong>the</strong> tadpole-1/2 libraries of amplified DNA (Table 1)<br />
and <strong>the</strong> sequences are relatively homogeneous. Each<br />
plasmid is of bacterial orig<strong>in</strong>, as judged by promoter and<br />
ribosome b<strong>in</strong>d<strong>in</strong>g motifs (Torar<strong>in</strong>sson et al., 2005). The<br />
2099 bp pHB1 (Hyper<strong>the</strong>rmophilic Bacterium) encodes a<br />
large replication prote<strong>in</strong>, probably of <strong>the</strong> roll<strong>in</strong>g circle type<br />
(Table 3), and o<strong>the</strong>r predicted genes overlap on <strong>the</strong> two<br />
DNA strands (Fig. 4B). pHB2a and pHB2b, of 4780 and<br />
5370 bp, respectively, are variants shar<strong>in</strong>g 3780 bp of<br />
highly similar sequence but with two altered regions as<br />
illustrated (Fig. 4B). Whereas <strong>the</strong> shorter altered regions<br />
exhibit no sequence similarity, <strong>the</strong> larger regions of 781 bp<br />
and 1257 bp for pHB2a and pHB2b, respectively, carry<br />
about 300 bp with a low but significant level of sequence<br />
similarity. These altered sequences resulted <strong>in</strong> ORF125<br />
be<strong>in</strong>g exclusive to pHB2a, and ORFs 58, 60, 67, 68 and<br />
97 be<strong>in</strong>g specific to pHB2b (Fig. 4B). In addition, ORFs<br />
399 and 157 <strong>in</strong> pHB2a are fused <strong>in</strong> pHB2b (ORF557) and<br />
Table 5. Putative recomb<strong>in</strong>ation sites associated with alterations <strong>in</strong> <strong>the</strong> variant HAV1 genomes.<br />
Variant number Genome change Genome positions Recomb<strong>in</strong>ation sites<br />
1a 1337 bp deletion 474–496 CCCTCCCCTTTTTCTATGAAGTCGAAGGTGGA<br />
1b Recomb<strong>in</strong>ation 1805–1833 TTTTTTCTTTTTCCTCTTTTTTTCCCTTCGGAGAAAAG<br />
2a 65 bp partial duplication 1014–1052 TCTTTTTTCCCCTCTTTTCCTTTCTTCATGATGAAAGGA<br />
2b 1529 bp deletion 1650–1677 CCTCTTTTTTTCTAGCCGCACCTCCTTTGGAGAAAAA<br />
2c 1529 bp deletion 3183–3193 TCTGACCCTTCGGAGAAAAA<br />
5a 49 bp <strong>in</strong>sertion 8060 CCCGTTCCCGGCGTCTCGGTGGAA<br />
6b 350 bp altered 15000 CTCCTCACTCTTCTTCTCGCTGTTCAGGAGGAGGA<br />
8b 345 bp replacement 16040–16065 CTTTGCTGTATCTATTGCGAGGAAGA<br />
Similar sites exist also at genome positions 559–585, 2728–2748 and 2766–2778. Inverted repeats (underl<strong>in</strong>ed) are present <strong>in</strong> some recomb<strong>in</strong>ation<br />
sites. Details of <strong>the</strong> variants are given <strong>in</strong> Table 4.<br />
Fig. 4. Genome maps of <strong>the</strong> circular plasmids (A) archaeal pHA1 and (B) three bacterial plasmids pHB1, pHB2a and pHB2b, where arrows<br />
<strong>in</strong>dicate predicted genes denoted by <strong>the</strong>ir am<strong>in</strong>o acid lengths. Striated genes encode predicted transmembrane motifs while grey shaded<br />
genes are homologous to genes <strong>in</strong> TPEN01. Shaded areas <strong>in</strong>side <strong>the</strong> circles for pHB2a and pHB2b <strong>in</strong>dicate regions of different sequence.<br />
ORF96 (pHB2a) and ORF98 (pHB2b), as well as ORF115<br />
(pHB2a) and ORF116 (pHB2b), show limited sequence<br />
differences (Fig. 4B). Both plasmid variants encode a replication<br />
protein and a Tra-like conjugal protein and carry a<br />
G+C-rich region which may constitute an RNA gene<br />
(Fig. 4B).<br />
As for HAV2, <strong>the</strong>re was little sequence heterogeneity for<br />
<strong>the</strong> bacterial plasmids, which probably also reflects <strong>the</strong>ir<br />
amplification prior to clon<strong>in</strong>g. Detection of variants pHB2a<br />
and pHB2b suggests that both were substantial components<br />
<strong>in</strong> <strong>the</strong> orig<strong>in</strong>al DNA preparation.<br />
<strong>CRISPR</strong> spacer matches<br />
RNAs transcribed from <strong>CRISPR</strong> repeat clusters, and processed<br />
to spacer RNAs, can target and <strong>in</strong>activate extrachromosomal<br />
elements (reviewed <strong>in</strong> Van der Oost et al.,<br />
2009). Thus, host repeat clusters ma<strong>in</strong>ta<strong>in</strong> a record of<br />
<strong>in</strong>vad<strong>in</strong>g genetic elements. In pr<strong>in</strong>ciple <strong>the</strong>refore it should<br />
be possible to determ<strong>in</strong>e a host of an isolated genetic<br />
element by compar<strong>in</strong>g its genome sequence with<br />
<strong>CRISPR</strong> spacer sequences from chromosomes of potential<br />
hosts. We attempted to do this for <strong>the</strong> newly characterized<br />
viruses and plasmids by compar<strong>in</strong>g <strong>the</strong>ir<br />
sequences, and those of o<strong>the</strong>r available <strong>the</strong>rmoneutrophilic<br />
viruses and plasmids, with <strong>the</strong> 1321 spacer<br />
sequences <strong>in</strong> <strong>the</strong> <strong>CRISPR</strong> clusters of <strong>the</strong> 13 sequenced<br />
<strong>the</strong>rmoneutrophilic genomes (see Experimental procedures).<br />
Although a sequence comparison at a nucleotide<br />
level yielded no significant matches, a few significant<br />
matches were found when search<strong>in</strong>g at <strong>the</strong> more conserved<br />
am<strong>in</strong>o acid sequence level, after translat<strong>in</strong>g <strong>the</strong><br />
spacers <strong>in</strong>to all six read<strong>in</strong>g frames, essentially as<br />
described earlier (Shah et al., 2009). At an e-value cut-off<br />
of 0.12, <strong>the</strong>re are seven significant matches to <strong>the</strong> viruses<br />
and plasmids which are listed <strong>in</strong> Table 6, and 35 matches<br />
to annotated crenarchaeal prote<strong>in</strong>s <strong>in</strong> <strong>the</strong> 13 genomes<br />
(some of which may occur to <strong>in</strong>tegrated viruses or<br />
plasmids which were not removed from <strong>the</strong> data set).<br />
Given that viral/plasmid ORFs constituted only 0.8% of<br />
sequences present <strong>in</strong> <strong>the</strong> search (correspond<strong>in</strong>g to<br />
54 684 out of a total of 6 317 506 am<strong>in</strong>o acids), <strong>the</strong> results<br />
show a 20-fold preference for spacers match<strong>in</strong>g viral/<br />
plasmid ORFs over crenarchaeal genome ORFs which<br />
re<strong>in</strong>forces <strong>the</strong> significance of <strong>the</strong> matches (Table 6). A<br />
similar, and significant, analysis of <strong>the</strong> bacterial plasmids<br />
was not possible because of <strong>the</strong>ir small sizes and <strong>the</strong><br />
paucity of available bacterial <strong>the</strong>rmophile <strong>CRISPR</strong><br />
sequences.<br />
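The amino acid level comparison described above depends on translating each spacer in all six reading frames.<br />
The following is a minimal sketch of that step only, assuming the standard genetic code; the function names and<br />
the example spacer are invented for illustration, and the search tool and parameters actually used are those<br />
described by Shah et al. (2009), not shown here.<br />

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (assumed table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons from the start of dna; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(spacer: str) -> dict[str, str]:
    """Return translations of a spacer in frames +1..+3 and -1..-3."""
    forward = spacer.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# Hypothetical 39 nt spacer, invented for illustration (not from any genome).
example_spacer = "GATTACAGGCTTAACGTCCATGGCAAGTCTGACCTAGGA"
for frame, peptide in six_frame_translations(example_spacer).items():
    print(frame, peptide)
```

Each of the six short peptides would then be searched against the protein database, which is why a spacer of<br />
about 40 nt yields query peptides of only 11 to 14 residues, as in Table 6.<br />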
Table 6. Significant CRISPR spacer matches to crenarchaeal thermoneutrophilic viruses and plasmids.<br />
<table>
<tr><th>Crenarchaeal genome</th><th>Total spacers</th><th>CRISPR</th><th>Spacer</th><th>Virus/plasmid</th><th>Host genus</th><th>ORF</th><th>e-value</th><th>Alignment (translated spacer; ORF segment, with residue positions)</th></tr>
<tr><td>Pyrobaculum arsenaticum</td><td>126</td><td>90</td><td>47</td><td>HAV1</td><td></td><td>253</td><td>0.012</td><td>1 LGRSYDTIRKYQ 12; 114 AKILGREYDTVRKYRNAA 131</td></tr>
<tr><td>Desulfurococcus kamchatkensis</td><td>94</td><td>88</td><td>10</td><td>HAV2</td><td></td><td>909</td><td>0.023</td><td>1 WLHWLYIYGASHTG 14; 568 RGKWIRWLYLYGSSKTGKTT 587</td></tr>
<tr><td>Thermoproteus neutrophilus</td><td>225</td><td>26</td><td>13</td><td>HAV2</td><td></td><td>420</td><td>0.067</td><td>1 VVYVDETYTSATCP 14; 329 GITAVYVDEAYTSSKCPIHG 348</td></tr>
<tr><td>Thermoproteus neutrophilus</td><td>225</td><td>16</td><td>5</td><td>PSV</td><td>Pyrobaculum, Thermoproteus</td><td>97</td><td>0.017</td><td>1 DIWKIRWPEAIKS 13; 75 RNFDIWKVKWPTALRAQIA 93</td></tr>
<tr><td>Thermoproteus neutrophilus</td><td>225</td><td>38</td><td>5</td><td>TTSV1</td><td>Thermoproteus</td><td>100a</td><td>0.041</td><td>1 RCDLCGRRVSYET 13; 40 PDTRCDICGRKIGYGPYMV 58</td></tr>
<tr><td>Thermoproteus neutrophilus</td><td>225</td><td>38</td><td>6</td><td>TTSV1</td><td>Thermoproteus</td><td>100a</td><td>0.041</td><td>1 RCDLCGRRVSYET 13; 40 PDTRCDICGRKIGYGPYMV 58</td></tr>
<tr><td>Thermofilum pendens</td><td>182</td><td>27</td><td>17</td><td>pTPEN01</td><td>Thermofilum</td><td>633</td><td>0.079</td><td>1 AQYNSWLESRL 11; 112 EVEAQYNSWLESRLAVL 128</td></tr>
</table>
CRISPR repeat clusters are identified by the total number of repeats, and spacers are numbered from the leader end. e-values derive from searching translated spacers against a database with a length of<br />
6.3 million amino acids, where the viral/plasmid ORFs comprise 0.8%, and the crenarchaeal genome ORFs constitute 99.2%, of the sequence database. A total of 1321 CRISPR spacers were extracted from<br />
the 13 genomes which ranged from 39 repeats for Caldivirga maquilingensis IC-167 to 225 for T. neutrophilus.<br />
Of the four published genetic elements, PSV, TTSV1<br />
and TPEN01 yielded one or more significant spacer<br />
matches to a known host genus (Table 6). Moreover,<br />
HAV1 gave a good match to a Pyrobaculum, while<br />
HAV2 yielded good matches to Desulfurococcus and<br />
Thermoproteus, consistent with <strong>the</strong> high sequence similarity<br />
of <strong>the</strong> HAV2 genome to a Desulfurococcus IS<br />
element (Table 3; Fig. 2B). The next two most significant<br />
matches (not <strong>in</strong>cluded <strong>in</strong> <strong>the</strong> table) were both between<br />
pHA1 ORF68 and s<strong>in</strong>gle spacers <strong>in</strong> T. pendens and T.<br />
neutrophilus, with e-values of 0.64 and 0.67, respectively,<br />
also consistent with <strong>the</strong> extensive gene homology<br />
between pHA1 and <strong>the</strong> T. pendens plasmid TPEN01<br />
(Fig. 4A). Thus, this approach appears to yield useful<br />
<strong>in</strong>sights <strong>in</strong>to possible hosts for <strong>the</strong> newly characterized<br />
archaeal genetic elements, and it should be more generally<br />
applicable for such metagenomic studies as more<br />
archaeal <strong>CRISPR</strong> repeat-cluster sequences, or whole<br />
genome sequences become available.<br />
Discussion<br />
We characterized <strong>the</strong> genomic diversity of viruses and<br />
plasmids <strong>in</strong> a bioreactor established from a sample from a<br />
hot spr<strong>in</strong>g at Yellowstone National Park (Obsidian Pool)<br />
and ma<strong>in</strong>ta<strong>in</strong>ed at 85°C and pH 6 for 2 years (Rachel<br />
et al., 2002). Us<strong>in</strong>g a variety of clon<strong>in</strong>g strategies to select<br />
for l<strong>in</strong>ear or circular genomes, and to dist<strong>in</strong>guish viruses<br />
from plasmids, <strong>the</strong> analyses yielded two novel viral<br />
genomes, HAV1 and HAV2, from samples highly enriched<br />
<strong>in</strong> filamentous and tadpole-shaped VLPs, respectively,<br />
where <strong>the</strong> former yielded several genomic variants. No<br />
additional longer genomic contigs were assembled, from<br />
ei<strong>the</strong>r sample, which could correspond to <strong>the</strong> o<strong>the</strong>r elongated<br />
VLP that was observed <strong>in</strong> <strong>the</strong> orig<strong>in</strong>al sample<br />
(Fig. 1) (Rachel et al., 2002). Neither viral genome shows<br />
any clear similarity to other known archaeal viruses. Only<br />
HAV2 shows morphological similarity to the two-tailed<br />
bicaudavirus ATV, together with limited sequence similarity<br />
in two genes (Prangishvili et al., 2006c), so the two viruses<br />
may be distantly related.<br />
Electron microscopic visualization of bioreactor<br />
samples taken at regular <strong>in</strong>tervals <strong>in</strong>dicated that <strong>the</strong> levels<br />
of <strong>in</strong>dividual types of VLPs dramatically rose and fell over<br />
time. This was also true for <strong>the</strong> HAV1 variants which<br />
showed different yields with time as revealed by gel electrophoresis<br />
(data not shown). Presumably this reflects a<br />
reaction to: (i) <strong>the</strong> availability of receptive host cells, and<br />
(ii) <strong>the</strong> ability to overcome <strong>the</strong> archaeal cellular <strong>CRISPR</strong><br />
<strong>immune</strong> <strong>system</strong>s (Lillestøl et al., 2006; Shah et al., 2009;<br />
Van der Oost et al., 2009). We <strong>in</strong>fer that <strong>the</strong> extensive<br />
variety of HAV1 variants, which carry numerous sequence<br />
changes and major genomic structural alterations, reflects<br />
adaptation of <strong>the</strong> virus to <strong>the</strong>se constra<strong>in</strong>ts. Moreover, <strong>the</strong><br />
fact that <strong>the</strong>y were isolated as virions (Table 1) suggests<br />
that <strong>the</strong>y are all functional. These observations may be<br />
relevant to an earlier study which demonstrated a selective<br />
bias of viruses <strong>in</strong> laboratory cultures of environmental<br />
samples which contained diverse crenarchaeal fuselloviruses<br />
(Snyder et al., 2004). Possibly those that were<br />
undetectable by viral DNA amplification <strong>in</strong> <strong>the</strong> laboratory<br />
cultures had undergone genome rearrangements or were<br />
present <strong>in</strong> very low amounts, or <strong>in</strong> <strong>in</strong>tegrated form, at <strong>the</strong><br />
time of test<strong>in</strong>g.<br />
Attempts were made to identify putative viral hosts by<br />
isolating strains from the bioreactor using a laser microscope<br />
and cell sorter, but none of the isolates was infected with<br />
viruses. Nor were any crenarchaeal strains found that could be<br />
infected by the crude virus preparations (Fig. 1B and C). The<br />
exception was the spherical virus PSV, characterized<br />
earlier, which infected two Pyrobaculum and<br />
Thermoproteus strains (Häring et al., 2004). Furthermore, at<br />
present no reliable practical procedures have been<br />
developed for transfecting viral DNA into neutrophilic<br />
crenarchaea.<br />
Sequence heterogeneities occur throughout <strong>the</strong> HAV1<br />
genome, such that <strong>the</strong> f<strong>in</strong>al sequence is necessarily a<br />
consensus, where <strong>the</strong> dom<strong>in</strong>ant nucleotide is taken at<br />
each position. Moreover, nearly half <strong>the</strong> predicted genes<br />
carry regions that were particularly susceptible to<br />
sequence change (Fig. 2A), and some of <strong>the</strong>se also <strong>in</strong>cur<br />
deletions or <strong>in</strong>sertions. We <strong>in</strong>fer that <strong>the</strong>ir gene products<br />
are most likely to be <strong>in</strong>volved <strong>in</strong> virus–host <strong>in</strong>teractions,<br />
cell adhesion or viral extrusion mechanisms. These genes<br />
<strong>in</strong>clude ORFs 140, 174, 276, 325 and 585 (Fig. 2A) and<br />
ORF585 is by far <strong>the</strong> most susceptible to change and is<br />
<strong>the</strong>refore a strong candidate for recognition of cellular<br />
receptors. The latter is rem<strong>in</strong>iscent of <strong>the</strong> hypervariable<br />
ORFTPX of <strong>the</strong> Thermoproteus virus TTV1, although<br />
ORFTPX sequence changes occurred by a different<br />
mechanism (Neumann and Zillig, 1990a,b).<br />
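As a minimal sketch of how a consensus of this kind can be derived, the Python example below takes the dominant nucleotide at each column of a set of pre-aligned reads. The function name `consensus`, the gap symbol and the toy reads are illustrative assumptions; they are not taken from the assembly software used in the study.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Return a majority-rule consensus of equal-length, pre-aligned reads.

    Assumption: reads are already aligned and padded with `gap`.
    Ties are resolved in favour of the base seen first in the column.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    result = []
    for column in zip(*aligned_reads):
        # Count only real bases; gaps do not vote.
        counts = Counter(base for base in column if base != gap)
        if not counts:
            result.append(gap)
        else:
            result.append(counts.most_common(1)[0][0])
    return "".join(result)


# Toy data (hypothetical): four reads with single-base heterogeneities.
reads = ["ACGTAC", "ACGAAC", "ACGTTC", "ACCTAC"]
print(consensus(reads))
```

Columns in which the dominant base has only a narrow majority would correspond to the heterogeneous positions discussed above; recording the per-column counts alongside the consensus is one simple way to keep that information.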
The most conserved genes of HAV1 (Fig. 2A) are<br />
strong candidates for participation <strong>in</strong> <strong>the</strong> basic viral<br />
mechanisms of DNA replication, transcriptional regulation<br />
and virion packag<strong>in</strong>g. Earlier studies on <strong>the</strong> filamentous<br />
and rod-shaped viruses of crenarchaeal <strong>the</strong>rmoacidophiles<br />
concluded that <strong>the</strong> conserved core viral genes tend<br />
to be concentrated at <strong>the</strong> centre of l<strong>in</strong>ear genomes (Vestergaard<br />
et al., 2008a,b) and this is consistent with <strong>the</strong><br />
variants carry<strong>in</strong>g deletions of comb<strong>in</strong>ations of <strong>the</strong> four<br />
genes at <strong>the</strong> left end of <strong>the</strong> genome (Fig. 2A).<br />
There is a precedent for <strong>the</strong> formation of multiple<br />
genomic variants of a crenarchaeal virus. Earlier, <strong>the</strong> crenarchaeal<br />
rudivirus SIRV1 was isolated and passed<br />
through a series of closely related Sulfolobus islandicus<br />
stra<strong>in</strong>s, before reisolat<strong>in</strong>g <strong>the</strong> virions and sequenc<strong>in</strong>g <strong>the</strong>ir<br />
genomes. Several SIRV1 variants were detected which<br />
also exhibited localized regions of <strong>in</strong>sertions, deletions,<br />
duplications and extensive gene sequence changes<br />
(Peng et al., 2004). However, at least some of <strong>the</strong> underly<strong>in</strong>g<br />
mechanisms of genomic change appear to be different.<br />
For example, HAV1 carries recomb<strong>in</strong>ation sites<br />
constitut<strong>in</strong>g low-complexity direct repeats, some of which<br />
can generate hairp<strong>in</strong> structures (Table 5) and are possibly<br />
related to <strong>the</strong> recomb<strong>in</strong>ation sites characterized for plasmids<br />
of Sulfolobus which can generate regular hairp<strong>in</strong><br />
structures (Peng et al., 2000; Greve et al., 2004). In contrast,<br />
SIRV1 variants incurred multiple 12 bp indels,<br />
mainly within genes (Peng et al., 2004), and these were not<br />
observed for HAV1. Despite some mechanistic differences,<br />
the overall genomic changes in the viral variants<br />
are quite similar: some genes are conserved,<br />
others are dispensable and deleted, and a few are<br />
radically changed in sequence.<br />
In contrast to classical studies on virus characterization,<br />
a degree of uncerta<strong>in</strong>ty necessarily exists <strong>in</strong> <strong>the</strong> <strong>in</strong>terpretation<br />
of metagenomic data. It is difficult to confirm<br />
unambiguously a genome-type–morphotype relationship,<br />
especially when so few archaeal viral families are characterized<br />
although, as shown here, <strong>the</strong> uncerta<strong>in</strong>ty can be<br />
m<strong>in</strong>imized by first enrich<strong>in</strong>g VLPs. Moreover, attempts to<br />
identify potential archaeal hosts, on <strong>the</strong> basis of <strong>the</strong><br />
<strong>CRISPR</strong> <strong>immune</strong> <strong>system</strong>, will become more robust as<br />
more archaeal host chromosomes are sequenced, but they<br />
will always be limited by the ability of some crenarchaeal<br />
viruses to <strong>in</strong>fect a broader range of host species (Lillestøl<br />
et al., 2006; 2009; Vestergaard et al., 2008b).<br />
Experimental procedures<br />
DNA isolation and sequenc<strong>in</strong>g<br />
All virion preparations from CsCl density gradients were<br />
dialysed aga<strong>in</strong>st 10 mM Tris-acetate, pH 6 overnight. For<br />
some libraries (Table 1) chromosomal and plasmid DNA contam<strong>in</strong>ation<br />
was removed from viral samples (tadpole-2 and<br />
filament) by treating first with DNase I (50 units ml⁻¹) at 37°C<br />
for 15 min, followed by heat inactivation of the DNase I at<br />
85°C for 15 min. Nucleic acid was isolated from virions as<br />
described earlier (Peng et al., 2004). Briefly, virions were<br />
disrupted by incubation with 1% SDS and 0.5 mg ml⁻¹ proteinase<br />
K at 50°C for 1 h, and DNA was extracted by phenol and<br />
phenol-chloroform treatment before precipitating with 0.1 vol.<br />
of 3 M sodium acetate, pH 5.3, and 0.8 vol. of isopropanol. The<br />
DNA pellet was washed with 70% ethanol, air-dried and<br />
resuspended <strong>in</strong> an appropriate volume of 10 mM Tris-HCl, pH<br />
8.0, 1 mM EDTA. Clone libraries were prepared by sonicat<strong>in</strong>g<br />
DNA to produce fragments of 2–3 kb and <strong>the</strong>n construct<strong>in</strong>g<br />
shot-gun libraries us<strong>in</strong>g SmaI-digested pUC18 as clon<strong>in</strong>g<br />
vector (Peng, 2008) and, also, us<strong>in</strong>g <strong>the</strong> L<strong>in</strong>ker Amplified<br />
Shotgun Library method described at<br />
http://www.sci.sdsu.edu/PHAGE/LASL/. DNA was extracted using a Model<br />
8000 Biobot (Qiagen, Westburg, Germany) and sequenced <strong>in</strong><br />
MegaBACE 1000 sequenators (Amersham Biotech, Amersham,<br />
UK). Viral and plasmid sequences were assembled<br />
us<strong>in</strong>g Sequencher 4.9 (http://www.genecodes.com/).<br />
Genome analyses and gene annotations were performed<br />
us<strong>in</strong>g Artemis (http://www.sanger.ac.uk/Software/Artemis/).<br />
Gene sequence searches were made <strong>in</strong> GenBank/EMBL<br />
(http://www.ncbi.nlm.nih.gov/blast) and motifs were identified<br />
us<strong>in</strong>g <strong>the</strong> SMART facility (http://smart.embl-heidelberg.de/).<br />
Identify<strong>in</strong>g spacer matches<br />
<strong>CRISPR</strong> spacer sequences were extracted from <strong>the</strong> available<br />
crenarchaeal <strong>the</strong>rmoneutrophilic genomes: A. pernix<br />
K1 (NC_000854), Caldivirga maquil<strong>in</strong>gensis IC-167<br />
(NC_009954), D. kamchatkensis 1221n (NC_011766),<br />
Hyper<strong>the</strong>rmus butylicus DSM 5456 (NC_008818), Ignicoccus<br />
hospitalis KIN4/I (NC_009776), Nitrosopumilus maritimus<br />
SCM1 (NC_010085), Pyrobaculum aerophilum IM2<br />
(NC_003364), Pyrobaculum arsenaticum DSM 13514<br />
(NC_009376), Pyrobaculum calidifontis JCM 11548<br />
(NC_009073), Pyrobaculum islandicum DSM 4184 (NC_008701),<br />
Staphylo<strong>the</strong>rmus mar<strong>in</strong>us F1 (NC_009033), T. pendens Hrk 5<br />
(NC_008698) and T. neutrophilus V24Sta (NC_010525).<br />
They were aligned aga<strong>in</strong>st HAV1, HAV2, pHA1, pHB1,<br />
pHB2a and pHB2b, and published genomes of <strong>the</strong> viruses<br />
PSV (AJ635161), TTSV1 (AY722806) and TTV1 (X14855),<br />
and <strong>the</strong> plasmid pTPEN01 (NC_008696), us<strong>in</strong>g an MMX<br />
optimized Smith-Waterman implementation (Saebø et al.,<br />
2005). Alignments were performed at both the nucleotide level<br />
and the amino acid level, the latter by translating the spacers<br />
in all six reading frames essentially as described earlier<br />
(Shah et al., 2009). The false positive level was estimated<br />
by aligning the spacers against all the above crenarchaeal<br />
genomes (minus CRISPR repeat regions) and using<br />
this as a negative control.<br />
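The translated search described above can be illustrated with a short Python sketch: a spacer is translated in all six reading frames and each frame is scored against a protein with a basic Smith-Waterman local alignment. This is not the MMX-optimized implementation cited above. The function names and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for illustration, whereas a real protein search would use a substitution matrix.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Return translations for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (illustrative scoring, linear gaps)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def best_frame_hit(spacer_dna, protein):
    """Return (frame, score) for the best-scoring reading frame."""
    scores = {frame: smith_waterman_score(peptide, protein)
              for frame, peptide in six_frame_translations(spacer_dna).items()}
    return max(scores.items(), key=lambda item: item[1])


# Toy input (hypothetical sequences, not data from the study).
print(best_frame_hit("ATGGCC", "XXMAXX"))
```

Running the same search against host genomes with the CRISPR repeat regions removed, as in the negative control described above, would give a background score distribution against which viral and plasmid hits can be judged.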
Acknowledgements<br />
We thank Ariane Bize, Lanm<strong>in</strong>g Chen, Hien Phan, John<br />
Smyth, Gisle Vestergaard and Kim Brügger for much help <strong>in</strong><br />
<strong>the</strong> early stages of this work. The research <strong>in</strong> Copenhagen<br />
was supported by <strong>the</strong> Natural Science Research Council.<br />
References<br />
Ahn, D.G., Kim, S.I., Rhee, J.K., Kim, K.P., Pan, J.G., and Oh,<br />
J.W. (2006) TTSV1, a new virus-like particle isolated from<br />
<strong>the</strong> hyper<strong>the</strong>rmophilic crenarchaeote Thermoproteus<br />
tenax. Virology 351: 280–290.<br />
Anderson, I., Rodriguez, J., Susanti, D., Porat, I., Reich, C.,<br />
Ulrich, L.E., et al. (2008) Genome sequence of Thermofilum<br />
pendens reveals an exceptional loss of biosyn<strong>the</strong>tic<br />
pathways without genome reduction. J Bacteriol 190:<br />
2957–2965.<br />
Andersson, A.F., and Banfield, J.F. (2008) Virus population<br />
dynamics and acquired virus resistance <strong>in</strong> natural microbial<br />
communities. Science 320: 1047–1050.<br />
Barrangou, R., Fremaux, C., Deveau, H., Richards, M.,<br />
Boyaval, P., Moineau, S., Romero, D.A., and Horvath, P.<br />
(2007) CRISPR provides acquired resistance against<br />
viruses in prokaryotes. Science 315: 1709–1712.<br />
Bettstetter, M., Peng, X., Garrett, R.A., and Prangishvili, D.<br />
(2003) AFV1, a novel virus <strong>in</strong>fect<strong>in</strong>g hyper<strong>the</strong>rmophilic<br />
archaea of <strong>the</strong> genus Acidianus. Virology 315: 68–79.<br />
Bize, A., Peng, X., Prokofeva, M., Maclellan, K., Lucas, S.,<br />
Forterre, P., et al. (2008) Viruses <strong>in</strong> acidic geo<strong>the</strong>rmal environments<br />
of <strong>the</strong> Kamchatka pen<strong>in</strong>sula. Res Microbiol 159:<br />
358–366.<br />
Diez, B., Anton, J., Guixa-Boixereu, N., Pedros-Alio, C., and<br />
Rodriguez-Valera, F. (2000) Pulse-field gel electrophoresis<br />
of virus assemblages present <strong>in</strong> a hypersal<strong>in</strong>e environment.<br />
Int Microbiol 3: 159–164.<br />
Greve, B., Jensen, S., Brügger, K., Zillig, W., and Garrett,<br />
R.A. (2004) Genomic comparison of archaeal conjugative<br />
plasmids from Sulfolobus. <strong>Archaea</strong> 1: 231–239.<br />
Greve, B., Jensen, S., Phan, H., Brügger, K., Zillig, W., She,<br />
Q., and Garrett, R.A. (2005) Novel plasmids pTAU4,<br />
pORA1 and pTIK4 from Sulfolobus neozealandicus.<br />
<strong>Archaea</strong> 1: 319–325.<br />
Här<strong>in</strong>g, M., Peng, X., Brügger, K., Rachel, R., Stetter, K.O.,<br />
Garrett, R.A., and Prangishvili, D. (2004) Morphology and<br />
genome organisation of <strong>the</strong> virus PSV of <strong>the</strong> hyper<strong>the</strong>rmophilic<br />
archaeal genera Pyrobaculum and Thermoproteus: a<br />
novel virus family, the Globuloviridae. Virology 323: 233–242.<br />
Janekovic, D., Wunderl, S., Holz, I., Zillig, W., Gierl, A., and<br />
Neumann, H. (1983) TTV1, TTV2 and TTV3, a family of<br />
viruses of <strong>the</strong> extremely <strong>the</strong>rmophilic anaerobic, sulphur<br />
reduc<strong>in</strong>g, archaebacterium Thermoproteus tenax. Mol Gen<br />
Genet 192: 39–45.<br />
Krupovic, M., and Bamford, D.H. (2008) <strong>Archaea</strong>l proviruses<br />
TKV4 and MVV extend <strong>the</strong> PRD1-adenovirus l<strong>in</strong>eage to<br />
<strong>the</strong> phylum Euryarchaeota. Virology 375: 292–300.<br />
Krupovic, M., Forterre, P., and Bamford, D.H. (2010) Comparative<br />
analysis of the mosaic genomes of tailed archaeal<br />
viruses and proviruses suggests common themes for<br />
virion architecture and assembly with tailed viruses of bacteria.<br />
J Mol Biol 397: 144–160.<br />
Lawrence, C.M., Menon, S., Eilers, B.J., Bothner, B., Khayat,<br />
R., Douglas, T., and Young, M.J. (2009) Structural and<br />
functional studies of archaeal viruses. J Biol Chem 284:<br />
12599–12603.<br />
Lillestøl, R.K., Redder, P., Garrett, R.A., and Brügger, K.<br />
(2006) A putative viral defence mechanism <strong>in</strong> archaeal<br />
cells. <strong>Archaea</strong> 2: 59–72.<br />
Lillestøl, R.K., Shah, S.A., Brügger, K., Redder, P., Phan, H.,<br />
Christiansen, J., and Garrett, R.A. (2009) <strong>CRISPR</strong> families<br />
of <strong>the</strong> crenarchaeal genus Sulfolobus: bidirectional transcription<br />
and dynamic properties. Mol Microbiol 72: 259–272.<br />
Lipps, G., Weinzierl, A.O., von Scheven, G., Buchen, C., and<br />
Cramer, P. (2004) Structure of a bifunctional DNA primase-polymerase.<br />
Nat Struct Mol Biol 11: 157–162.<br />
Mochizuki, T., Yoshida, T., Tanaka, R., Forterre, P., Sako, Y.,<br />
and Prangishvili, D. (2010) Diversity of viruses of the hyperthermophilic<br />
archaeal genus Aeropyrum, and isolation of<br />
<strong>the</strong> Aeropyrum pernix bacilliform virus 1, APBV1, <strong>the</strong> first<br />
representative of <strong>the</strong> family ‘Clavaviridae’. Virology 402:<br />
347–352.<br />
Neumann, H., and Zillig, W. (1990a) Structural variability <strong>in</strong><br />
<strong>the</strong> genome of Thermoproteus tenax virus TTV1. Mol Gen<br />
Genet 222: 435–437.<br />
Neumann, H., and Zillig, W. (1990b) The TTV1-encoded viral<br />
prote<strong>in</strong> TPX: primary structure of <strong>the</strong> gene and <strong>the</strong> prote<strong>in</strong>.<br />
Nucleic Acids Res 18: 195.<br />
Oren, A., Bratbak, G., and Heldal, M. (1997) Occurrence of<br />
virus-like particles in the Dead Sea. Extremophiles 1: 143–149.<br />
Ortmann, A.C., Wiedenheft, B., Douglas, T., and Young, M.<br />
(2006) Hot crenarchaeal viruses reveal deep evolutionary<br />
connections. Nat Rev Microbiol 4: 520–528.<br />
Peng, X. (2008) Evidence for <strong>the</strong> horizontal transfer of an<br />
<strong>in</strong>tegrase gene from a fusellovirus to a pRN-like plasmid<br />
with<strong>in</strong> a s<strong>in</strong>gle stra<strong>in</strong> of Sulfolobus and <strong>the</strong> implications for<br />
plasmid survival. Microbiology 154: 383–391.<br />
Peng, X., Holz, I., Zillig, W., Garrett, R. A., and She, Q.<br />
(2000) Evolution of <strong>the</strong> family of pRN plasmids and <strong>the</strong>ir<br />
<strong>in</strong>tegrase-mediated <strong>in</strong>sertion <strong>in</strong>to <strong>the</strong> chromosome of <strong>the</strong><br />
Crenarchaeon Sulfolobus solfataricus. J Mol Biol 303:<br />
449–454.<br />
Peng, X., Kessler, A., Phan, H., Garrett, R.A., and Prangishvili,<br />
D. (2004) Multiple variants of <strong>the</strong> archaeal DNA rudivirus<br />
SIRV1 <strong>in</strong> a s<strong>in</strong>gle host and a novel mechanism of<br />
genome variation. Mol Microbiol 54: 366–375.<br />
Porter, K., Russ, B.E., and Dyall-Smith, M.L. (2007) Virus–<br />
host <strong>in</strong>teractions <strong>in</strong> salt lakes. Curr Op<strong>in</strong> Microbiol 10:<br />
418–424.<br />
Prangishvili, D., Forterre, P., and Garrett, R.A. (2006a)<br />
Viruses of <strong>the</strong> archaea: a unify<strong>in</strong>g view. Nat Rev Microbiol<br />
4: 837–838.<br />
Prangishvili, D., Garrett, R.A., and Koon<strong>in</strong>, E.V. (2006b)<br />
Evolutionary genomics of archaeal viruses: unique viral<br />
genomes in the third domain of life. Virus Res 117: 52–67.<br />
Prangishvili, D., Vestergaard, G., Här<strong>in</strong>g, M., Aramayo, R.,<br />
Basta, T., Rachel, R., and Garrett, R.A. (2006c) Structural<br />
and genomic properties of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal<br />
virus ATV with an extracellular stage of <strong>the</strong> reproductive<br />
cycle. J Mol Biol 359: 1203–1216.<br />
Rachel, R., Bettstetter, M., Hedlund, B.P., Här<strong>in</strong>g, M.,<br />
Kessler, A., Stetter, K.O., and Prangishvili, D. (2002)<br />
Remarkable morphological diversity of viruses and virus-like<br />
particles in terrestrial hot environments. Arch Virol 147:<br />
2419–2429.<br />
Redder, P., Peng, X., Brügger, K., Shah, S.A., Roesch, F.,<br />
Greve, B., She, Q., Schleper, C., Forterre, P., Garrett, R.A.,<br />
and Prangishvili, D. (2009) Four newly isolated fuselloviruses<br />
from extreme geo<strong>the</strong>rmal environments reveal<br />
unusual morphologies and a possible <strong>in</strong>terviral recomb<strong>in</strong>ation<br />
mechanism. Environ Microbiol 11: 2849–2862.<br />
Saebø, P.E., Andersen, S.M., Myrseth, J., Laerdahl, J.K., and<br />
Rognes, T. (2005) PARALIGN: rapid and sensitive<br />
sequence similarity searches powered by parallel comput<strong>in</strong>g<br />
technology. Nucleic Acids Res 33: 535–539.<br />
Shah, S.A., Hansen, N.R., and Garrett, R.A. (2009) Distributions<br />
of <strong>CRISPR</strong> spacer matches <strong>in</strong> viruses and plasmids<br />
of crenarchaeal acidothermophiles and implications for<br />
their inhibitory mechanism. Biochem Soc Trans 37: 23–28.<br />
Snyder, J.C., Spuhler, J., Wiedenheft, B., Roberto, F.F.,<br />
Douglas, T., and Young, M.J. (2004) Effects of cultur<strong>in</strong>g on<br />
<strong>the</strong> population structure of a hyper<strong>the</strong>rmophilic virus.<br />
Microbiol Ecol 48: 561–566.<br />
Torar<strong>in</strong>sson, E., Klenk, H.-P., and Garrett, R.A. (2005) Divergent<br />
transcriptional and translational signals <strong>in</strong> <strong>Archaea</strong>.<br />
Environ Microbiol 7: 47–54.<br />
Van der Oost, J., Jore, M.M., Westra, E.R., Lundgren, M.,<br />
and Brouns, S.J. (2009) <strong>CRISPR</strong>-based adaptive and heritable<br />
immunity <strong>in</strong> prokaryotes. Trends Biochem Sci 34:<br />
401–407.<br />
Vestergaard, G., Aramayo, R., Basta, T., Här<strong>in</strong>g, M., Peng,<br />
X., Brügger, K., Chen, L., Rachel, R., Boisset, N., Garrett,<br />
R.A., and Prangishvili, D. (2008a) Structure of <strong>the</strong> Acidianus<br />
filamentous virus 3 and comparative genomics of<br />
related archaeal lipothrixviruses. J Virol 82:<br />
371–381.<br />
Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter,<br />
M., Phan, H., Briegel, A., Rachel, R., Garrett, R.A., and<br />
Prangishvili, D. (2008b) SRV, a new rudiviral isolate from<br />
Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal rudiviruses<br />
with <strong>the</strong> host viral-defence <strong>CRISPR</strong> <strong>system</strong>. J Bacteriol<br />
190: 6837–6845.<br />
<strong>CRISPR</strong>/Cas and Cmr modules, mobility and evolution of adaptive <strong>immune</strong><br />
<strong>system</strong>s<br />
Shiraz A. Shah ¹, Roger A. Garrett *,¹<br />
Archaea Centre, Department of Biology, Copenhagen University, DK2200 Copenhagen N, Denmark<br />
Received 17 May 2010; accepted 22 July 2010<br />
Available online 21 September 2010<br />
Abstract<br />
<strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>immune</strong> mach<strong>in</strong>eries of archaea and bacteria provide an adaptive and effective defence mechanism directed<br />
specifically aga<strong>in</strong>st viruses and plasmids. Present data suggest that both <strong>CRISPR</strong>/Cas and Cmr modules can behave like <strong>in</strong>tegral genetic<br />
elements. They tend to be located <strong>in</strong> <strong>the</strong> more variable regions of chromosomes and are displaced by genome shuffl<strong>in</strong>g mechanisms <strong>in</strong>clud<strong>in</strong>g<br />
transposition. <strong>CRISPR</strong> loci may be broken up and dispersed <strong>in</strong> chromosomes by transposons with <strong>the</strong> potential for creat<strong>in</strong>g genetic novelty. Both<br />
<strong>CRISPR</strong>/Cas and Cmr modules appear to exchange readily between closely related organisms where <strong>the</strong>y may be subjected to strong selective<br />
pressure. It is likely that this process occurs primarily via conjugative plasmids or chromosomal conjugation. It is <strong>in</strong>ferred that <strong>in</strong>terdoma<strong>in</strong><br />
transfer between archaea and bacteria has occurred, albeit very rarely, despite <strong>the</strong> significant barriers imposed by <strong>the</strong>ir differ<strong>in</strong>g conjugative,<br />
transcriptional and translational mechanisms. There are parallels between <strong>the</strong> <strong>CRISPR</strong> crRNAs and eukaryal siRNAs, most notably to germ cell<br />
piRNAs which are directed, with <strong>the</strong> help of effector prote<strong>in</strong>s, to silence or destroy transposons. No homologous prote<strong>in</strong>s are identifiable at<br />
a sequence level between eukaryal siRNA prote<strong>in</strong>s and those of archaeal or bacterial <strong>CRISPR</strong>/Cas and Cmr modules.<br />
© 2010 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.<br />
Keywords: <strong>CRISPR</strong>/Cas; <strong>CRISPR</strong>/Cmr; crRNA; Evolution; Mobile elements; siRNA<br />
1. Introduction<br />
The <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s provide <strong>the</strong><br />
basis for an adaptive and heritable immune system directed<br />
aga<strong>in</strong>st <strong>the</strong> DNA and RNA, respectively, of <strong>in</strong>vad<strong>in</strong>g elements.<br />
The former consists of <strong>CRISPR</strong> loci and physically l<strong>in</strong>ked<br />
cassettes of cas genes which toge<strong>the</strong>r appear to constitute<br />
<strong>in</strong>tegral genetic modules. The cmr genes of Cmr modules are<br />
also clustered and are sometimes l<strong>in</strong>ked directly to <strong>the</strong><br />
<strong>CRISPR</strong>/Cas modules. The <strong>CRISPR</strong>/Cas <strong>immune</strong> <strong>system</strong><br />
occurs <strong>in</strong> most archaea and about 70% of <strong>the</strong>se also carry Cmr<br />
modules, whereas only about 40% of bacteria conta<strong>in</strong><br />
<strong>CRISPR</strong>/Cas modules and about 30% of <strong>the</strong>se exhibit Cmr<br />
modules. Moreover, archaeal CRISPR loci consist of<br />
clusters of spacer-repeat units and can vary in size from one to<br />
more than a hundred spacer-repeat units where each unit is<br />
about 60–90 bp with repeats and spacers of, on average, 30 bp<br />
and 40 bp, respectively (reviewed in Karginov and Hannon,<br />
2010). The CRISPR loci are preceded by non-protein coding<br />
leader regions of about 150–550 bp (Tang et al., 2002; Jansen<br />
et al., 2002; Lillestøl et al., 2006, 2009), and they are generally<br />
physically linked to a group of cas genes encoding Cas<br />
proteins of diverse functions (Jansen et al., 2002; Haft et al.,<br />
2005; Makarova et al., 2006).<br />
* Corresponding author. Tel.: +45 35322010.<br />
E-mail address: garrett@bio.ku.dk (R.A. Garrett).<br />
1 The two authors contributed equally to the work.<br />
Research in Microbiology 162 (2011) 27–38<br />
0923-2508/$ - see front matter © 2010 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.<br />
doi:10.1016/j.resmic.2010.09.001<br />
www.elsevier.com/locate/resmic<br />
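The spacer-repeat organization of a CRISPR locus can be made concrete with a small Python sketch that splits a locus on its repeat and numbers the spacers from the leader end. It assumes exact, identical repeat copies, which is a simplification because real repeats can carry sequence variation. The function name `extract_spacers` and the toy sequences are hypothetical.

```python
def extract_spacers(locus, repeat):
    """Return [(number, spacer), ...] for spacers flanked by two repeats.

    Assumptions: `locus` is read from the leader end, and every repeat is an
    exact copy of `repeat`. Sequence before the first repeat (leader) and
    after the last repeat is discarded.
    """
    parts = locus.split(repeat)
    if len(parts) < 3:
        return []  # fewer than two repeats: no complete spacer
    spacers = parts[1:-1]
    return list(enumerate(spacers, start=1))


# Toy locus (hypothetical): leader + repeat + spacer1 + repeat + spacer2 + repeat + tail
repeat = "GTTTCAG"
locus = "AAAA" + repeat + "CCCCC" + repeat + "TTGGA" + repeat + "AA"

for number, spacer in extract_spacers(locus, repeat):
    unit_length = len(repeat) + len(spacer)
    print(number, spacer, unit_length)
```

With natural repeats of about 30 bp and spacers of about 40 bp, the unit length computed this way would fall in the 60–90 bp range quoted above.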
Critical for <strong>the</strong> function<strong>in</strong>g of <strong>the</strong> <strong>immune</strong> <strong>system</strong>s are <strong>the</strong><br />
spacer sequences which derive from foreign <strong>in</strong>vad<strong>in</strong>g elements<br />
(Mojica et al., 2005; Pourcel et al., 2005; Bolot<strong>in</strong> et al., 2005;<br />
Barrangou et al., 2007). Whole transcripts are produced from<br />
<strong>CRISPR</strong> loci which <strong>in</strong>itiate with<strong>in</strong> <strong>the</strong> leader sequence adjacent<br />
to <strong>the</strong> first repeat (Lillestøl et al., 2009), and <strong>the</strong>y are<br />
subsequently processed <strong>in</strong> <strong>the</strong> repeat regions to yield endproducts<br />
correspond<strong>in</strong>g to s<strong>in</strong>gle spacer crRNAs (Tang et al.,<br />
2002, 2005; Lillestøl et al., 2006). Regulation of formation
of <strong>the</strong> whole <strong>CRISPR</strong> transcript is probably required to prevent<br />
<strong>in</strong>terference from promoter and term<strong>in</strong>ator regions which are<br />
randomly taken up <strong>in</strong> <strong>the</strong> spacers (Shah et al., 2009). The<br />
process<strong>in</strong>g is effected by specific Cas or Cmr prote<strong>in</strong>s which,<br />
at least for <strong>the</strong> latter, generate two discrete crRNAs each<br />
carry<strong>in</strong>g 8 bp of repeat at <strong>the</strong> 5′-end and lack<strong>in</strong>g 2 nt and 8 nt, respectively,<br />
from <strong>the</strong> 3′-end of each spacer (Hale et al., 2009). Comb<strong>in</strong>ations<br />
of prote<strong>in</strong>s <strong>the</strong>n transport <strong>the</strong> processed crRNAs to target<br />
and <strong>in</strong>activate <strong>in</strong>vad<strong>in</strong>g genetic elements for both <strong>CRISPR</strong>/Cas<br />
and <strong>CRISPR</strong>/Cmr <strong>system</strong>s (Brouns et al., 2008; Hale et al.,<br />
2008, 2009; Carte et al., 2008). Base pair<strong>in</strong>g mismatches<br />
occurr<strong>in</strong>g between <strong>the</strong> 5′ 8 nt repeat sequence of <strong>the</strong> crRNA<br />
and <strong>the</strong> sequence adjacent to <strong>the</strong> targeted protospacer of <strong>the</strong><br />
<strong>in</strong>vad<strong>in</strong>g DNA are essential for subsequent degradation of <strong>the</strong><br />
latter and for ensur<strong>in</strong>g that <strong>the</strong> chromosomal <strong>CRISPR</strong> locus,<br />
itself, is not targeted (Marraff<strong>in</strong>i and Son<strong>the</strong>imer, 2010).<br />
Cas and Cmr prote<strong>in</strong>s are phylogenetically and functionally<br />
very diverse and are <strong>in</strong>volved <strong>in</strong> at least two mechanistic pathways<br />
which target <strong>in</strong>vad<strong>in</strong>g genetic elements via <strong>the</strong> crRNAs.<br />
The <strong>CRISPR</strong>/Cas <strong>system</strong> specifically targets DNA (Marraff<strong>in</strong>i<br />
and Son<strong>the</strong>imer, 2008; Shah et al., 2009), while <strong>the</strong> <strong>CRISPR</strong>/<br />
Cmr <strong>system</strong> targets RNA (Hale et al., 2009). The two pathways<br />
require <strong>the</strong> products of <strong>the</strong> cas gene cassette adjo<strong>in</strong><strong>in</strong>g a <strong>CRISPR</strong><br />
locus or <strong>the</strong> products of <strong>the</strong> Cmr module which is ei<strong>the</strong>r directly<br />
l<strong>in</strong>ked to a <strong>CRISPR</strong>/Cas module or lies separately on <strong>the</strong> chromosome<br />
(Fig. 1) (Jansen et al., 2002; Makarova et al., 2006).<br />
Although most bacterial <strong>CRISPR</strong>/Cas modules are unpaired,<br />
different comb<strong>in</strong>ations of <strong>CRISPR</strong>/Cas and Cmr modules,<br />
<strong>in</strong>clud<strong>in</strong>g paired <strong>CRISPR</strong> loci, are common amongst <strong>the</strong> crenarchaea<br />
(Fig. 1D and E). Phylogenetic studies have demonstrated<br />
that homologs of a few Cas prote<strong>in</strong>s occur widely<br />
throughout <strong>the</strong> archaeal and bacterial doma<strong>in</strong>s, while o<strong>the</strong>rs are<br />
predom<strong>in</strong>antly archaeal or bacterial <strong>in</strong> character (Haft et al.,<br />
2005; Makarova et al., 2006).<br />
This article will consider <strong>the</strong> follow<strong>in</strong>g issues relat<strong>in</strong>g to <strong>the</strong><br />
mobility and evolution of <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>, generally<br />
us<strong>in</strong>g crenarchaeal <strong>CRISPR</strong> <strong>system</strong>s as representative examples:<br />
(1) Whe<strong>the</strong>r <strong>CRISPR</strong>/Cas modules constitute <strong>in</strong>tegral<br />
genetic units. (2) Phylogenetic relationships between <strong>CRISPR</strong>/<br />
Cas and Cmr modules. (3) Diversification and degeneration of<br />
<strong>CRISPR</strong>/Cas modules. (4) Mobilisation and loss of <strong>CRISPR</strong>/<br />
Cas modules. (5) Transfer of <strong>CRISPR</strong>/Cas modules between<br />
organisms. (6) Co-evolution of <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong> <strong>in</strong> <strong>the</strong><br />
archaeal and bacterial doma<strong>in</strong>s. (7) A possible common<br />
ancestry with <strong>the</strong> diverse eukaryal siRNA <strong>system</strong>s.<br />
2. Methods<br />
Am<strong>in</strong>o acid sequences of Cas1 prote<strong>in</strong>s were collected from<br />
all publicly available archaeal and bacterial genomes by runn<strong>in</strong>g<br />
an <strong>in</strong>-house-constructed Cas1-specific HMM aga<strong>in</strong>st NCBI’s<br />
“non-redundant” prote<strong>in</strong> database. All sequences were extracted<br />
and an all-aga<strong>in</strong>st-all Smith–Waterman sequence comparison<br />
was made us<strong>in</strong>g <strong>the</strong> FASTA package (Pearson, 2000). After<br />
tak<strong>in</strong>g <strong>in</strong>to account <strong>the</strong> distribution of <strong>the</strong> result<strong>in</strong>g<br />
Smith–Waterman scores, all match<strong>in</strong>g sequence pairs were<br />
assigned weights between 0 and 1, with 0 correspond<strong>in</strong>g to<br />
a Smith–Waterman score of 200 or less and 1 correspond<strong>in</strong>g to<br />
1200 or more. These weights were used as <strong>in</strong>put for Markov cluster<strong>in</strong>g<br />
(MCL) (Enright et al., 2002) with <strong>the</strong> default options (<strong>in</strong>flation<br />
factor = 2), and <strong>the</strong> clusters were visualised with BioLayout (Goldovsky et al., 2005).<br />
Repeat sequences were clustered by a similar approach, but us<strong>in</strong>g<br />
Smith–Waterman DNA sequence alignments. Leader sequences<br />
were clustered us<strong>in</strong>g <strong>the</strong> same approach but with an MCL <strong>in</strong>flation<br />
factor of 1.2 due to <strong>the</strong>ir very low sequence conservation.<br />
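The weight<strong>in</strong>g step described above can be sketched as follows. This is a m<strong>in</strong>imal illustration, not <strong>the</strong> authors' script: <strong>the</strong> text gives only <strong>the</strong> two end po<strong>in</strong>ts (200 and 1200), so <strong>the</strong> l<strong>in</strong>ear scal<strong>in</strong>g between <strong>the</strong>m, <strong>the</strong> dropp<strong>in</strong>g of zero-weight pairs and <strong>the</strong> function names are all assumptions.<br />

```python
def sw_score_to_weight(score, low=200.0, high=1200.0):
    """Map a Smith-Waterman score onto [0, 1].

    Scores <= low give 0 and scores >= high give 1, as stated in the text.
    The linear interpolation in between is an assumption.
    """
    if score <= low:
        return 0.0
    if score >= high:
        return 1.0
    return (score - low) / (high - low)


def weighted_edges(pair_scores):
    """Convert {(seq_a, seq_b): score} into weighted edges for clustering.

    Pairs whose weight is 0 are dropped (an assumption), since they would
    contribute nothing to the similarity graph.
    """
    edges = {}
    for pair, score in pair_scores.items():
        weight = sw_score_to_weight(score)
        if weight > 0.0:
            edges[pair] = weight
    return edges
```

The result<strong>in</strong>g dictionary of weighted pairs is <strong>the</strong> k<strong>in</strong>d of similarity graph that a cluster<strong>in</strong>g step such as MCL takes as <strong>in</strong>put.<br />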
With <strong>the</strong> exception of <strong>the</strong> genomes of Sulfolobus islandicus<br />
stra<strong>in</strong>s HVE10/4 and REY15A and Acidianus brierleyi from our<br />
own lab, all o<strong>the</strong>r genomes are publicly available with <strong>the</strong><br />
accession numbers NC_009135, NC_009975, NC_009637,<br />
NC_005791, NC_013769, NC_012589, NC_012588,<br />
NC_012632, NC_012726, NC_012622, NC_012623, 4023466<br />
(JGI project) CP001800, NC_002754. Dot-plots were constructed<br />
us<strong>in</strong>g <strong>the</strong> MUMmer package (Kurtz et al., 2004).<br />
Fig. 1. Scheme show<strong>in</strong>g different arrangements of <strong>CRISPR</strong>/Cas and Cmr modules. A. Typical monomeric <strong>CRISPR</strong>/Cas structure. B. L<strong>in</strong>ked Cmr and <strong>CRISPR</strong>/Cas<br />
modules. C. Separated Cmr and <strong>CRISPR</strong>/Cas modules. D. Paired family I <strong>CRISPR</strong>/Cas modules carry<strong>in</strong>g <strong>in</strong>verted <strong>CRISPR</strong> loci. Typical gene contents and order<br />
for E. a paired crenarchaeal family I <strong>CRISPR</strong>/Cas module, and F. a crenarchaeal Cmr module.<br />
<strong>CRISPR</strong> clusters were found us<strong>in</strong>g publicly available<br />
software (Bland et al., 2007) and Cmr modules were found<br />
us<strong>in</strong>g HMMs constructed <strong>in</strong>-house. The core genomes of<br />
Sulfolobus solfataricus and S. islandicus stra<strong>in</strong>s were determ<strong>in</strong>ed<br />
by f<strong>in</strong>d<strong>in</strong>g all orthologous genes occurr<strong>in</strong>g only once <strong>in</strong><br />
all <strong>the</strong> genomes. Orthologs were found by perform<strong>in</strong>g an all-aga<strong>in</strong>st-all<br />
sequence similarity search for all <strong>the</strong> encoded<br />
prote<strong>in</strong>s with subsequent cluster<strong>in</strong>g us<strong>in</strong>g MCL (Enright et al.,<br />
2002). A multiple alignment was made of <strong>the</strong> DNA sequence<br />
correspond<strong>in</strong>g to each ortholog (Edgar, 2004). All multiple<br />
alignments, with gaps removed, were concatenated and <strong>the</strong><br />
result<strong>in</strong>g alignment was used to build a phylogenetic tree<br />
(Thompson et al., 1994). The length of each family I leader<br />
was determ<strong>in</strong>ed us<strong>in</strong>g sequence alignments of different leaders<br />
before construct<strong>in</strong>g <strong>the</strong> leader tree.<br />
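The core-genome selection can be illustrated with a short sketch. It assumes that each ortholog cluster is a list of (genome, gene) pairs and reads "occurr<strong>in</strong>g only once <strong>in</strong> all <strong>the</strong> genomes" as "exactly one copy <strong>in</strong> every genome"; both <strong>the</strong> data layout and <strong>the</strong> function name are hypo<strong>the</strong>tical.<br />

```python
from collections import Counter


def single_copy_core(clusters, genomes):
    """Return the clusters that have exactly one member in every genome.

    clusters: iterable of lists of (genome, gene) tuples (assumed layout).
    genomes:  iterable of all genome names under comparison.
    """
    genomes = set(genomes)
    core = []
    for cluster in clusters:
        counts = Counter(genome for genome, _gene in cluster)
        present_everywhere = set(counts) == genomes
        single_copy = all(count == 1 for count in counts.values())
        if present_everywhere and single_copy:
            core.append(sorted(cluster))
    return core
```

Clusters with a duplicated gene <strong>in</strong> any genome, or with a genome miss<strong>in</strong>g, are excluded, so only unambiguous orthologs reach <strong>the</strong> alignment step.<br />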
3. Results and discussion<br />
3.1. Do <strong>CRISPR</strong>/Cas modules constitute <strong>in</strong>tegral genetic<br />
units?<br />
Several studies have detected a broad phylogenetic correlation<br />
between selected Cas prote<strong>in</strong>s and repeat sequences of<br />
<strong>CRISPR</strong> loci, with <strong>the</strong> reservation that <strong>the</strong> repeats are of<br />
limited and variable size (Haft et al., 2005; Kun<strong>in</strong> et al., 2007).<br />
For <strong>the</strong> Sulfolobales, phylogenetic analyses of sequences of<br />
repeats, leaders and Cas1 prote<strong>in</strong>s demonstrated that <strong>the</strong><br />
<strong>CRISPR</strong>/Cas modules could be classified <strong>in</strong>to at least three<br />
dist<strong>in</strong>ct families (Lillestøl et al., 2009). Here we extend this<br />
analysis and present comparative results for <strong>the</strong> Cas1 prote<strong>in</strong>,<br />
<strong>the</strong> leader and <strong>the</strong> repeat sequences us<strong>in</strong>g unsupervised cluster<strong>in</strong>g.<br />
MCL classifies nodes <strong>in</strong>to clusters based on pairwise<br />
distances to o<strong>the</strong>r nodes (Enright et al., 2002). Here, <strong>the</strong> nodes<br />
comprise <strong>the</strong> sequences of Cas1, <strong>the</strong> repeat and <strong>the</strong> leader and<br />
<strong>the</strong> distances correspond to <strong>the</strong> sequence alignment scores<br />
between <strong>the</strong>m. This approach is preferable to <strong>the</strong> use of<br />
phylogenetic trees for <strong>the</strong> follow<strong>in</strong>g reasons. Firstly, <strong>the</strong><br />
problem of del<strong>in</strong>eat<strong>in</strong>g boundaries between neighbour<strong>in</strong>g<br />
families is determ<strong>in</strong>ed by <strong>the</strong> algorithm itself, avoid<strong>in</strong>g <strong>the</strong><br />
potential error and bias of manual def<strong>in</strong>ition. Moreover, more<br />
than 1000 Cas1 sequences are available <strong>in</strong> public sequence<br />
databases and <strong>the</strong>y cannot be readily presented <strong>in</strong> phylogenetic<br />
trees, whereas <strong>the</strong>y can be visualised <strong>in</strong> a two- or three-dimensional<br />
space us<strong>in</strong>g <strong>the</strong> BioLayout program (Goldovsky<br />
et al., 2005). F<strong>in</strong>ally, leader sequences share significant<br />
sequence similarity with<strong>in</strong>, but not across, families such that<br />
all leaders cannot be represented <strong>in</strong> one phylogenetic tree.<br />
Thus, MCL cluster<strong>in</strong>g is <strong>the</strong> best approach for automated<br />
classification of <strong>CRISPR</strong> leader sequences, and by us<strong>in</strong>g <strong>the</strong><br />
same method for Cas1 and repeat sequences, potential<br />
<strong>in</strong>consistencies aris<strong>in</strong>g from us<strong>in</strong>g different methodologies are<br />
avoided.<br />
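How MCL arrives at clusters from pairwise weights can be shown with a small pure-Python sketch. It is not <strong>the</strong> MCL program used <strong>in</strong> this work and omits its prun<strong>in</strong>g and o<strong>the</strong>r optimisations; <strong>the</strong> function names, <strong>the</strong> self-loop weight of 1, <strong>the</strong> fixed iteration count and <strong>the</strong> threshold eps are assumptions made for illustration.<br />

```python
def normalise_columns(matrix):
    """Scale every column in place so that it sums to 1."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        if total > 0.0:
            for i in range(n):
                matrix[i][j] /= total


def mcl(n, weighted_edges, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster nodes 0..n-1 from {(i, j): weight} by expansion and inflation."""
    m = [[0.0] * n for _ in range(n)]
    for (i, j), weight in weighted_edges.items():
        m[i][j] = m[j][i] = weight
    for i in range(n):
        m[i][i] = 1.0  # self-loops (assumed weight 1)
    normalise_columns(m)

    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walks).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power, then renormalise the columns.
        m = [[value ** inflation for value in row] for row in expanded]
        normalise_columns(m)

    # Clusters are the connected components of the remaining non-zero flow.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                parent[find(i)] = find(j)

    clusters = {}
    for node in range(n):
        clusters.setdefault(find(node), []).append(node)
    return sorted(clusters.values())
```

The algorithm itself decides where one family ends and <strong>the</strong> next beg<strong>in</strong>s. Lower<strong>in</strong>g <strong>the</strong> <strong>in</strong>flation parameter, as was done for <strong>the</strong> leaders, yields coarser clusters.<br />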
The results are illustrated <strong>in</strong> Fig. 2 for <strong>the</strong> Sulfolobales and<br />
<strong>the</strong>y show closely similar cluster<strong>in</strong>g patterns for <strong>the</strong> Cas1, leader<br />
and repeat sequences of <strong>the</strong> <strong>CRISPR</strong>/Cas families I to IV<br />
(Fig. 2B–D), consistent with earlier results (Lillestøl et al.,<br />
2009), and <strong>the</strong>y strongly suggest that <strong>the</strong> four <strong>CRISPR</strong>/Cas<br />
families have evolved <strong>in</strong>dependently and that <strong>the</strong>y do <strong>in</strong>deed<br />
constitute discrete genetic modules. The results <strong>in</strong> Fig. 2A reveal<br />
that <strong>the</strong> Sulfolobales families I–IV are all components of an<br />
earlier def<strong>in</strong>ed group of families, CASS1 + 5 + 6 + 7 (Haft et al.,<br />
2005; Makarova et al., 2006), which can be seen to merge<br />
<strong>in</strong>to a superfamily.<br />
For bacteria, a comparative genomic analysis of stra<strong>in</strong>s<br />
of Streptococcus <strong>the</strong>rmophilus also revealed a putative<br />
co-evolution of Cas prote<strong>in</strong>s and <strong>CRISPR</strong> loci with<strong>in</strong> <strong>the</strong><br />
<strong>CRISPR</strong>/Cas modules (Horvath et al., 2008), and a more<br />
extensive study of <strong>CRISPR</strong> loci <strong>in</strong> 47 genomes of a variety of<br />
genera and species of lactic acid bacteria revealed 8 different<br />
classes of <strong>CRISPR</strong>/Cas modules with evidence for a phylogenetic<br />
congruence between Cas1 prote<strong>in</strong> sequences, <strong>the</strong> repeat<br />
sequences, and <strong>the</strong> cas gene content and synteny but, with one<br />
partial exception, no phylogenetic l<strong>in</strong>k was detected between<br />
<strong>the</strong> leader regions and <strong>the</strong> rest of <strong>the</strong> <strong>CRISPR</strong>/Cas modules<br />
(Horvath et al., 2009). Whe<strong>the</strong>r <strong>the</strong> latter reflects a real<br />
difference <strong>in</strong> <strong>the</strong> significance of <strong>the</strong> leader between <strong>the</strong>se<br />
bacteria and <strong>the</strong> crenarchaea requires fur<strong>the</strong>r clarification.<br />
Amongst crenarchaea, <strong>the</strong>re is a preference for paired<br />
<strong>CRISPR</strong> loci which are <strong>in</strong>verted with respect to one ano<strong>the</strong>r,<br />
generally (see below) result<strong>in</strong>g <strong>in</strong> <strong>in</strong>ternalised leader regions<br />
and some cas genes located between <strong>the</strong> leaders (Fig. 1D and<br />
E) (Lillestøl et al., 2009). Moreover, for <strong>the</strong> Sulfolobales, at<br />
least, <strong>the</strong> paired modules belong to <strong>the</strong> same family and share<br />
a s<strong>in</strong>gle set of cas genes. Family I <strong>CRISPR</strong>/Cas modules are<br />
<strong>the</strong> most common amongst <strong>the</strong> Sulfolobales and o<strong>the</strong>r crenarchaea<br />
and <strong>the</strong>y are also <strong>the</strong> most conserved <strong>in</strong> structure. The<br />
cas genes are partitioned, with one group located between <strong>the</strong><br />
leaders and ano<strong>the</strong>r ly<strong>in</strong>g externally at one end of <strong>the</strong> module<br />
(Fig. 1D and E). This separation may be functionally significant<br />
with <strong>the</strong> <strong>in</strong>ternal cas genes adjacent to both leader regions<br />
encod<strong>in</strong>g prote<strong>in</strong>s <strong>in</strong>volved <strong>in</strong> process<strong>in</strong>g and <strong>in</strong>sertion of<br />
DNA spacer-repeat units, while <strong>the</strong> external cas genes encode<br />
RNA process<strong>in</strong>g and guid<strong>in</strong>g prote<strong>in</strong>s. There are fewer identified<br />
examples of paired family II and III <strong>CRISPR</strong>/Cas<br />
modules and <strong>the</strong>y appear to be less conserved <strong>in</strong> <strong>the</strong>ir genetic<br />
organisation than <strong>the</strong> family I modules and, at this stage, it is<br />
premature to propose a consensus structure. A similar family-specific<br />
cas gene content and synteny has also been observed<br />
for <strong>CRISPR</strong>/Cas modules of lactic acid bacteria (Horvath<br />
et al., 2009). Presumably, <strong>the</strong> pair<strong>in</strong>g of <strong>the</strong> <strong>CRISPR</strong>/Cas<br />
modules reflects a compromise between limit<strong>in</strong>g <strong>the</strong> sizes of<br />
<strong>in</strong>dividual <strong>CRISPR</strong> loci and avoid<strong>in</strong>g <strong>the</strong> necessity of<br />
produc<strong>in</strong>g very long transcripts while us<strong>in</strong>g only one set of cas<br />
genes. Moreover, if one <strong>CRISPR</strong> locus becomes <strong>in</strong>activated as<br />
a result of, for example, mutations at <strong>the</strong> leader-repeat junction,<br />
<strong>the</strong> o<strong>the</strong>r locus will still be active.<br />
3.2. Phylogenetic relationships between <strong>CRISPR</strong>/Cas<br />
and Cmr modules<br />
The Cmr module has been implicated <strong>in</strong> direct<strong>in</strong>g processed<br />
crRNAs to target <strong>the</strong> RNA of <strong>in</strong>vad<strong>in</strong>g genetic<br />
Fig. 2. Results of MCL cluster<strong>in</strong>g of components of <strong>CRISPR</strong>/Cas modules visualised us<strong>in</strong>g BioLayout (Goldovsky et al., 2005). A. Cluster<strong>in</strong>g of all Cas1 prote<strong>in</strong>s<br />
found <strong>in</strong> public databases where 5 large clusters and 11 smaller ones emerge and are colour-coded. Sequences with<strong>in</strong> a given cluster show as little as 25% am<strong>in</strong>o<br />
acid sequence identity. Three of <strong>the</strong> large clusters correspond directly to previously def<strong>in</strong>ed families, labelled CASS2 to 4 (Haft et al., 2005; Makarova et al., 2006).<br />
B to D. Cluster<strong>in</strong>g of Sulfolobales <strong>CRISPR</strong>/Cas families I, II, III and IV: B - Cas1 prote<strong>in</strong>s; C - leaders, where leaders from <strong>the</strong> same family share about 70%<br />
nucleotide sequence identity and little or no nucleotide sequence conservation occurs between different families; D - repeats which show about 80% sequence<br />
identity with<strong>in</strong> a given family. The results for <strong>the</strong> four families <strong>in</strong> B to D show similar patterns. Colour-cod<strong>in</strong>g for <strong>the</strong> Sulfolobales <strong>CRISPR</strong>/Cas families: I - blue,<br />
II - purple, III - yellow, and IV - green. Family IV represents <strong>the</strong> <strong>CRISPR</strong>/Cas modules <strong>in</strong> Metallosphaera sedula and Acidianus brierleyi which were previously<br />
unclassified (Lillestøl et al., 2009).<br />
elements; whe<strong>the</strong>r RNA genomes, transcripts, or both are targeted<br />
rema<strong>in</strong>s unclear (Hale et al., 2009). The cmr genes are<br />
apparently co-transcribed <strong>in</strong> a dist<strong>in</strong>ct cassette which is<br />
sometimes physically l<strong>in</strong>ked to <strong>the</strong> <strong>CRISPR</strong>/Cas module<br />
(Fig. 1). It occurs less widely than <strong>CRISPR</strong>/Cas modules, and<br />
is particularly prevalent <strong>in</strong> <strong>the</strong>rmophilic archaea and bacteria.<br />
Comparison of phylogenetic trees for <strong>the</strong> <strong>CRISPR</strong>/Cas and<br />
Cmr modules, based on sequences of Cas1 or Cas3 prote<strong>in</strong>s<br />
(<strong>the</strong> former is not present <strong>in</strong> all <strong>CRISPR</strong>/Cas modules) and<br />
a predicted polymerase, respectively, revealed two major<br />
branches for <strong>the</strong> Cmr modules, carry<strong>in</strong>g dist<strong>in</strong>ctive gene<br />
syntenies, but show<strong>in</strong>g little congruence with <strong>the</strong> Cas1/Cas3-based<br />
tree (Makarova et al., 2006). This suggests that despite<br />
<strong>the</strong>ir be<strong>in</strong>g <strong>in</strong>terdependent mechanistically and sometimes<br />
physically coupled, <strong>the</strong> DNA- and RNA-directed <strong>system</strong>s<br />
have evolved <strong>in</strong>dependently. Both module types tend to be<br />
located <strong>in</strong> variable genomic regions and <strong>the</strong>ir positions, and<br />
copy numbers, vary even for <strong>the</strong> closely related Sulfolobus<br />
species (see below).<br />
3.3. Diversification and degeneration of <strong>CRISPR</strong>/Cas<br />
modules<br />
<strong>CRISPR</strong> loci vary considerably <strong>in</strong> size extend<strong>in</strong>g from<br />
a s<strong>in</strong>gle spacer bordered by repeats to a maximum, to date, of<br />
375 spacers (Lillestøl et al., 2006; Grissa et al., 2008). All such<br />
<strong>CRISPR</strong> loci that have been tested, <strong>in</strong>clud<strong>in</strong>g those lack<strong>in</strong>g<br />
leader regions, have been shown to produce transcripts which<br />
are processed (Tang et al., 2002, 2005; Brouns et al., 2008;<br />
Carte et al., 2008; Lillestøl et al., 2006, 2009). There is<br />
evidence from studies of both archaea and bacteria that<br />
<strong>CRISPR</strong> loci commonly undergo deletions without impair<strong>in</strong>g<br />
overall <strong>CRISPR</strong>/Cas functionality, and that <strong>the</strong> deletions can<br />
range <strong>in</strong> size from s<strong>in</strong>gle to several repeat-spacer units,<br />
presumably result<strong>in</strong>g from recomb<strong>in</strong>ation at <strong>the</strong> identical<br />
direct repeats. There is a tendency to lose <strong>the</strong> central and<br />
downstream regions of <strong>the</strong> <strong>CRISPR</strong> loci far<strong>the</strong>st from <strong>the</strong><br />
leader region, where <strong>the</strong> earliest spacer <strong>in</strong>serts are located, and<br />
which are likely to be less important for <strong>the</strong> <strong>immune</strong> <strong>system</strong>,
on average, than <strong>the</strong> more recently <strong>in</strong>serted spacers (Lillestøl<br />
et al., 2006, 2009; Tyson and Banfield, 2007; Deveau et al.,<br />
2008; Horvath et al., 2008). However, <strong>in</strong> addition to <strong>the</strong><br />
spacer-repeat units added at <strong>the</strong> leader-repeat junction<br />
(Pourcel et al., 2005; Lillestøl et al., 2006, 2009), <strong>the</strong>re are<br />
a few putative examples of duplications of spacer-repeat units,<br />
or small groups <strong>the</strong>reof, occurr<strong>in</strong>g <strong>in</strong> mycobacteria and<br />
methanoarchaea (Van Embden et al., 2000; Lillestøl et al.,<br />
2006). Moreover, it has also been claimed, for two out of<br />
four derivatives of S. <strong>the</strong>rmophilus stra<strong>in</strong> SMQ-301, that<br />
a s<strong>in</strong>gle new spacer-repeat unit was <strong>in</strong>serted <strong>in</strong>ternally with<strong>in</strong><br />
<strong>the</strong> <strong>CRISPR</strong> locus at <strong>the</strong> exact position where seven spacer-repeat<br />
units had been deleted, suggest<strong>in</strong>g that <strong>the</strong> <strong>in</strong>sertion-deletion<br />
events had occurred concurrently (Deveau et al.,<br />
2008). A related phenomenon occurs <strong>in</strong> <strong>the</strong> <strong>CRISPR</strong> loci of<br />
S. solfataricus stra<strong>in</strong>s. Pairwise alignments of <strong>CRISPR</strong> locus<br />
A of stra<strong>in</strong>s P1, P2 and 98/2 <strong>in</strong> Fig. 3 show shared spacers<br />
(shaded), as well as different spacers adjo<strong>in</strong><strong>in</strong>g <strong>the</strong> leader<br />
region and considered to have been added after <strong>the</strong> stra<strong>in</strong>s<br />
diverged. Deletions are apparent when pairs of <strong>CRISPR</strong> locus<br />
A are compared, but <strong>the</strong>re is one site <strong>in</strong> <strong>the</strong> P1 locus where six<br />
spacer-repeat units (a) have been replaced by four (b) from <strong>the</strong><br />
<strong>CRISPR</strong> locus B, presumably <strong>in</strong> a s<strong>in</strong>gle recomb<strong>in</strong>ation event<br />
(Fig. 3).<br />
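The pairwise comparison of loci described above can be sketched as follows. The sketch assumes that each locus is a list of spacer sequences ordered from <strong>the</strong> leader end and that spacers are shared only when identical <strong>in</strong> sequence; <strong>the</strong> function name and <strong>the</strong> toy spacer labels <strong>in</strong> any usage are hypo<strong>the</strong>tical.<br />

```python
def compare_loci(locus_a, locus_b):
    """Compare two CRISPR loci given as leader-first lists of spacers.

    Returns the shared spacers (in locus_a order) and, for each locus, the
    leader-proximal spacers that precede the first shared spacer, i.e. those
    presumed to have been added after the strains diverged.
    """
    shared = set(locus_a) & set(locus_b)

    def leader_proximal_new(locus):
        new = []
        for spacer in locus:
            if spacer in shared:
                break
            new.append(spacer)
        return new

    return {
        "shared": [spacer for spacer in locus_a if spacer in shared],
        "new_in_a": leader_proximal_new(locus_a),
        "new_in_b": leader_proximal_new(locus_b),
    }
```

Spacers of one locus that lie between shared spacers but are absent from <strong>the</strong> o<strong>the</strong>r locus are candidates for deletion or replacement events of <strong>the</strong> k<strong>in</strong>d discussed for stra<strong>in</strong> P1.<br />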
Earlier studies suggested that mobile elements or <strong>in</strong>tegrative<br />
elements rarely target <strong>CRISPR</strong>/Cas modules <strong>in</strong> ei<strong>the</strong>r<br />
archaea or bacteria (Van Embden et al., 2000; Haft et al.,<br />
2005; Lillestøl et al., 2006). Moreover, <strong>in</strong> <strong>the</strong> three closely<br />
related stra<strong>in</strong>s of S. solfataricus P1, P2 and 98/2, which are<br />
rich <strong>in</strong> active transposable elements and where extensive<br />
genomic shuffl<strong>in</strong>g has been observed (Brügger et al., 2004;<br />
Redder and Garrett, 2006), no IS <strong>in</strong>sertions were detected <strong>in</strong><br />
<strong>the</strong>ir extensive <strong>CRISPR</strong> loci (350–450 spacer-repeat units)<br />
(Fig. 3). Thus, although <strong>the</strong>y do occur occasionally <strong>in</strong>tergenically<br />
<strong>in</strong> <strong>the</strong> cas and cmr gene clusters, <strong>the</strong>re appears to be<br />
a strong selective pressure to ma<strong>in</strong>ta<strong>in</strong> <strong>the</strong> <strong>in</strong>tegrity of <strong>CRISPR</strong><br />
loci <strong>in</strong> crenarchaea. Never<strong>the</strong>less, recent studies of environmental<br />
bacterial samples suggest that transpositions occur<br />
commonly <strong>in</strong> some <strong>system</strong>s. In a study of two biofilms<br />
carry<strong>in</strong>g acidophilic Leptospirillum group II bacteria, for one<br />
biofilm about 20% of <strong>the</strong> partially sequenced <strong>CRISPR</strong> loci<br />
carried IS elements (Tyson and Banfield, 2007) and <strong>in</strong> a recent<br />
study of many lactic acid bacterial stra<strong>in</strong>s, several <strong>CRISPR</strong><br />
loci and cas gene cassettes were found to be <strong>in</strong>terrupted by IS<br />
elements, with some flank<strong>in</strong>g ei<strong>the</strong>r end of a <strong>CRISPR</strong> locus<br />
(Horvath et al., 2009). Therefore, IS elements are likely to<br />
generate changes <strong>in</strong> active <strong>CRISPR</strong> loci, perhaps especially<br />
<strong>in</strong> biofilms or <strong>in</strong> environments with low virus and/or plasmid<br />
levels.<br />
Thus, IS elements, or o<strong>the</strong>r transposable elements, may<br />
<strong>in</strong>duce shorten<strong>in</strong>g and/or degeneration of <strong>CRISPR</strong> loci by<br />
<strong>in</strong>sert<strong>in</strong>g <strong>in</strong>to <strong>CRISPR</strong> loci and caus<strong>in</strong>g transposition of<br />
spacer-repeat clusters to o<strong>the</strong>r chromosomal sites. Many<br />
chromosomes, with or without <strong>CRISPR</strong>/Cas modules, carry<br />
short <strong>CRISPR</strong>-like clusters lack<strong>in</strong>g associated leader regions<br />
and cas genes (Grissa et al., 2008). Their repeats are often<br />
phylogenetically divergent from <strong>the</strong> <strong>CRISPR</strong> loci <strong>in</strong> a given<br />
genome, or <strong>in</strong> closely related genomes. Although <strong>the</strong>re is no<br />
consensus view as to <strong>the</strong>ir orig<strong>in</strong>(s) or function(s), if <strong>the</strong>y are<br />
preceded by promoters, <strong>the</strong>ir transcripts can, <strong>in</strong> pr<strong>in</strong>ciple, be<br />
processed and activated by Cas and/or Cmr prote<strong>in</strong>s, if<br />
present. For example, Sulfolobus conjugative plasmids<br />
carry<strong>in</strong>g <strong>CRISPR</strong>-like loci lack cas genes and leader<br />
sequences (She et al., 1998; Greve et al., 2004) but for at least<br />
one of <strong>the</strong>m, pKEF9, <strong>the</strong> repeat cluster is transcribed and <strong>the</strong><br />
RNA is processed, which <strong>in</strong>dicates that <strong>the</strong> active crRNAs can<br />
be produced <strong>in</strong>tracellularly if a complementary set of cas<br />
genes (or cmr genes) is present <strong>in</strong> <strong>the</strong> host (Lillestøl et al.,<br />
2009). S<strong>in</strong>ce three of <strong>the</strong> six spacers <strong>in</strong> <strong>the</strong> pKEF9 repeat<br />
cluster have good sequence matches to archaeal fuselloviruses<br />
(2) and a rudivirus (1), it was proposed that <strong>the</strong> genetic<br />
elements may also exploit <strong>the</strong> host’s <strong>CRISPR</strong>/Cas (or<br />
<strong>CRISPR</strong>/Cmr) <strong>immune</strong> <strong>system</strong>, to compete with co-<strong>in</strong>vad<strong>in</strong>g<br />
foreign elements (Lillestøl et al., 2009). This hypo<strong>the</strong>sis is<br />
consistent with <strong>the</strong> demonstration that <strong>in</strong>fection of an Acidianus<br />
stra<strong>in</strong>, carry<strong>in</strong>g <strong>the</strong> conjugative plasmid pAH1 (lack<strong>in</strong>g<br />
a <strong>CRISPR</strong> locus), with <strong>the</strong> lipothrixvirus AFV1, led to <strong>in</strong>hibition<br />
of plasmid replication (Basta et al., 2009).<br />
3.4. Mobilisation and loss of <strong>CRISPR</strong>/Cas modules<br />
Genome analyses of closely related members of <strong>the</strong> Sulfolobales<br />
revealed <strong>CRISPR</strong>/Cas modules at different positions<br />
<strong>in</strong> genomes which show high levels of gene synteny, rais<strong>in</strong>g<br />
<strong>the</strong> question as to whe<strong>the</strong>r <strong>the</strong>y have moved with<strong>in</strong> <strong>the</strong><br />
genome, or been lost and/or ga<strong>in</strong>ed. There are also differences<br />
<strong>in</strong> <strong>the</strong> contents of <strong>the</strong> <strong>CRISPR</strong>/Cas module families <strong>in</strong> <strong>the</strong><br />
sequenced genomes. For example, S. solfataricus carries<br />
Fig. 3. Pairwise comparison of repeat-spacer units of <strong>CRISPR</strong> locus A of three stra<strong>in</strong>s of S. solfataricus P1, P2 and 98/2. Shaded spacer-repeat units which are<br />
l<strong>in</strong>ked are identical <strong>in</strong> sequence between pairs of <strong>CRISPR</strong> loci. Six spacer-repeat units, <strong>in</strong>dicated by a, are deleted from stra<strong>in</strong> P1, while four spacer-repeat units,<br />
denoted b, have apparently been acquired from <strong>CRISPR</strong> locus B (Lillestøl et al., 2009). Leader regions are <strong>in</strong>dicated by L.<br />
family I and II modules while Sulfolobus acidocaldarius<br />
carries modules of family II and III (Lillestøl et al., 2009).<br />
To determ<strong>in</strong>e whe<strong>the</strong>r <strong>CRISPR</strong>/Cas modules are readily<br />
mobilised, we <strong>in</strong>vestigated <strong>the</strong> presence or absence of<br />
<strong>CRISPR</strong>/Cas modules <strong>in</strong> genomes of pairs of closely related<br />
Sulfolobus stra<strong>in</strong>s show<strong>in</strong>g >99% DNA sequence identity<br />
(Fig. 4). The Sulfolobus stra<strong>in</strong>s (She et al., 2001a; Reno<br />
et al., 2009) exhibit differences <strong>in</strong> <strong>the</strong> numbers of modules, for<br />
example, for <strong>the</strong> pair of closely related S. islandicus stra<strong>in</strong>s<br />
HVE10/4 and REY15A; <strong>the</strong> former carries two <strong>CRISPR</strong>/Cas<br />
modules and one Cmr module, whereas <strong>the</strong> latter has one<br />
<strong>CRISPR</strong>/Cas module and two Cmr modules. <strong>CRISPR</strong> loci of<br />
<strong>the</strong> five pairs of S. islandicus stra<strong>in</strong>s, <strong>in</strong> contrast to those of <strong>the</strong><br />
S. solfataricus stra<strong>in</strong>s (Fig. 3), share no common spacers.<br />
However, each stra<strong>in</strong> carries one paired family I <strong>CRISPR</strong>/Cas<br />
module so that it was possible to test whe<strong>the</strong>r <strong>the</strong> module had<br />
persisted s<strong>in</strong>ce <strong>the</strong> 10 stra<strong>in</strong>s diverged. The genomic position<br />
of <strong>the</strong> module was compared with<strong>in</strong> each stra<strong>in</strong> pair and was<br />
found not to be conserved for 4 out of 5 pairs<br />
(Fig. 4). However, <strong>the</strong> displacements, even for <strong>the</strong> most closely<br />
related stra<strong>in</strong>s, could be attributed to <strong>the</strong> genomic region<br />
carry<strong>in</strong>g <strong>the</strong> module be<strong>in</strong>g variable and hav<strong>in</strong>g undergone<br />
complex rearrangements, ra<strong>the</strong>r than <strong>the</strong> module itself hav<strong>in</strong>g<br />
been mobilised. At present, <strong>the</strong>re are <strong>in</strong>sufficient closely<br />
related genomes available, for archaea and bacteria, which<br />
carry <strong>CRISPR</strong>/Cas modules to test for <strong>the</strong> generality of <strong>the</strong>se<br />
results, although <strong>in</strong> an earlier study of two Thermotoga<br />
genomes, <strong>CRISPR</strong> loci were found to be located close to<br />
variable sites where chromosomal <strong>in</strong>versions had occurred<br />
(DeBoy et al., 2006).<br />
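The spacer comparisons referred to above (identical spacer-repeat units linked between loci in Fig. 3, and the absence of common spacers between the paired S. islandicus strains) amount to finding identical sequences in two ordered lists. The following is a minimal Python sketch of that idea, assuming each CRISPR locus has already been reduced to an ordered list of spacer strings. The function name and the toy sequences are hypothetical and are not taken from the cited studies.

```python
from typing import Dict, List, Tuple


def shared_spacer_links(locus_a: List[str], locus_b: List[str]) -> List[Tuple[int, int]]:
    """Return (index_in_a, index_in_b) pairs for spacers identical in sequence.

    Only exact, same-strand identity is tested (case-insensitive); this is an
    illustrative simplification, not the procedure used in the cited work.
    """
    # Index every spacer of locus B by its sequence.
    positions_b: Dict[str, List[int]] = {}
    for j, spacer in enumerate(locus_b):
        positions_b.setdefault(spacer.upper(), []).append(j)

    # Walk locus A in order and record a link for each identical spacer in B.
    links: List[Tuple[int, int]] = []
    for i, spacer in enumerate(locus_a):
        for j in positions_b.get(spacer.upper(), []):
            links.append((i, j))
    return links


# Toy data (hypothetical): two loci sharing two spacers.
locus_1 = ["AAAC", "GGTT", "CCGA"]
locus_2 = ["GGTT", "TTTT", "CCGA"]
links = shared_spacer_links(locus_1, locus_2)
```

An empty result for a pair of loci corresponds to the situation described for the S. islandicus pairs, where no spacers are shared.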
There are examples of <strong>CRISPR</strong>/Cas modules be<strong>in</strong>g lost from<br />
genomes. For example, a variant stra<strong>in</strong> of S. solfataricus P2<br />
(P2A) was characterised that had lost four of <strong>the</strong> six <strong>CRISPR</strong>/<br />
Cas modules (A, B, C and D) which were physically l<strong>in</strong>ked, <strong>in</strong><br />
total 124 kb, apparently via a s<strong>in</strong>gle recomb<strong>in</strong>ation event<br />
between two border<strong>in</strong>g IS elements (Redder and Garrett, 2006),<br />
and S. solfataricus 98/2 lacks two whole clusters (C and F)<br />
(Lillestøl et al., 2009). Border<strong>in</strong>g IS elements also have <strong>the</strong><br />
potential to generate transposons carry<strong>in</strong>g whole <strong>CRISPR</strong>/Cas<br />
or Cmr modules and, rarely, paired family II <strong>CRISPR</strong>/Cas<br />
modules are bordered by identical <strong>in</strong>verted leaders (e.g.,<br />
<strong>CRISPR</strong> loci A and B of S. solfataricus) which could recomb<strong>in</strong>e,<br />
lead<strong>in</strong>g to loss of <strong>the</strong> whole module. Examples of closely related<br />
stra<strong>in</strong>s apparently los<strong>in</strong>g <strong>CRISPR</strong>/Cas modules have also been<br />
reported for some bacteria (e.g., Godde and Bickerton, 2006;<br />
Horvath et al., 2008).<br />
Fig. 4. Dot-plots show<strong>in</strong>g <strong>the</strong> degree of variability <strong>in</strong> gene synteny at <strong>the</strong> genomic sites of <strong>the</strong> <strong>CRISPR</strong> loci for closely related pairs of Sulfolobus stra<strong>in</strong>s (A–E). At<br />
<strong>the</strong> top and right sides of each plot, I, II, III <strong>in</strong>dicate <strong>the</strong> position of a <strong>CRISPR</strong>/Cas family and C denotes a Cmr module. For <strong>the</strong> Sulfolobus stra<strong>in</strong>s, <strong>CRISPR</strong>/Cas<br />
modules and Cmr modules are <strong>in</strong>variably located with<strong>in</strong> an approximately 0.75 Mb variable region conta<strong>in</strong><strong>in</strong>g many IS elements. In general, <strong>the</strong> gene synteny<br />
border<strong>in</strong>g <strong>the</strong> modules, and <strong>the</strong> genomic locations of <strong>the</strong> modules, have changed, possibly due to transpositional activity.<br />
For S. solfataricus P2A, loss of <strong>CRISPR</strong>/Cas modules was<br />
attributed to its be<strong>in</strong>g a laboratory stra<strong>in</strong> where <strong>the</strong> <strong>CRISPR</strong>/<br />
Cas <strong>immune</strong> <strong>system</strong> had become an unnecessary burden on <strong>the</strong><br />
cell’s energy resources <strong>in</strong> <strong>the</strong> absence of <strong>in</strong>vad<strong>in</strong>g genetic<br />
elements. Possibly <strong>in</strong> niches relatively poor <strong>in</strong> viruses and<br />
plasmids, <strong>in</strong>clud<strong>in</strong>g numerous bacterial endosymbionts, <strong>the</strong>re<br />
is a tendency to offload <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>. For example,<br />
many human/animal pathogens <strong>in</strong>clud<strong>in</strong>g Borrelia, Brucella,<br />
Buchnera, Burkholderia, Chlamydia and Rickettsia lack<br />
<strong>CRISPR</strong> loci while o<strong>the</strong>rs, <strong>in</strong>clud<strong>in</strong>g Pseudomonas stra<strong>in</strong>s and<br />
Staphylococcus aureus, ei<strong>the</strong>r lack <strong>CRISPR</strong> loci or carry<br />
apparently degenerate copies. This may partly expla<strong>in</strong> why<br />
about 60% of bacteria lack <strong>the</strong> <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr<br />
<strong>immune</strong> <strong>system</strong>s (Grissa et al., 2008; Mojica et al., 2009).<br />
3.5. Transfer of <strong>CRISPR</strong>/Cas modules between<br />
organisms<br />
Various l<strong>in</strong>es of evidence suggest that <strong>CRISPR</strong> loci have<br />
been transferred between organisms. One example is <strong>the</strong> variety<br />
and comb<strong>in</strong>ations of different families of <strong>CRISPR</strong>/Cas modules<br />
that occur <strong>in</strong> closely related crenarchaeal genomes, with<br />
a similar pattern for <strong>the</strong> lactic acid bacterial genomes (Horvath<br />
et al., 2009; Lillestøl et al., 2009; Shah et al., 2009). This<br />
underl<strong>in</strong>es that exchange does occur between closely related<br />
organisms. O<strong>the</strong>r evidence derives from an analysis of <strong>the</strong><br />
euryarchaeon Pyrococcus furiosus, where a 155 kb fragment<br />
bordered by a <strong>CRISPR</strong> locus and a repeat shows significantly<br />
different properties of G + C content, third codon position and<br />
codon usage from <strong>the</strong> rest of <strong>the</strong> genome (Portillo and<br />
Gonzalez, 2009). Similarly, <strong>the</strong> lactic acid bacterium Bifidobacterium<br />
adolescentis was shown to carry a cas gene cassette<br />
with a much lower G + C content (47%) than <strong>the</strong> average<br />
chromosomal G + C content (59.2%) (Horvath et al., 2009).<br />
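The G + C comparisons cited above come down to a simple proportion: the number of G and C bases divided by the number of unambiguous bases. For the B. adolescentis example, the cas gene cassette lies about 12 percentage points below the chromosomal average (59.2% versus 47%). The Python sketch below shows how such a deviation could be computed. The function names and the toy sequences are hypothetical, and it is an assumption of this sketch that ambiguous bases are simply ignored.

```python
def gc_content(seq: str) -> float:
    """Return the G + C content of a nucleotide sequence as a percentage."""
    # Count only unambiguous bases; N and gap characters are skipped (assumption).
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


def gc_deviation(region: str, genome: str) -> float:
    """G + C difference, in percentage points, between a region and the genome."""
    return gc_content(region) - gc_content(genome)


# Toy sequences (hypothetical): an AT-rich region within a GC-rich background.
region = "ATATGCAT"
genome = "GGCCGGCCATGC"
deviation = gc_deviation(region, genome)  # negative: region is poorer in G + C
```

A strongly negative or positive deviation for a candidate region is only suggestive of foreign origin; the cited studies also considered codon usage and third codon position.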
In order to exam<strong>in</strong>e <strong>the</strong> degree to which <strong>CRISPR</strong>/Cas<br />
modules are subject to structural changes, we exam<strong>in</strong>ed paired<br />
family I <strong>CRISPR</strong>/Cas modules <strong>in</strong> several closely related Sulfolobus<br />
stra<strong>in</strong>s. Phylogenetic trees were constructed for <strong>the</strong><br />
external and <strong>in</strong>ternal cas gene cassettes and <strong>the</strong> leader region<br />
(Fig. 1E) and <strong>the</strong>y were compared with a tree of <strong>the</strong> core<br />
genomes (Fig. 5A). The tree of <strong>the</strong> external cas gene cassette<br />
is similar to <strong>the</strong> core genome tree, suggest<strong>in</strong>g that <strong>the</strong>se genes<br />
were reta<strong>in</strong>ed <strong>in</strong> <strong>the</strong> genome after divergence of <strong>the</strong> stra<strong>in</strong>s<br />
(Fig. 5C). However, <strong>the</strong> trees for <strong>the</strong> <strong>in</strong>ternal cas gene cassette<br />
located between <strong>the</strong> two leaders (Fig. 5D) and <strong>the</strong> leader<br />
regions (Fig. 5B) match one ano<strong>the</strong>r fairly closely, and <strong>the</strong>y<br />
also match a tree derived from <strong>the</strong> repeat sequences (Lillestøl<br />
et al., 2009). This <strong>in</strong>dicates that <strong>the</strong> external cas genes, putatively<br />
<strong>in</strong>volved <strong>in</strong> RNA process<strong>in</strong>g and crRNA mobility, have<br />
been reta<strong>in</strong>ed with<strong>in</strong> <strong>the</strong> stra<strong>in</strong>s, whereas <strong>the</strong> <strong>in</strong>ternal cas gene<br />
cassettes, which are functionally implicated <strong>in</strong> spacer addition<br />
at <strong>the</strong> leader-repeat junction, seem to co-evolve, and be<br />
mobilised with, <strong>the</strong> <strong>CRISPR</strong> loci.<br />
Mechanisms of transfer of <strong>CRISPR</strong>/Cas modules are less<br />
clear and may be diverse. <strong>CRISPR</strong>/Cas loci can vary <strong>in</strong> size<br />
from about 7 kb for a cas gene cassette, a leader region and<br />
a small <strong>CRISPR</strong> locus, to 25 kb or more for <strong>the</strong> paired family I<br />
crenarchaeal <strong>CRISPR</strong>/Cas modules. Indirect evidence for <strong>the</strong><br />
transfer of <strong>CRISPR</strong>/Cas modules on conjugative plasmids<br />
arose from <strong>the</strong> observation that a few bacterial conjugative<br />
plasmids from Thermus <strong>the</strong>rmophilus, Synechocystis and<br />
Shewanella, carried <strong>CRISPR</strong> loci, sometimes associated with<br />
a few cas genes (Godde and Bickerton, 2006) and, moreover,<br />
small <strong>CRISPR</strong> loci have been detected <strong>in</strong> two crenarchaeal<br />
conjugative plasmids (She et al., 1998; Peng et al., 2003;<br />
Greve et al., 2004). For <strong>the</strong> latter, however, no physical<br />
proximity of <strong>in</strong>tegrated conjugative plasmids and <strong>CRISPR</strong> loci<br />
occurs with<strong>in</strong> Sulfolobus chromosomes (Chen et al., 2005;<br />
Kawarabayashi et al., 2001). To date, <strong>CRISPR</strong> loci have not<br />
been detected <strong>in</strong> viral genomes, although <strong>the</strong>y do occur with<strong>in</strong><br />
prophages of <strong>the</strong> human pathogen Clostridium difficile<br />
(Sebaihia et al., 2006).<br />
The paired crenarchaeal <strong>CRISPR</strong>/Cas modules, at least,<br />
were considered to be too large to be borne on extrachromosomal<br />
elements (Lillestøl et al., 2009). Ano<strong>the</strong>r more<br />
likely mechanism for transferr<strong>in</strong>g such large <strong>CRISPR</strong>/Cas<br />
modules between closely related organisms is via chromosomal<br />
conjugation. The archaea-specific <strong>in</strong>tegration mechanism,<br />
generat<strong>in</strong>g a partitioned <strong>in</strong>tegrase gene, provides<br />
a mechanism favour<strong>in</strong>g <strong>the</strong> capture of genetic elements <strong>in</strong><br />
chromosomes (Muskhelishvili et al., 1993; She et al., 2001b)<br />
and some Sulfolobus species that carry captured <strong>in</strong>tegrated<br />
conjugative plasmids are also capable of conjugat<strong>in</strong>g <strong>the</strong>ir<br />
chromosomal DNA (Aagaard et al., 1995; Grogan, 1996).<br />
O<strong>the</strong>r, as yet unknown, transmission mechanisms may also operate, for<br />
example, with<strong>in</strong> biofilms.<br />
3.6. Co-evolution of <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong> <strong>in</strong> <strong>the</strong><br />
archaeal and bacterial doma<strong>in</strong>s<br />
Ever s<strong>in</strong>ce <strong>the</strong> earliest studies on <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>,<br />
<strong>the</strong> prevail<strong>in</strong>g view has been that <strong>the</strong> archaeal and bacterial<br />
<strong>system</strong>s are closely related. This view was underp<strong>in</strong>ned by <strong>the</strong><br />
similar order<strong>in</strong>g of repeat-spacer units <strong>in</strong> <strong>the</strong> <strong>CRISPR</strong> loci and<br />
by extensive comparative sequence studies of selected Cas<br />
prote<strong>in</strong>s (Haft et al., 2005; Godde and Bickerton, 2006;<br />
Makarova et al., 2006). Moreover, it has been fur<strong>the</strong>r re<strong>in</strong>forced<br />
by <strong>the</strong> mechanism of elongation of <strong>CRISPR</strong> loci at <strong>the</strong><br />
leader-repeat junction as well as by process<strong>in</strong>g and maturation<br />
mechanisms of crRNAs <strong>in</strong> both doma<strong>in</strong>s (Tang et al., 2002,<br />
2005; Brouns et al., 2008; Hale et al., 2008, 2009).<br />
Never<strong>the</strong>less, <strong>the</strong>re are dist<strong>in</strong>ctive features of <strong>the</strong> two<br />
<strong>system</strong>s. <strong>CRISPR</strong> loci are much more common amongst<br />
archaea and tend to be larger, more complex and more labile<br />
(Lillestøl et al., 2006; Grissa et al., 2008). In addition, most<br />
repeat sequences of bacterial <strong>CRISPR</strong> loci carry <strong>in</strong>verted<br />
repeat motifs which can generate hairp<strong>in</strong> structures <strong>in</strong> transcripts;<br />
<strong>the</strong>se are less common amongst archaeal repeats<br />
which, <strong>in</strong> turn, suggests that different RNA process<strong>in</strong>g signals<br />
occur with<strong>in</strong> repeat regions of <strong>the</strong> transcripts (Lillestøl et al.,<br />
2006; Kun<strong>in</strong> et al., 2007). Moreover, phylogenetic relationships<br />
based on Smith–Waterman alignments show that most<br />
families of archaeal and bacterial repeat sequences exhibit<br />
m<strong>in</strong>imal overlap (Kun<strong>in</strong> et al., 2007). A similar pattern arises<br />
from sequence alignments of Cas prote<strong>in</strong>s where phylogenetic<br />
trees of Cas prote<strong>in</strong>s show many archaeal genes cluster<strong>in</strong>g <strong>in</strong><br />
separate groups (Fig. 2A) (Haft et al., 2005; Godde and<br />
Bickerton, 2006; Makarova et al., 2006). In addition, <strong>the</strong><br />
average synteny of <strong>the</strong> cas and cmr genes is quite conserved<br />
with<strong>in</strong>, but not between, major phyla (Haft et al., 2005). There<br />
is also a <strong>CRISPR</strong> repeat b<strong>in</strong>d<strong>in</strong>g prote<strong>in</strong> of elusive function<br />
that has only been detected amongst <strong>the</strong> crenarchaea (Peng<br />
et al., 2003). O<strong>the</strong>r mechanistic differences may surface as<br />
<strong>the</strong> <strong>system</strong>s are studied more widely and <strong>in</strong> more depth.<br />
Fig. 5. Phylogenetic trees of S. solfataricus and S. islandicus stra<strong>in</strong>s based on: (A) nucleotide sequence alignments of core genomes of <strong>the</strong> host organisms,<br />
(B) leader sequences, (C) concatenated external cas genes, and (D) concatenated <strong>in</strong>ternal cas genes for paired family I <strong>CRISPR</strong> loci. Only bootstrap values less<br />
than 100% are given. All 12 stra<strong>in</strong>s were too closely related to be dist<strong>in</strong>guished on <strong>the</strong> basis of 16S rDNA sequences. For S. islandicus stra<strong>in</strong>s LS, LD, YG, YN and<br />
M14 (Reno et al., 2009), it is evident that s<strong>in</strong>ce <strong>the</strong>y diverged from <strong>the</strong>ir closest relative, identical changes have occurred <strong>in</strong> both copies of <strong>the</strong> leader, possibly due<br />
to <strong>the</strong> whole <strong>CRISPR</strong>/Cas module, or a part of it, hav<strong>in</strong>g been replaced. In contrast, <strong>the</strong> external cas cassette appears to have resided on <strong>the</strong> genome s<strong>in</strong>ce all stra<strong>in</strong>s<br />
diverged because of <strong>the</strong> similarity of trees A and C. The <strong>in</strong>ternal cas cassette tree (D) follows that of <strong>the</strong> leader consistent with tight functional coupl<strong>in</strong>g.<br />
Importantly, however, crenarchaeal viruses have radically<br />
different virus–host relationships from those of bacteria and<br />
eukarya (Prangishvili et al., 2006; Bize et al., 2009). Consistent<br />
with this, <strong>the</strong>re are putative archaeal virus-specific anti-<br />
<strong>CRISPR</strong> <strong>system</strong>s (Peng et al., 2004; Vestergaard et al., 2008;<br />
Garrett et al., 2010) and bacteria-specific <strong>CRISPR</strong> regulat<strong>in</strong>g<br />
<strong>system</strong>s (Pühl et al., 2010). Therefore, it is likely that <strong>the</strong><br />
<strong>CRISPR</strong>/Cas and Cmr <strong>system</strong>s have ma<strong>in</strong>ta<strong>in</strong>ed and/or<br />
undergone doma<strong>in</strong>-specific adaptations dur<strong>in</strong>g evolution.<br />
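The comparison of archaeal and bacterial repeat families mentioned above rests on Smith–Waterman local alignment (Kunin et al., 2007). As a rough illustration of what such an alignment score measures, the Python sketch below computes the best local alignment score between two sequences by dynamic programming. The scoring values (match +2, mismatch -1, linear gap penalty -2) and the function name are assumptions chosen for illustration; the parameters used in the cited study, and the way scores were converted into phylogenetic relationships, are not given here.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score of a and b.

    Uses a linear gap penalty and keeps only two rows of the scoring matrix,
    so it reports the score but not the alignment itself.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row (prefix a[:i-1])
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend diagonally with a match or mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy repeats (hypothetical): the second carries one extra base.
score = smith_waterman_score("ACGTACGT", "ACGTTACGT")
```

Pairs of repeat sequences with low scores relative to their length would fall into separate families, which is the sense in which archaeal and bacterial repeats are said to show minimal overlap.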
Assum<strong>in</strong>g that a <strong>CRISPR</strong>/Cas-like <strong>system</strong> evolved prior to<br />
<strong>the</strong> separation of <strong>the</strong> archaeal and bacterial l<strong>in</strong>eages, at a time<br />
when exchange of genetic elements is assumed to have been<br />
rife, we are left with two ma<strong>in</strong> scenarios for <strong>the</strong>ir<br />
subsequent development: (1) that <strong>the</strong> <strong>system</strong>s have rema<strong>in</strong>ed<br />
relatively conserved, and separated, and have gradually<br />
developed specific archaeal, or bacterial, characteristics; or<br />
(2) that <strong>the</strong>re has been periodic <strong>in</strong>terdoma<strong>in</strong> exchange, and <strong>the</strong>reby<br />
co-evolution of <strong>the</strong> archaeal and bacterial <strong>system</strong>s.<br />
Clearly, cross<strong>in</strong>g doma<strong>in</strong> boundaries would be a very<br />
complex process given <strong>the</strong> basic differences between archaea<br />
and bacteria <strong>in</strong> <strong>the</strong>ir transcriptional <strong>in</strong>itiation, elongation and<br />
term<strong>in</strong>ation mechanisms, and <strong>the</strong>ir translational <strong>in</strong>itiation<br />
mechanisms (Torar<strong>in</strong>sson et al., 2005; Santangelo et al., 2009)<br />
and would be very unlikely to occur for modern cells. Transfer<br />
by conjugation would also be unlikely given <strong>the</strong> differ<strong>in</strong>g<br />
conjugative <strong>system</strong>s and <strong>the</strong> different membrane and cell wall
structures of archaea and bacteria (Greve et al., 2004; Veith<br />
et al., 2009). For <strong>the</strong> <strong>CRISPR</strong>/Cas and Cmr modules, <strong>in</strong>terdoma<strong>in</strong><br />
transfer would seriously compromise both expression<br />
of <strong>the</strong> numerous essential cas and cmr genes and transcription<br />
of <strong>the</strong> <strong>CRISPR</strong> loci. In this context, <strong>the</strong> <strong>in</strong>fluential<br />
claim of <strong>the</strong> large uptake of function<strong>in</strong>g archaeal genes <strong>in</strong> <strong>the</strong><br />
genome of <strong>the</strong> bacterium Thermotoga maritima (24% of <strong>the</strong><br />
total <strong>in</strong>clud<strong>in</strong>g a <strong>CRISPR</strong> locus) (Nelson et al., 1999), was<br />
always highly controversial, not least because it would have<br />
required <strong>the</strong> wholesale reprogramm<strong>in</strong>g of a large part of <strong>the</strong><br />
chimeric genome for transcriptional and translational signals.<br />
A recent reevaluation of this genome, toge<strong>the</strong>r with those of<br />
four o<strong>the</strong>r members of <strong>the</strong> Thermotogales, has provided<br />
a much more nuanced and cautious view of <strong>the</strong> phylogenetic<br />
orig<strong>in</strong>s of <strong>the</strong>se bacteria (Zhaxybayeva et al., 2009), <strong>the</strong>reby<br />
underl<strong>in</strong><strong>in</strong>g <strong>the</strong> perils of <strong>in</strong>ferr<strong>in</strong>g phylogeny from BLAST<br />
sequence searches. On <strong>the</strong> o<strong>the</strong>r hand, co-evolution of <strong>the</strong><br />
archaeal and bacterial <strong>CRISPR</strong>/Cas <strong>system</strong>s would only<br />
require cross-doma<strong>in</strong> events to succeed very rarely, after<br />
which <strong>the</strong> transferred <strong>system</strong> could be under strong selective<br />
pressure. Some limited <strong>in</strong>terdoma<strong>in</strong> transfer would be<br />
consistent with <strong>the</strong> phylogenetic trees produced for Cas1 or<br />
Cas3 prote<strong>in</strong>s of <strong>the</strong> <strong>CRISPR</strong>/Cas modules and Cmr2 of <strong>the</strong><br />
Cmr module (Haft et al., 2005; Godde and Bickerton, 2006;<br />
Makarova et al., 2006). <strong>Archaea</strong>-specific Cas prote<strong>in</strong>s (Haft<br />
et al., 2005) may be associated with <strong>CRISPR</strong>/Cas or Cmr<br />
<strong>system</strong>s that have evolved more <strong>in</strong>dependently <strong>in</strong> environments<br />
of high temperature, extremes of pH or hypersal<strong>in</strong>e<br />
conditions where bacterial levels tend to be relatively low, and<br />
gradually become functionally <strong>in</strong>compatible with those liv<strong>in</strong>g<br />
<strong>in</strong> less extreme, bacteria-rich environments where limited<br />
genetic exchange between archaea and bacteria is more likely.<br />
3.7. A common ancestry with <strong>the</strong> diverse eukaryal siRNA<br />
<strong>system</strong>s?<br />
Diverse small <strong>in</strong>terfer<strong>in</strong>g RNA (siRNA) <strong>system</strong>s are<br />
widespread <strong>in</strong> eukarya. Thus, <strong>in</strong> plants, small RNAs are<br />
important for antiviral defence and regulation of transposons<br />
and similar functions are common amongst <strong>in</strong>vertebrates<br />
(Hannon, 2002; J<strong>in</strong>ek and Doudna, 2009). Moreover, <strong>the</strong>y<br />
have been implicated <strong>in</strong> controll<strong>in</strong>g repeat and transposon<br />
contents of somatic nuclei <strong>in</strong> protozoa (Mochizuki and<br />
Gorovsky, 2004). Although some mechanisms are conf<strong>in</strong>ed<br />
to certa<strong>in</strong> eukaryal l<strong>in</strong>eages, <strong>the</strong>y all essentially provide<br />
a mechanism for discrim<strong>in</strong>at<strong>in</strong>g and target<strong>in</strong>g “foreign”<br />
genetic elements or transposons. Moreover, <strong>the</strong>re are broad<br />
mechanistic similarities between <strong>the</strong> eukaryal siRNA <strong>system</strong>s<br />
and <strong>the</strong> DNA- and RNA-target<strong>in</strong>g <strong>CRISPR</strong> <strong>system</strong>s. They all<br />
have to discrim<strong>in</strong>ate foreign DNA from self-DNA, and target<br />
nucleic acids which both show little sequence similarity and<br />
can undergo cont<strong>in</strong>ual sequence change.<br />
There is a limited parallel between <strong>the</strong> <strong>CRISPR</strong>/Cmr RNA-target<strong>in</strong>g<br />
and eukaryal antiviral <strong>system</strong>s. The latter cut and<br />
process <strong>in</strong>vad<strong>in</strong>g dsRNA viruses <strong>in</strong>to small 21–22 bp dsRNAs us<strong>in</strong>g<br />
an endonuclease (Dicer), and <strong>the</strong>se are converted <strong>in</strong>to ssRNAs<br />
by <strong>the</strong> Argonaute prote<strong>in</strong>–RISC complex. The prote<strong>in</strong>–RNA<br />
complex locates and anneals to a viral mRNA carry<strong>in</strong>g<br />
a complementary sequence which is <strong>the</strong>n <strong>in</strong>activated by ano<strong>the</strong>r<br />
endonuclease (Slicer). However, <strong>the</strong> <strong>in</strong>itial process<strong>in</strong>g step<br />
<strong>in</strong>volv<strong>in</strong>g <strong>the</strong> Dicer endonuclease seems to be quite different <strong>in</strong><br />
<strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>.<br />
The closest parallel to <strong>the</strong> crRNAs and <strong>CRISPR</strong> loci<br />
amongst <strong>the</strong> eukaryal siRNA <strong>system</strong>s are <strong>the</strong> Argonaute Piwi-<strong>in</strong>teract<strong>in</strong>g<br />
RNAs (piRNAs) processed from piRNA cluster<br />
transcripts which also do not require a Dicer-like endonuclease<br />
(Lillestøl et al., 2009; Karg<strong>in</strong>ov and Hannon, 2010). This<br />
eukaryal <strong>system</strong> has been studied primarily <strong>in</strong> <strong>in</strong>sects, fish and<br />
mammals and strong evidence has been provided for its<br />
<strong>in</strong>volvement <strong>in</strong> ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g germl<strong>in</strong>e <strong>in</strong>tegrity and development<br />
(Arav<strong>in</strong> et al., 2008; Klattenhoff and Theurkauf, 2008).<br />
The piRNA clusters are rich <strong>in</strong> transposons and repeat-sequence<br />
elements and occur at specific chromosomal sites, as<br />
for <strong>the</strong> <strong>CRISPR</strong> loci. The piRNA clusters <strong>in</strong>crease <strong>the</strong>ir<br />
<strong>in</strong>formational capacity by <strong>the</strong> <strong>in</strong>sertion of transposon<br />
sequences which provide novel sequence content and become<br />
fixed <strong>in</strong> <strong>the</strong> piRNA clusters by selection. Thus, cont<strong>in</strong>ual<br />
expansion of piRNA clusters occurs, as for <strong>CRISPR</strong> loci, but<br />
<strong>the</strong> process is passive ra<strong>the</strong>r than directed. Moreover, as for <strong>the</strong><br />
<strong>CRISPR</strong>/Cas <strong>system</strong>, <strong>the</strong> newly <strong>in</strong>corporated DNA derives<br />
exclusively from genetic elements that are to be targeted. Both<br />
piRNA clusters and <strong>CRISPR</strong> loci yield large transcripts prior<br />
to process<strong>in</strong>g <strong>in</strong>to smaller RNAs. The processed piRNAs are<br />
24–30 nt <strong>in</strong> length while <strong>the</strong> crRNAs lie <strong>in</strong> <strong>the</strong> range 39–45<br />
nt. piRNAs complex with <strong>the</strong> Argonaute Piwi/RISC prote<strong>in</strong><br />
complex, similarly to crRNAs assembl<strong>in</strong>g <strong>in</strong> Cas or Cmr<br />
prote<strong>in</strong> complexes, and <strong>the</strong>y target and control mobile<br />
endogenous genetic elements primarily <strong>in</strong> germ cells. To date,<br />
piRNA complexes have been exclusively associated with target<strong>in</strong>g<br />
RNAs but this may reflect <strong>the</strong> fact that retrotransposons<br />
predom<strong>in</strong>ate <strong>in</strong> those germ cells under study.<br />
No homologous prote<strong>in</strong>s have been detected from sequence<br />
analyses between prote<strong>in</strong>s of <strong>the</strong> eukaryal siRNA <strong>system</strong>s and<br />
those of <strong>the</strong> <strong>CRISPR</strong> <strong>system</strong>, although similarities may appear<br />
at a tertiary structural level. Moreover, despite Argonaute<br />
Piwi-like doma<strong>in</strong> prote<strong>in</strong>s occurr<strong>in</strong>g <strong>in</strong> many archaea and<br />
bacteria (Cerutti et al., 2000), <strong>the</strong>y have not been implicated <strong>in</strong><br />
crRNA-target<strong>in</strong>g. There is also very limited evidence for<br />
a functional target<strong>in</strong>g overlap between <strong>the</strong> two <strong>system</strong>s. A few<br />
sequence matches have been observed between archaeal and<br />
bacterial <strong>CRISPR</strong> spacers and transposons, consistent with <strong>the</strong><br />
<strong>CRISPR</strong>/Cas <strong>system</strong> target<strong>in</strong>g mobile elements (Lillestøl<br />
et al., 2006; Held and Whitaker, 2009; Mojica et al., 2009;<br />
Shah et al., 2009). However, those reported have generally<br />
been carried on virus or plasmid genomes <strong>in</strong>clud<strong>in</strong>g, for<br />
example, spacer matches to each of <strong>the</strong> four transposase genes<br />
carried by <strong>the</strong> bicaudavirus ATV (Shah et al., 2009), but <strong>the</strong>se<br />
transposase genes/IS elements are presumably <strong>in</strong>dist<strong>in</strong>guishable<br />
from any o<strong>the</strong>r viral/plasmid genomic target if <strong>the</strong>y carry<br />
appropriate sequence motifs adjacent to protospacer sites.<br />
Moreover, <strong>in</strong> <strong>the</strong> archaeon S. solfataricus P2, which carries<br />
about 350 putative mobile elements (Brügger et al., 2002),<br />
<strong>the</strong>re is evidence that chromosomal transpositional activity is<br />
regulated, at least partly, by antisense RNAs (Tang et al.,<br />
2005), and very few close sequence matches were found to any<br />
of <strong>the</strong> 417 <strong>CRISPR</strong> spacers (Lillestøl et al., 2006; Shah et al.,<br />
2009).<br />
F<strong>in</strong>ally, <strong>the</strong> piRNA <strong>system</strong>, like <strong>the</strong> <strong>CRISPR</strong>/Cas and<br />
<strong>CRISPR</strong>/Cmr <strong>system</strong>s, may be very ancient. Evolution of<br />
genomic parasites occurred concurrently with <strong>the</strong> emergence<br />
of self-replicat<strong>in</strong>g genomes. Thus, <strong>the</strong> development of adaptive<br />
and heritable <strong>system</strong>s would be important for ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g<br />
fitness.<br />
4. Conclusion<br />
The <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>immune</strong> mach<strong>in</strong>eries<br />
provide an effective defence mechanism <strong>in</strong> most archaea and<br />
some bacteria. The <strong>system</strong> is dynamic and heritable,<br />
although <strong>the</strong> benefit for <strong>the</strong> cell <strong>in</strong> evolutionary terms is<br />
transient because DNA from extrachromosomal elements<br />
taken up as spacers <strong>in</strong> <strong>CRISPR</strong> loci has a rapid turnover and is<br />
lost aga<strong>in</strong> via recomb<strong>in</strong>ation at repeats and/or transpositional<br />
events. Current evidence suggests that <strong>CRISPR</strong>/Cas and Cmr<br />
modules behave like <strong>in</strong>tegral genetic elements. They tend to be<br />
located <strong>in</strong> <strong>the</strong> most variable regions of chromosomes and are<br />
frequently displaced as a result of genome shuffl<strong>in</strong>g, <strong>in</strong>clud<strong>in</strong>g<br />
possibly transposition of whole modules. <strong>CRISPR</strong> loci may be<br />
broken up and dispersed <strong>in</strong> chromosomes with <strong>the</strong> potential for<br />
creat<strong>in</strong>g genetic novelty. Small leaderless <strong>CRISPR</strong>-like loci<br />
are commonly found <strong>in</strong> chromosomes and <strong>in</strong> plasmids, and<br />
some can be transcribed, but it rema<strong>in</strong>s unclear whe<strong>the</strong>r <strong>the</strong>y<br />
derive from <strong>CRISPR</strong> loci or whe<strong>the</strong>r <strong>the</strong>y have o<strong>the</strong>r orig<strong>in</strong>s<br />
and/or o<strong>the</strong>r functions. The <strong>CRISPR</strong>/Cas and Cmr modules<br />
appear to exchange readily between closely related organisms<br />
where <strong>the</strong>y may be subjected to strong selective pressure. It is<br />
likely that this can occur via conjugative plasmids or chromosomal<br />
conjugation. While universal phylogenetic trees for<br />
Cas1/Cas3 prote<strong>in</strong>s of <strong>the</strong> <strong>CRISPR</strong>/Cas module and Cmr2 of<br />
<strong>the</strong> Cmr module suggest that <strong>in</strong>terdoma<strong>in</strong> transfers between<br />
archaea and bacteria have occurred, <strong>the</strong> relatively large<br />
number of archaea-specific Cas/Cmr prote<strong>in</strong>s suggests that<br />
<strong>the</strong>se may have been very rare events, consistent with <strong>the</strong><br />
<strong>in</strong>compatibility of <strong>the</strong> transcription, translation and conjugative<br />
<strong>system</strong>s.<br />
There are parallels to <strong>the</strong> eukaryal siRNAs, most notably<br />
for <strong>the</strong> germ cell piRNAs, which are also directed by effector<br />
prote<strong>in</strong>s to silence or destroy <strong>in</strong>vad<strong>in</strong>g foreign DNA and<br />
transposons. While some common effector prote<strong>in</strong>s are<br />
utilized <strong>in</strong> different eukaryal siRNA <strong>system</strong>s, no homologous<br />
prote<strong>in</strong>s are identifiable between <strong>the</strong> eukaryal siRNA prote<strong>in</strong>s<br />
and those of <strong>the</strong> archaeal and bacterial <strong>CRISPR</strong>/Cas and Cmr<br />
modules. Possibly very distant phylogenetic relationships will<br />
appear as more crystal structures of <strong>the</strong> siRNA and crRNA<br />
effector prote<strong>in</strong>s are determ<strong>in</strong>ed.<br />
Acknowledgements<br />
Research at <strong>the</strong> <strong>Archaea</strong> Centre was supported by grants<br />
from <strong>the</strong> Danish Natural Science Research Council and <strong>the</strong><br />
Danish Foundation for Basic Research. We appreciate helpful<br />
discussions with Qunx<strong>in</strong> She, Soley Gudbergsdottir, L<strong>in</strong>g<br />
Deng, Guo Li and Xu Peng.<br />
References<br />
Aagaard, C., Dalgaard, J., Garrett, R.A., 1995. Inter-cellular mobility and<br />
hom<strong>in</strong>g of an archaeal rDNA <strong>in</strong>tron confers selective advantage over<br />
<strong>in</strong>tron-cells of Sulfolobus acidocaldarius. Proc. Natl. Acad. Sci. U.S.A. 92,<br />
12285e12289.<br />
Arav<strong>in</strong>, A.A., Hannon, G.J., Brennecke, J., 2008. The Piwi-piRNA pathway<br />
provides an adaptive defense <strong>in</strong> <strong>the</strong> transposon arms race. Science 318,<br />
761e764.<br />
Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P.,<br />
Mo<strong>in</strong>eau, S., Romero, D.A., Horvath, P., 2007. <strong>CRISPR</strong> provides acquired<br />
resistance aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes. Science 315, 1709e1712.<br />
Basta, T., Smyth, J., Forterre, P., Prangishvili, D., Peng, X., 2009. Novel<br />
archaeal plasmid pAH1 and its <strong>in</strong>teraction with <strong>the</strong> lipothrixvirus AFV1.<br />
Mol. Microbiol. 71, 23e34.<br />
Bize, A., Karlsson, E.A., Ekefjärd, K., Quax, T.E., P<strong>in</strong>a, M., Prevost, M.C.,<br />
Forterre, P., Tenaillon, O., Bernander, R., Prangishvili, D., 2009. A unique<br />
virus release mechanism <strong>in</strong> <strong>the</strong> <strong>Archaea</strong>. Proc. Natl. Acad. Sci. U.S.A. 106,<br />
11306e11311.<br />
Bland, C., Ramsey, T.L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N.C.,<br />
Hugenholtz, P., 2007. <strong>CRISPR</strong> Recognition Tool (CRT): a tool for automatic<br />
detection of clustered regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic repeats.<br />
BMC Bio<strong>in</strong>form. 8, 209.<br />
Bolot<strong>in</strong>, A., Qu<strong>in</strong>quis, B., Sorok<strong>in</strong>, A., Ehrlich, S.D., 2005. Clustered regularly<br />
<strong>in</strong>terspaced short pal<strong>in</strong>drome repeats (<strong>CRISPR</strong>s) have spacers of extrachromosomal<br />
orig<strong>in</strong>. Microbiology 151, 2551e2561.<br />
Brouns, S.J., Jore, M.M., Lundgren, M., Westra, E.R., Slijkhuis, R.J.,<br />
Snijders, A.P., Dickman, M.J., Makarova, K.S., Koon<strong>in</strong>, E.V., van der<br />
Oost, J., 2008. Small <strong>CRISPR</strong> RNAs guide antiviral defense <strong>in</strong> prokaryotes.<br />
Science 321, 960e964.<br />
Brügger, K., Redder, P., She, Q., Confalonieri, F., Zivanovic, Y., Garrett, R.A., 2002.<br />
Mobile elements <strong>in</strong> archaeal genomes. FEMS Microbiol. Lett. 206, 131e141.<br />
Brügger, K., Torar<strong>in</strong>sson, E., Chen, L., Garrett, R.A., 2004. Shuffl<strong>in</strong>g of<br />
Sulfolobus genomes by autonomous and non-autonomous mobile<br />
elements. Biochem. Soc. Trans. 32, 179e183.<br />
Carte, J., Wang, R., Li, H., Terns, R.M., Terns, M.P., 2008. Cas6 is an<br />
endoribonuclease that generates guide RNAs for <strong>in</strong>vader defense <strong>in</strong><br />
prokaryotes. Genes Dev. 22, 3489e3496.<br />
Cerutti, L., Mian, N., Bateman, A., 2000. Doma<strong>in</strong>s <strong>in</strong> gene silenc<strong>in</strong>g and cell<br />
differentiation prote<strong>in</strong>s: <strong>the</strong> novel PAZ doma<strong>in</strong> and redef<strong>in</strong>ition of <strong>the</strong> Piwi<br />
doma<strong>in</strong>. Trends Biochem. Sci. 25, 481e482.<br />
Chen, L., Brügger, K., Skovgaard, M., Redder, P., She, Q., Torar<strong>in</strong>sson, E.,<br />
Greve, B., Awayez, M., Zibat, A., Klenk, H.P., Garrett, R.A., 2005. The<br />
genome of Sulfolobus acidocaldarius, a model organism of <strong>the</strong> Crenarchaeota.<br />
J. Bacteriol. 187, 4992e4999.<br />
DeBoy, R.T., Mongod<strong>in</strong>, E.F., Emerson, J.B., Nelson, K.E., 2006. Chromosome<br />
evolution <strong>in</strong> <strong>the</strong> Thermotogales: large-scale <strong>in</strong>versions and stra<strong>in</strong><br />
diversification of <strong>CRISPR</strong> sequences. J. Bacteriol. 188, 2364e2374.<br />
Deveau, H., Barrangou, R., Garneau, J.E., Labonté, J., Fremaux, C.,<br />
Boyaval, P., Romero, D.A., Horvath, P., Mo<strong>in</strong>eau, S., 2008. Phage response<br />
to <strong>CRISPR</strong>-encoded resistance <strong>in</strong> Streptococcus <strong>the</strong>rmophilus. J. Bacteriol.<br />
190, 1390e1400.<br />
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy<br />
and high throughput. Nucleic Acids Res. 32, 1792e1797.<br />
Enright, A.J., Van Dongen, S., Ouzounis, C.A., 2002. An efficient algorithm<br />
for large-scale detection of prote<strong>in</strong> families. Nucleic Acids Res. 30,<br />
1575e1584.<br />
Garrett, R.A., Prangishvili, D., Shah, S.A., Reuter, M., Stetter, K., Peng, X.,<br />
2010. Metagenomic analyses of novel viruses, plasmids, and <strong>the</strong>ir variants,<br />
from an environmental sample of hyper<strong>the</strong>rmophilic neutrophiles cultured <strong>in</strong><br />
a bioreactor. Environ. Microbiol., doi:10.1111/j.1462-2920.2010.02266.x.<br />
Godde, J.S., Bickerton, A., 2006. The repetitive DNA elements called<br />
<strong>CRISPR</strong>s and <strong>the</strong>ir associated genes: evidence of horizontal transfer<br />
among prokaryotes. J. Mol. Evol. 62, 718e729.
Goldovsky, L., Cases, I., Enright, A.J., Ouzounis, C.A., 2005. BioLayout<br />
(Java): versatile network visualisation of structural and functional relationships.<br />
Appl. Bio<strong>in</strong>form. 4, 71e74.<br />
Greve, B., Jensen, S., Brügger, K., Zillig, W., Garrett, R.A., 2004. Genomic<br />
comparison of archaeal conjugative plasmids from Sulfolobus. <strong>Archaea</strong> 1,<br />
231e239.<br />
Grissa, I., Vergnaud, G., Pourcel, C., 2008. <strong>CRISPR</strong>compar: a website to<br />
compare clustered regularly <strong>in</strong>terspaced short pal<strong>in</strong>dromic repeats. Nucleic<br />
Acids Res. 36, 145e148.<br />
Grogan, D.W., 1996. Exchange of genetic markers at extremely high<br />
temperatures <strong>in</strong> <strong>the</strong> archaeon Sulfolobus acidocaldarius. J. Bacteriol. 178,<br />
3207e3211.<br />
Haft, D.H., Selengut, J., Mongod<strong>in</strong>, E.F., Nelson, K.E., 2005. A guild of<br />
45 <strong>CRISPR</strong>-associated (Cas) prote<strong>in</strong> families and multiple <strong>CRISPR</strong>/<br />
Cas subtypes exist <strong>in</strong> prokaryotic genomes. PLoS Comput. Biol. 1,<br />
474e483.<br />
Hale, C., Kleppe, K., Terns, R.M., Terns, M.P., 2008. Prokaryotic silenc<strong>in</strong>g<br />
(psi)RNAs <strong>in</strong> Pyrococcus furiosus. RNA 14, 1e8.<br />
Hale, C.R., Zhao, P., Olson, S., Duff, M.O., Graveley, B.R., Wells, L.,<br />
Terns, R.M., Terns, M.P., 2009. RNA-guided RNA cleavage by a <strong>CRISPR</strong><br />
RNA-Cas prote<strong>in</strong> complex. Cell 139, 945e956.<br />
Hannon, G.J., 2002. RNA <strong>in</strong>terference. Nature 418, 244e251.<br />
Held, N.L., Whitaker, R.J., 2009. Viral biogeography revealed by signatures <strong>in</strong><br />
Sulfolobus islandicus genomes. Environ. Microbiol. 11, 457e466.<br />
Horvath, P., Romero, D.A., Coûté-Monvois<strong>in</strong>, A.-C., Richards, M.,<br />
Deveau, H., Mo<strong>in</strong>eau, S., Boyaval, P., Fremaux, C., Barrangou, R., 2008. Diversity,<br />
activity, and evolution of <strong>CRISPR</strong> loci <strong>in</strong> Streptococcus <strong>the</strong>rmophilus.<br />
J. Bacteriol. 190, 1401e1412.<br />
Horvath, P., Coûté-Monvois<strong>in</strong>, A.-C., Romero, D.A., Boyaval, P., Fremaux, C.,<br />
Barrangou, R., 2009. Comparative analysis of <strong>CRISPR</strong> loci <strong>in</strong> lactic acid<br />
bacteria genomes. Int. J. Food Microbiol. 131, 62e70.<br />
Jansen, R., Embden, J.D., Gaastra, W., Schouls, L.M., 2002. Identification of<br />
genes that are associated with DNA repeats <strong>in</strong> prokaryotes. Mol. Microbiol.<br />
43, 1565e1575.<br />
J<strong>in</strong>ek, M., Doudna, J.A., 2009. A three dimensional view of <strong>the</strong> molecular<br />
mach<strong>in</strong>ery of RNA <strong>in</strong>terference. Nature 457, 405e412.<br />
Karg<strong>in</strong>ov, F.V., Hannon, G.J., 2010. The <strong>CRISPR</strong> <strong>system</strong>: small RNA-guided<br />
defense <strong>in</strong> bacteria and archaea. Mol. Cell 37, 7e19.<br />
Kawarabayashi, Y., H<strong>in</strong>o, Y., Horikawa, H., J<strong>in</strong>-no, K., Takahashi, M.,<br />
Sek<strong>in</strong>e, M., Baba, S., Ankai, A., Kosugi, H., Hosoyama, A., Fukui, S.,<br />
Nagai, Y., Nishijima, K., Otsuka, R., Nakazawa, H., Takamiya, M.,<br />
Kato, Y., Yoshizawa, T., Tanaka, T., Kudoh, Y., Yamazaki, J., Kushida, N.,<br />
Oguchi, A., Aoki, K., Masuda, S., Yanagii, M., Nishimura, M.,<br />
Yamagishi, A., Oshima, T., Kikuchi, H., 2001. Complete genome sequence<br />
of an aerobic <strong>the</strong>rmoacidophilic crenarchaeon, Sulfolobus tokodaii stra<strong>in</strong>7.<br />
DNA Res. 8, 123e140.<br />
Klattenhoff, C., Theurkauf, W., 2008. Biogenesis and germl<strong>in</strong>e functions of<br />
piRNAs. Development 135, 3e9.<br />
Kun<strong>in</strong>, V., Sorek, R., Hugenholtz, P., 2007. Evolutionary conservation of<br />
sequence and secondary structures <strong>in</strong> <strong>CRISPR</strong> repeats. Genome Biol. 8,<br />
R611eR617.<br />
Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M.,<br />
Antonescu, C., Salzberg, S.L., 2004. Versatile and open software for<br />
compar<strong>in</strong>g large genomes. Genome Biol. 5, R12.<br />
Lillestøl, R.K., Redder, P., Garrett, R.A., Brügger, K., 2006. A putative viral<br />
defence mechanism <strong>in</strong> archaeal cells. <strong>Archaea</strong> 2, 59e72.<br />
Lillestøl, R.K., Shah, S.A., Brügger, K., Redder, P., Phan, H., Christiansen, J.,<br />
Garrett, R.A., 2009. <strong>CRISPR</strong> families of <strong>the</strong> crenarchaeal genus Sulfolobus:<br />
bidirectional transcription and dynamic properties. Mol. Microbiol.<br />
72, 259e272.<br />
Makarova, K.S., Grish<strong>in</strong>, N.V., Shabal<strong>in</strong>a, S.A., Wolf, Y.I., Koon<strong>in</strong>, E.V., 2006.<br />
A putative RNA-<strong>in</strong>terference-based <strong>immune</strong> <strong>system</strong> <strong>in</strong> prokaryotes:<br />
computational analysis of <strong>the</strong> predicted enzymatic mach<strong>in</strong>ery, functional<br />
analogies with eukaryotic RNAi, and hypo<strong>the</strong>tical mechanisms of action.<br />
Biol. Direct 1, 7.<br />
Marraff<strong>in</strong>i, L.A., Son<strong>the</strong>imer, E.J., 2008. <strong>CRISPR</strong> <strong>in</strong>terference limits horizontal<br />
gene transfer <strong>in</strong> Staphylococci by target<strong>in</strong>g DNA. Science 322,<br />
1843e1845.<br />
Marraff<strong>in</strong>i, L.A., Son<strong>the</strong>imer, E.J., 2010. Self versus non-self discrim<strong>in</strong>ation<br />
dur<strong>in</strong>g <strong>CRISPR</strong> RNA-directed immunity. Nature 463, 568e571.<br />
Mochizuki, K., Gorovsky, M.A., 2004. Small RNAs <strong>in</strong> genome rearrangements<br />
<strong>in</strong> Tetrahymena. Curr. Op<strong>in</strong>. Genet. Dev. 14, 181e187.<br />
Mojica, F.J., Diez-Villasenor, C., Garcia-Mart<strong>in</strong>ez, J., Soria, E., 2005. Interven<strong>in</strong>g<br />
sequences of regularly spaced prokaryotic repeats derive from<br />
foreign genetic elements. J. Mol. Evol. 60, 174e182.<br />
Mojica, F.J., Diez-Villasenor, C., Garcia-Mart<strong>in</strong>ez, J., Almendros, C., 2009.<br />
Short motif sequences determ<strong>in</strong>e <strong>the</strong> targets of <strong>the</strong> prokaryotic <strong>CRISPR</strong><br />
<strong>system</strong>. Microbiology 155, 733e740.<br />
Muskhelishvili, G., Palm, P., Zillig, W., 1993. SSV1-encoded site-specific<br />
recomb<strong>in</strong>ation <strong>system</strong> <strong>in</strong> Sulfolobus shibatae. Mol. Gen. Genet. 237,<br />
334e342.<br />
Nelson, K.E., Clayton, E., Gill, S.R., Gw<strong>in</strong>n, M.L., Dodson, R.J., Haft, D.H.,<br />
Hickey, E.K., Peterson, J.D., Nelson, W.C., Ketchum, K.A., et al., 1999.<br />
Evidence for lateral gene transfer between archaea and bacteria from<br />
genome sequence of Thermotoga maritima. Nature 399, 323e329.<br />
Pearson, W.R., 2000. Flexible sequence similarity search<strong>in</strong>g with <strong>the</strong> FASTA3<br />
program package. Methods Mol. Biol. 132, 185e219.<br />
Peng, X., Brügger, K., Shen, B., Chen, L., She, Q., Garrett, R.A., 2003. Genus-specific<br />
prote<strong>in</strong> b<strong>in</strong>d<strong>in</strong>g to <strong>the</strong> large clusters of DNA repeats (short regularly<br />
spaced repeats) present <strong>in</strong> Sulfolobus genomes. J. Bacteriol. 185,<br />
2410e2417.<br />
Peng, X., Kessler, A., Phan, H., Garrett, R.A., Prangishvili, D., 2004. Multiple<br />
variants of <strong>the</strong> archaeal DNA rudivirus SIRV1 <strong>in</strong> a s<strong>in</strong>gle host and a novel<br />
mechanism of genomic variation. Mol. Microbiol. 54, 366e375.<br />
Portillo, M.C., Gonzalez, J.M., 2009. <strong>CRISPR</strong> elements <strong>in</strong> <strong>the</strong> <strong>the</strong>rmococcales:<br />
evidence for associated horizontal gene transfer <strong>in</strong> Pyrococcus furiosus. J.<br />
Appl. Genet. 50, 421e430.<br />
Pourcel, C., Salvignol, G., Vergnaud, G., 2005. <strong>CRISPR</strong> elements <strong>in</strong> Yers<strong>in</strong>ia<br />
pestis acquire new repeats by preferential uptake of bacteriophage DNA,<br />
and provide additional tools for evolutionary studies. Microbiology 151,<br />
653e663.<br />
Prangishvili, D., Forterre, P., Garrett, R.A., 2006. Viruses of <strong>the</strong> <strong>Archaea</strong>:<br />
a unify<strong>in</strong>g view. Nat. Rev. Microbiol. 11, 837e848.<br />
Pühl, Ü., Wurm, R., Arslan, Z., Geissen, R., Hofmann, N., Wagner, R., 2010.<br />
Identification and characterisation of E. coli <strong>CRISPR</strong>-cas promoters and<br />
<strong>the</strong>ir silenc<strong>in</strong>g by H-NS. Mol. Microbiol. 75, 1495e1512.<br />
Redder, P., Garrett, R.A., 2006. Mutations and rearrangements <strong>in</strong> <strong>the</strong> genome<br />
of Sulfolobus solfataricus P2. J. Bacteriol. 188, 4198e4206.<br />
Reno, M.L., Held, N.L., Fields, C.J., Burke, P.V., Whitaker, R.J., 2009.<br />
Biogeography of <strong>the</strong> Sulfolobus islandicus pan-genome. Proc. Natl. Acad.<br />
Sci. U.S.A. 106, 8605e8610.<br />
Santangelo, T.J., Cubonová, L., Sk<strong>in</strong>ner, K.M., Reeve, J.N., 2009. <strong>Archaea</strong>l<br />
<strong>in</strong>tr<strong>in</strong>sic transcription term<strong>in</strong>ation <strong>in</strong> vivo. J. Bacteriol. 191, 7102e7108.<br />
Sebaihia, M., Wren, B.W., Mullany, P., Fairwea<strong>the</strong>r, N.F., M<strong>in</strong>ton, N.,<br />
Stabler, R., Thomson, N.R., Roberts, A.P., Cerdeño-Tárraga, A.M., Wang, H., et al.,<br />
2006. The multidrug resistant human pathogen Clostridium difficile has<br />
a highly mobile mosaic genome. Nat. Genet. 38, 779e786.<br />
Shah, S.A., Hansen, N.R., Garrett, R.A., 2009. Distributions of <strong>CRISPR</strong> spacer<br />
matches <strong>in</strong> viruses and plasmids of crenarchaeal acido<strong>the</strong>rmophiles and<br />
implications for <strong>the</strong>ir <strong>in</strong>hibitory mechanism. Biochem. Soc. Trans. 37,<br />
23e28.<br />
She, Q., Phan, H., Garrett, R.A., Albers, S.-V., Stedman, K.M., Zillig, W.,<br />
1998. Genetic profile of pNOB8 from Sulfolobus: <strong>the</strong> first conjugative<br />
plasmid from an archaeon. Extremophiles 2, 417e425.<br />
She, Q., S<strong>in</strong>gh, R.K., Confalonieri, F., Zivanovic, Y., Gordon, P., Allard, G.,<br />
Awayez, M.J., Chan-Weiher, C.C., Clausen, I.G., Curtis, B.A., et al.,<br />
2001a. The complete genome of <strong>the</strong> crenarchaeon Sulfolobus solfataricus<br />
P2. Proc. Natl. Acad. Sci. U.S.A. 98, 7835e7840.<br />
She, Q., Peng, X., Zillig, W., Garrett, R.A., 2001b. Gene capture events <strong>in</strong><br />
archaeal chromosomes. Nature 409, 478.<br />
Tang, T.-H., Bachellerie, J.-P., Rozhdestvensky, T., Bortol<strong>in</strong>, M.-L., Huber, H.,<br />
Drungowski, M., Elge, T., Brosius, J., Hüttenhofer, A., 2002. Identification<br />
of 86 candidates for small non-messenger RNAs from <strong>the</strong> archaeon<br />
Archaeoglobus fulgidus. Proc. Natl. Acad. Sci. U.S.A. 99, 7536e7541.<br />
Tang, T.-H., Polacek, N., Zywicki, M., Huber, H., Brügger, K., Garrett, R.,<br />
Bachellerie, J.P., Hüttenhofer, A., et al., 2005. Identification of novel non-<br />
cod<strong>in</strong>g RNAs as potential antisense regulators <strong>in</strong> <strong>the</strong> archaeon Sulfolobus<br />
solfataricus. Mol. Microbiol. 55, 469e481.<br />
Thompson, J.D., Higg<strong>in</strong>s, D.G., Gibson, T.J., 1994. CLUSTAL W: improv<strong>in</strong>g<br />
<strong>the</strong> sensitivity of progressive multiple sequence alignment through<br />
sequence weight<strong>in</strong>g, position-specific gap penalties and weight matrix<br />
choice. Nucleic Acids Res. 22, 4673e4680.<br />
Torar<strong>in</strong>sson, E., Klenk, H.P., Garrett, R.A., 2005. Divergent transcriptional and<br />
translational signals <strong>in</strong> <strong>Archaea</strong>. Environ. Microbiol. 7, 47e54.<br />
Tyson, G.W., Banfield, J.F., 2007. Rapidly evolv<strong>in</strong>g <strong>CRISPR</strong>s implicated <strong>in</strong> acquired<br />
resistance of microorganisms to viruses. Environ. Microbiol. 10, 200e208.<br />
Van Embden, J.D.A., Van Gorkom, T., Kremer, K., Jansen, R., Van Der<br />
Zeijst, B.A.M., Schouls, L.M., 2000. Genetic variation and evolutionary<br />
orig<strong>in</strong> of <strong>the</strong> direct repeat locus of Mycobacterium tuberculosis complex<br />
bacteria. J. Bacteriol. 182, 2393e2401.<br />
Veith, A., Kl<strong>in</strong>gl, A., Zolghadr, B., Lauber, K., Mentele, R., Lottspeich, F.,<br />
Rachel, R., Albers, S.V., Kletz<strong>in</strong>, A., 2009. Acidianus, Sulfolobus and<br />
Metallosphaera surface layers: structure, composition and gene expression.<br />
Mol. Microbiol. 73, 58e72.<br />
Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter, M., Phan, H.,<br />
Briegel, A., Rachel, R., Garrett, R.A., Prangishvili, D., 2008. SRV, a new<br />
rudiviral isolate from Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal rudiviruses<br />
with <strong>the</strong> host viral-defence <strong>CRISPR</strong> <strong>system</strong>. J. Bacteriol. 190,<br />
6837e6845.<br />
Zhaxybayeva, O., Swi<strong>the</strong>rs, K.S., Lapierre, P., Fournier, G.P., Bickhart, D.M.,<br />
DeBoy, R.T., Nelson, K.E., Nesbø, C.L., Doolittle, W.F., Gogarten, J.P.,<br />
Noll, K.M., 2009. On <strong>the</strong> chimeric nature, <strong>the</strong>rmophilic orig<strong>in</strong>, and phylogenetic<br />
placement of <strong>the</strong> Thermotogales. Proc. Natl. Acad. Sci. U.S.A. 106,<br />
5865e5870.
JOURNAL OF BACTERIOLOGY, Apr. 2011, p. 1672–1680 Vol. 193, No. 7<br />
0021-9193/11/$12.00 doi:10.1128/JB.01487-10<br />
Copyright © 2011, American Society for Microbiology. All Rights Reserved.<br />
Genome Analyses of Icelandic Stra<strong>in</strong>s of Sulfolobus islandicus, Model<br />
Organisms for Genetic and Virus-Host Interaction Studies <br />
Li Guo, 1 † Kim Brügger, 2 † Chao Liu, 2 † Shiraz A. Shah, 2 † Huajun Zheng, 3 Yongqiang Zhu, 3<br />
Shengyue Wang, 3 Reidun K. Lillestøl, 2 Lanm<strong>in</strong>g Chen, 2 Jeremy Frank, 2 David Prangishvili, 4<br />
Lars Paul<strong>in</strong>, 5 Qunx<strong>in</strong> She, 2 ‡ Li Huang, 1 ‡* and Roger A. Garrett 2 ‡*<br />
State Key Laboratory of Microbial Resources, Institute of Microbiology, Ch<strong>in</strong>ese Academy of Sciences, No. 1 West Beichen Road,<br />
Chaoyang District, Beij<strong>in</strong>g 100101, Ch<strong>in</strong>a 1 ; <strong>Archaea</strong> Centre, Department of Biology, Copenhagen University, Ole Maaløes Vej 5,<br />
DK-2200N Copenhagen, Denmark 2 ; Shanghai-MOST Key Laboratory of Disease and Health Genomics,<br />
Ch<strong>in</strong>ese National Human Genome Center at Shanghai, Shanghai 201203, Ch<strong>in</strong>a 3 ; Molecular Biology of<br />
<strong>the</strong> Gene <strong>in</strong> Extremophiles Unit, Institut Pasteur, rue Dr Roux 25, 75724 Paris Cedex, France 4 ; and<br />
DNA Sequenc<strong>in</strong>g and Genomics Laboratory, Institute of Biotechnology, University of<br />
Hels<strong>in</strong>ki, 00790 Hels<strong>in</strong>ki, F<strong>in</strong>land 5<br />
Received 10 December 2010/Accepted 16 January 2011<br />
The genomes of two Sulfolobus islandicus stra<strong>in</strong>s obta<strong>in</strong>ed from Icelandic solfataras were sequenced and<br />
analyzed. Stra<strong>in</strong> REY15A is a host for a versatile genetic toolbox. It exhibits a genome of m<strong>in</strong>imal size, is stable<br />
genetically, and is easy to grow and manipulate. Stra<strong>in</strong> HVE10/4 shows a broad host range for exceptional<br />
crenarchaeal viruses and conjugative plasmids and was selected for study<strong>in</strong>g <strong>the</strong>ir life cycles and host<br />
<strong>in</strong>teractions. The genomes of stra<strong>in</strong>s REY15A and HVE10/4 are 2.5 and 2.7 Mb, respectively, and each genome<br />
carries a variable region of 0.5 to 0.7 Mb where major differences <strong>in</strong> gene content and gene order occur. These<br />
<strong>in</strong>clude gene clusters <strong>in</strong>volved <strong>in</strong> specific metabolic pathways, multiple copies of VapBC antitox<strong>in</strong>-tox<strong>in</strong> gene<br />
pairs, and <strong>in</strong> stra<strong>in</strong> HVE10/4, a 50-kb region rich <strong>in</strong> glycosyl transferase genes. The variable region also<br />
conta<strong>in</strong>s most of <strong>the</strong> <strong>in</strong>sertion sequence (IS) elements and high proportions of <strong>the</strong> orphan orfB elements and<br />
SMN1 m<strong>in</strong>iature <strong>in</strong>verted-repeat transposable elements (MITEs), as well as <strong>the</strong> clustered regularly <strong>in</strong>terspaced<br />
short pal<strong>in</strong>dromic repeat (<strong>CRISPR</strong>)-based <strong>immune</strong> <strong>system</strong>s, which are complex and diverse <strong>in</strong> both stra<strong>in</strong>s,<br />
consistent with <strong>the</strong>m hav<strong>in</strong>g been mobilized both <strong>in</strong>tra- and <strong>in</strong>tercellularly. In contrast, <strong>the</strong> rema<strong>in</strong>der of <strong>the</strong><br />
genomes are highly conserved <strong>in</strong> <strong>the</strong>ir prote<strong>in</strong> and RNA gene syntenies, closely resembl<strong>in</strong>g those of o<strong>the</strong>r S.<br />
islandicus and Sulfolobus solfataricus stra<strong>in</strong>s, and <strong>the</strong>y exhibit only m<strong>in</strong>or remnants of a few genetic elements,<br />
ma<strong>in</strong>ly conjugative plasmids, which have <strong>in</strong>tegrated at a few tRNA genes lack<strong>in</strong>g <strong>in</strong>trons. This provides a<br />
possible rationale for <strong>the</strong> presence of <strong>the</strong> <strong>in</strong>trons.<br />
Iceland has been a rich source of hyper<strong>the</strong>rmophilic crenarchaea<br />
over <strong>the</strong> past 3 decades and especially of acido<strong>the</strong>rmophilic<br />
members of <strong>the</strong> order Sulfolobales. Many Sulfolobus islandicus<br />
stra<strong>in</strong>s (“Island” is German for “Iceland”) have<br />
yielded numerous novel viruses show<strong>in</strong>g varied and sometimes<br />
unique morphologies and exceptional genome contents. These<br />
properties are consistent with <strong>the</strong>se viruses constitut<strong>in</strong>g an<br />
archaeal l<strong>in</strong>eage dist<strong>in</strong>ct from those of bacteria and eukarya,<br />
and <strong>the</strong>y have now been classified <strong>in</strong>to several new viral families<br />
(38, 63). In addition, a family of conjugative plasmids has<br />
been characterized, with most members deriv<strong>in</strong>g from Iceland,<br />
which appear to conjugate by a mechanism unique to <strong>the</strong><br />
archaeal doma<strong>in</strong> (18, 37).<br />
* Correspond<strong>in</strong>g author. Mail<strong>in</strong>g address for R. A. Garrett: <strong>Archaea</strong><br />
Centre, Department of Biology, Copenhagen University, Ole Maaløes<br />
Vej 5, DK-2200N Copenhagen, Denmark. Phone: 045-353-22010. Fax:<br />
045-353-22128. E-mail: garrett@bio.ku.dk. Mail<strong>in</strong>g address for L.<br />
Huang: State Key Laboratory of Microbial Resources, Institute of<br />
Microbiology, Ch<strong>in</strong>ese Academy of Sciences, No. 1 West Beichen<br />
Road, Chaoyang District, Beij<strong>in</strong>g 100101, Ch<strong>in</strong>a. Phone: 086-10-<br />
64807430. Fax: 086-10-64807429. E-mail: huangl@sun.im.ac.cn.<br />
† These authors contributed equally.<br />
‡ The last three authors are jo<strong>in</strong>t senior authors.<br />
Published ahead of pr<strong>in</strong>t on 28 January 2011.<br />
Although <strong>the</strong> availability of genome sequences of Sulfolobus<br />
stra<strong>in</strong>s and <strong>the</strong>ir genetic elements has yielded important <strong>in</strong>sights<br />
<strong>in</strong>to <strong>the</strong> biology of <strong>the</strong>se model crenarchaea, a major<br />
impediment to more detailed <strong>in</strong>sights has been <strong>the</strong> paucity of<br />
robust and versatile vector-host <strong>system</strong>s for genetic studies. A<br />
few Sulfolobus species have been successfully employed as<br />
hosts for such <strong>system</strong>s, <strong>in</strong>clud<strong>in</strong>g Sulfolobus solfataricus stra<strong>in</strong>s<br />
P1 and 98/2 (22, 58), Sulfolobus acidocaldarius (57), and S.<br />
islandicus stra<strong>in</strong> REY15A (54). To date, <strong>the</strong> genetic tools developed<br />
for <strong>the</strong> latter host are <strong>the</strong> most versatile and <strong>in</strong>clude<br />
<strong>the</strong> follow<strong>in</strong>g: (i) Sulfolobus-Escherichia coli shuttle vectors<br />
carry<strong>in</strong>g ei<strong>the</strong>r viral or plasmid replication orig<strong>in</strong>s (50); (ii)<br />
conventional and novel gene knockout methodologies (14, 62);<br />
and (iii) a D-arab<strong>in</strong>ose-<strong>in</strong>ducible expression <strong>system</strong> with a lacS<br />
reporter gene <strong>system</strong> (35). The S. islandicus <strong>system</strong> has also<br />
been employed successfully to demonstrate <strong>the</strong> dynamic character<br />
of <strong>the</strong> clustered regularly <strong>in</strong>terspaced short pal<strong>in</strong>dromic<br />
repeat (<strong>CRISPR</strong>)-based <strong>immune</strong> <strong>system</strong>s of Sulfolobus when<br />
challenged with genetic elements carry<strong>in</strong>g match<strong>in</strong>g viral genes<br />
and protospacers ma<strong>in</strong>ta<strong>in</strong>ed under selection (20). These developments<br />
necessitated <strong>the</strong> determ<strong>in</strong>ation of <strong>the</strong> genome sequence<br />
of S. islandicus stra<strong>in</strong> REY15A as a prerequisite for<br />
successful exploitation of <strong>the</strong> genetic <strong>system</strong>s.<br />
A second Icelandic stra<strong>in</strong>, S. islandicus stra<strong>in</strong> HVE10/4, has<br />
been employed as a broad laboratory host for propagat<strong>in</strong>g<br />
diverse Sulfolobus viruses and conjugative plasmids (63) and
was selected for <strong>in</strong>-depth studies of <strong>the</strong>ir life cycles and host<br />
<strong>in</strong>teractions. This effort received added impetus with <strong>the</strong> demonstration<br />
that some genetic elements show exceptional and<br />
sometimes unique properties of <strong>the</strong>ir viral life cycles or conjugative<br />
mechanisms (3, 8, 18, 40). Therefore, <strong>the</strong> genome<br />
sequence of S. islandicus stra<strong>in</strong> HVE10/4 was also determ<strong>in</strong>ed.<br />
FIG. 1. (A) Dot plot of <strong>the</strong> two Icelandic genomes show<strong>in</strong>g <strong>the</strong> approximate levels of sequence synteny. The large variable regions extend from<br />
about 0.35 to 1.0 Mb. Transposase genes are denoted by black l<strong>in</strong>es along <strong>the</strong> axes. Putative orig<strong>in</strong>s of replication adjacent to <strong>the</strong> cdc6 and whiP<br />
genes are <strong>in</strong>dicated with red circles, while <strong>the</strong> families of <strong>the</strong> <strong>CRISPR</strong>/Cas (I and III) and Cmr (B) modules are <strong>in</strong>dicated by blue squares. (B) Dot<br />
plot of <strong>the</strong> S. islandicus REY15A and S. solfataricus P2 genomes.<br />
The genome sequences of the two Icelandic strains, REY15A<br />
and HVE10/4, were analyzed and compared<br />
with one another and with genomes of other S. solfataricus and<br />
S. islandicus stra<strong>in</strong>s isolated from different geographical locations,<br />
<strong>in</strong>clud<strong>in</strong>g Naples, Italy; Kamchatka, Russia; Lassen Volcanic<br />
National Park; and Yellowstone National Park (44, 53).<br />
MATERIALS AND METHODS<br />
Genome sequenc<strong>in</strong>g. S. islandicus stra<strong>in</strong>s REY15A and HVE10/4 were colony<br />
purified three times and cultured essentially as described earlier (11). Total DNA<br />
was extracted from <strong>the</strong> cells us<strong>in</strong>g phenol-chloroform and fur<strong>the</strong>r purified by<br />
CsCl density-gradient centrifugation. For stra<strong>in</strong> REY15A, sequenc<strong>in</strong>g of shotgun<br />
libraries with a 454 GS FLX sequenator yielded 324,123 reads with 31-fold<br />
genome coverage. For stra<strong>in</strong> HVE10/4, DNA was sonicated to yield fragments <strong>in</strong><br />
<strong>the</strong> size range of 1.5 to 4.0 kb, and clone libraries were generated <strong>in</strong> pUC18 us<strong>in</strong>g<br />
<strong>the</strong> SmaI site. Sequenc<strong>in</strong>g was performed on MegaBace 1000 sequenators to<br />
yield approximately 3-fold sequence coverage, and <strong>the</strong> sequenc<strong>in</strong>g data were<br />
comb<strong>in</strong>ed with a sequenc<strong>in</strong>g run us<strong>in</strong>g a 454 FLX sequenator to yield approximately<br />
10- to 15-fold coverage. The genome sequences were assembled us<strong>in</strong>g <strong>the</strong><br />
phred/phrap/consed package, contigs were l<strong>in</strong>ked by comb<strong>in</strong>atorial PCR us<strong>in</strong>g<br />
primers match<strong>in</strong>g to each contig end, and <strong>the</strong> PCR products were sequenced to<br />
close <strong>the</strong> gaps. Rema<strong>in</strong><strong>in</strong>g ambiguous sequence regions <strong>in</strong> <strong>the</strong> genome were<br />
identified and resolved by generat<strong>in</strong>g and sequenc<strong>in</strong>g PCR products. Both genomes<br />
were annotated automatically and ref<strong>in</strong>ed manually.<br />
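The coverage figures quoted above follow from the usual relation, coverage = (number of reads x mean read length) / genome size. The sketch below is a minimal illustration, not the authors' procedure: it assumes the rounded 2.5-Mb genome size reported for REY15A in Table 1 and back-calculates the mean read length implied by 324,123 reads at 31-fold coverage. The function names are hypothetical.<br />

```python
def fold_coverage(n_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return n_reads * mean_read_len / genome_size


def implied_read_length(n_reads: int, coverage: float, genome_size: int) -> float:
    """Mean read length needed to reach the given coverage with n_reads reads."""
    return coverage * genome_size / n_reads


# Values from the text; the genome size is the rounded Table 1 figure (assumption).
N_READS = 324_123
COVERAGE = 31.0
GENOME_SIZE = 2_500_000

mean_len = implied_read_length(N_READS, COVERAGE, GENOME_SIZE)
print(f"Implied mean read length: {mean_len:.0f} bp")
```

Under these assumptions the implied mean read length is a little under 240 bp; a more exact genome size would shift the estimate slightly.<br />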
Sequence analyses. Open read<strong>in</strong>g frames (ORFs) were predicted with Glimmer<br />
(13). Frameshifts were detected and checked by sequenc<strong>in</strong>g after manual<br />
annotation, and <strong>the</strong> rema<strong>in</strong><strong>in</strong>g frameshifts were considered to be au<strong>the</strong>ntic.<br />
Functional assignments of ORFs are based on searches against GenBank<br />
(http://www.ncbi.nlm.nih.gov/) and the Conserved Domain Database (CDD)<br />
(www.ncbi.nlm.nih.gov/cdd/). tRNA genes were located with tRNAscan-SE (26). Potential<br />
noncod<strong>in</strong>g RNAs were predicted by comparison with <strong>the</strong> untranslated<br />
RNAs characterized for S. solfataricus and S. acidocaldarius, <strong>in</strong> terms of sequence<br />
similarity and gene context (see Results). Putative <strong>in</strong>sertion sequence<br />
(IS) elements were identified by BLASTN search aga<strong>in</strong>st <strong>the</strong> IS F<strong>in</strong>der database<br />
(http://www-is.biotoul.fr/). All annotations were manually curated us<strong>in</strong>g Artemis<br />
software (47).<br />
RESULTS<br />
Genome general properties. Genomes of <strong>the</strong> two Icelandic<br />
stra<strong>in</strong>s were sequenced us<strong>in</strong>g a comb<strong>in</strong>ation of sequenc<strong>in</strong>g<br />
strategies. The S. islandicus REY15A genome was determined primarily by<br />
454 sequencing, while that of strain HVE10/4 was obtained by a combination<br />
of Sanger and 454 sequencing, at approximately 30-fold<br />
and 10-fold coverage, respectively. Protein-coding genes<br />
were annotated <strong>in</strong> Artemis (47), where start codons for s<strong>in</strong>gle<br />
genes and first genes of Sulfolobus operons were generally<br />
located 25 to 30 bp downstream from <strong>the</strong> archaeal hexameric<br />
TATA-like box and only genes with<strong>in</strong> operons were preceded<br />
by Sh<strong>in</strong>e-Dalgarno motifs, of which GGUG predom<strong>in</strong>ates<br />
(56). Where alternative start codons were juxtaposed, we<br />
selected <strong>the</strong> most probable on <strong>the</strong> basis of its position relative<br />
to <strong>the</strong> putative promoter and/or Sh<strong>in</strong>e-Dalgarno motifs or experimental<br />
data from closely related organisms.<br />
Dot plots of <strong>the</strong> two genomes demonstrate long sections of<br />
gene synteny. One region of about 0.5 to 0.7 Mb exhibits<br />
extensive gene shuffl<strong>in</strong>g, and <strong>the</strong>re is a smaller region with a<br />
200-kb <strong>in</strong>version bordered by shuffled genes (Fig. 1A). Some of<br />
<strong>the</strong> m<strong>in</strong>or irregularities <strong>in</strong> <strong>the</strong> dot plot were attributable to<br />
<strong>in</strong>sertion or <strong>in</strong>tegration events. The synteny is ma<strong>in</strong>ta<strong>in</strong>ed, to a<br />
large degree, when each genome is compared to that of S.<br />
solfataricus P2, despite <strong>the</strong> occurrence of a large <strong>in</strong>version <strong>in</strong><br />
<strong>the</strong> latter, and this is illustrated <strong>in</strong> a dot plot for <strong>the</strong> genomes<br />
of stra<strong>in</strong> REY15A and S. solfataricus P2 (Fig. 1B). This extensive<br />
gene synteny is surpris<strong>in</strong>g, given <strong>the</strong> high level of transpositional<br />
activity occurr<strong>in</strong>g <strong>in</strong> S. solfataricus (Table 1) (7, 30, 41).<br />
A similar pattern was also observed when o<strong>the</strong>r pairs of S.<br />
islandicus genomes from different geographical locations were<br />
compared (48), consistent with a high level of conservation of<br />
gene synteny for all the S. solfataricus and S. islandicus genomes.<br />
1674 GUO ET AL. J. BACTERIOL.<br />
TABLE 1. Summary of genetic properties obtained from genomes of two Icelandic S. islandicus strains and other available S. solfataricus and S. islandicus strains<br />
Genetic properties obtained from genomes of b:<br />
Characteristic | SsolP2 | Ssol98/2 | REY15A | HVE10/4 | LD8.5 | LS2.15 | M16.4 | M16.27 | M14.25 | YG57.14 | YN15.51<br />
Origin | Naples, Italy | Unknown | Reykjanes, Iceland | Hvergaardi, Iceland | Lassen, USA | Lassen, USA | Kamchatka, Russia | Kamchatka, Russia | Kamchatka, Russia | Yellowstone, USA | Yellowstone, USA<br />
GenBank accession no. | AE006641 | CP001402 | CP002425 | CP002426 | CP001731 | CP001399 | CP001402 | CP001401 | CP001400 | CP001403 | CP001404<br />
Genome size (Mb) | 3.0 | 2.7 | 2.5 | 2.7 | 2.7 | 2.7 | 2.6 | 2.7 | 2.6 | 2.7 | 2.8<br />
No. of:<br />
Conserved genes (total, 1,679) | 675 | 656 | 765 | 847 | 837 | 842 | 848 | 823 | 797 | 869 | 840<br />
Unique single genes (total, 1,346) | 190 | 138 | 118 | 114 | 209 | 100 | 100 | 75 | 49 | 113 | 140<br />
Transporters (total, 15) | 11 | 11 | 13 | 14 | 11 | 11 | 12 | 11 | 14 | 12 | 10<br />
VapBC antitoxin-toxins | 20 | 18 | 16 | 18 | 21 | 24 | 21 | 21 | 21 | 20 | 19<br />
Glycosyl transferases (50-kb region) | Absent | Absent | Absent | 15 | Absent | 15 | 15 | 15 | 15 | Absent | 15<br />
Conserved noncoding RNAs | 123 | 81 | 44 | 42 | 42 | 44 | 39 | 42 | 43 | 42 | 42<br />
Transposases/IS elements | 168 | 158 | 75 | 65 | 68 | 60 | 34 | 47 | 45 | 103 | 130<br />
MITEs (families) | 155 (6) | 133 (6) | 9 (2) | 11 (2) | 4 (1) | 4 (1) | 10 (2) | 5 (2) | 7 (2) | 5 (1) | 5 (1)<br />
Cmr family(ies) a | D, 2 B (B) | B | 2 B | B | B | 2B, D | B | B, D | B, D | 3 B | E<br />
CRISPR/Cas family(ies) a | I, II (2 I) | II (2 I) | I | I, III | I, III (I) | I (II) | I | I, II | I, II | I | I<br />
a Letters and numbers in parentheses for the Cmr and CRISPR/Cas module families (25, 48) denote the numbers and families of putatively defective modules generally lacking essential genes.<br />
b Lassen, Lassen Volcanic National Park; Yellowstone, Yellowstone National Park.<br />
FIG. 2. Neighbor-joining tree based on a gene content matrix, including<br />
the conserved, core, and unique genes for each available S.<br />
islandicus and S. solfataricus genome (Table 1). The branch lengths<br />
represent the number of differences between the strains in terms of the<br />
presence or absence of individual genes. The data for the tree were<br />
prepared using methods described earlier (44, 48). Only bootstrap<br />
values below 100% for the individual branches are given.<br />
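Dot plots such as those in Fig. 1 can be approximated by recording every position at which the two sequences share an exact word of length k. The sketch below is only an illustration of that idea, not the software used in the study: it indexes the k-mers of one sequence and reports forward matches (which trace syntenic diagonals) and reverse-complement matches (which trace inversions). The function names, the value of k, and the toy sequences are assumptions; the input is assumed to be uppercase A/C/G/T.<br />

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmer_index(seq: str, k: int) -> dict:
    """Map every k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def dot_plot(seq_x: str, seq_y: str, k: int = 8):
    """Return (forward, inverted) lists of (x, y) k-mer match coordinates."""
    index = kmer_index(seq_x, k)
    forward, inverted = [], []
    for j in range(len(seq_y) - k + 1):
        word = seq_y[j:j + k]
        # Same-strand matches lie on diagonals running in the same direction.
        forward.extend((i, j) for i in index.get(word, ()))
        # Opposite-strand matches reveal inverted segments.
        inverted.extend((i, j) for i in index.get(reverse_complement(word), ()))
    return forward, inverted


# Toy sequences (hypothetical, not taken from either genome).
toy_x = "ATGGCGTACCTGAAGTCCGATTACG"
toy_y = reverse_complement(toy_x)  # a fully inverted copy of toy_x
forward_hits, inverted_hits = dot_plot(toy_x, toy_y, k=6)
```

For whole genomes a dedicated aligner would normally be used; the point here is only that long runs of forward hits correspond to conserved gene order, and runs of inverted hits to inversions such as the 200-kb one noted above.<br />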
A phylogenetic tree derived from <strong>the</strong> available genomes<br />
clusters toge<strong>the</strong>r S. islandicus stra<strong>in</strong>s from different geographical<br />
locations (44), with S. solfataricus stra<strong>in</strong>s P2 and 98/2 be<strong>in</strong>g<br />
more distantly related (Fig. 2 and Table 1). The nucleotide<br />
sequence identity for <strong>the</strong> concatenated core genes of <strong>the</strong> two S.<br />
islandicus genomes (Fig. 1A) is 99.6%, and between all <strong>the</strong> S.<br />
islandicus genomes, it is about 99%. The relatively long<br />
branches for <strong>in</strong>dividual stra<strong>in</strong>s (Fig. 2) arise ma<strong>in</strong>ly from differences<br />
<strong>in</strong> gene content of <strong>the</strong> large variable regions (Fig. 1A).<br />
The degree of sequence identity between <strong>the</strong> concatenated<br />
core genes of <strong>the</strong> S. islandicus and S. solfataricus genomes is<br />
about 90% (Fig. 2).<br />
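The branch lengths in Fig. 2 are described as counts of gene presence or absence differences between strains. A minimal sketch of how such pairwise counts could be derived from gene content is given below; it is not the published pipeline, the strain and gene names are hypothetical, and the tree-building step (neighbor joining) is not implemented here.<br />

```python
from itertools import combinations


def gene_content_distance(genes_a: set, genes_b: set) -> int:
    """Number of genes present in exactly one of the two genomes."""
    return len(genes_a ^ genes_b)


def distance_matrix(genomes: dict) -> dict:
    """Pairwise gene content distances, keyed by sorted strain-name pairs."""
    return {
        (x, y): gene_content_distance(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


# Hypothetical gene sets; real input would be a presence/absence matrix
# built from orthologous gene families.
toy_genomes = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g3", "g5"},
    "strainC": {"g1", "g6", "g7"},
}
toy_distances = distance_matrix(toy_genomes)
```

A matrix of this kind would then be passed to a neighbor-joining implementation to obtain a tree.<br />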
Three orig<strong>in</strong>s of chromosome replication, demonstrated experimentally<br />
for S. solfataricus and S. acidocaldarius (27, 46),<br />
are well conserved with respect to both <strong>the</strong> DNA sequence and<br />
flank<strong>in</strong>g gene organization <strong>in</strong> both of <strong>the</strong> genomes, albeit with<br />
<strong>the</strong> orig<strong>in</strong> oriC2 be<strong>in</strong>g <strong>in</strong>verted relative to <strong>the</strong> genomes of S.<br />
solfataricus P2 and S. islandicus stra<strong>in</strong> YN1551 (Fig. 1B). Orig<strong>in</strong><br />
oriC1 lies immediately upstream of cdc6-1, oriC2 is close to<br />
cdc6-3, while oriC3 is positioned downstream of <strong>the</strong> whiP gene<br />
(Fig. 1A). The two cdc6 genes and <strong>the</strong> whiP gene encode<br />
putative replication <strong>in</strong>itiators (45).<br />
TABLE 2. Integration events at tRNA genes showing the sizes and origins of the residual integrated genes (columns: tRNA; intron present; integration event in REY15A and in HVE10/4; conserved?)<br />
Large variable region. The genomes carry two types of variable<br />
regions. The large region, constituting 20 to 25% of each<br />
genome, extends approximately from positions encompassing<br />
0.3 to 0.8 Mb and 0.3 to 1.0 Mb for stra<strong>in</strong>s REY15A and<br />
HVE10/4, respectively (Fig. 1A). The o<strong>the</strong>r class is represented<br />
ma<strong>in</strong>ly by regions downstream from tRNA genes, where<br />
<strong>in</strong>tegration events have occurred (Table 1; also, see below).<br />
The large variable region conta<strong>in</strong>s about 60% of <strong>the</strong> potentially<br />
transposable IS elements and most of <strong>the</strong> nonautonomous<br />
mobile elements, as well as many degenerate copies of <strong>the</strong><br />
former (Fig. 1A). It carries some gene clusters, which are<br />
present <strong>in</strong> one or more of <strong>the</strong> Sulfolobus genomes, <strong>in</strong>clud<strong>in</strong>g<br />
operons and gene cassettes associated with metabolic pathways,<br />
and it conta<strong>in</strong>s <strong>the</strong> diverse <strong>CRISPR</strong>/Cas and Cmr modules<br />
(Table 1; also, see below). It generally lacks essential<br />
genes; for example, no tRNA genes or replication orig<strong>in</strong>s are<br />
present, and thus, it appears to constitute a region where<br />
nonessential genes are collected, <strong>in</strong>terchanged, and exchanged<br />
<strong>in</strong>tercellularly and where genetic <strong>in</strong>novation occurs.<br />
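The stated share of each genome occupied by the large variable region can be checked against the approximate coordinates given above. The short calculation below is a rough check only; it assumes the rounded genome sizes from Table 1 (2.5 Mb for REY15A and 2.7 Mb for HVE10/4), and the function name is hypothetical.<br />

```python
def region_fraction(start_mb: float, end_mb: float, genome_mb: float) -> float:
    """Fraction of the genome covered by a region given in megabases."""
    return (end_mb - start_mb) / genome_mb


# Approximate region limits from the text; genome sizes rounded as in Table 1.
rey15a_fraction = region_fraction(0.3, 0.8, 2.5)
hve104_fraction = region_fraction(0.3, 1.0, 2.7)

print(f"REY15A: {rey15a_fraction:.0%}, HVE10/4: {hve104_fraction:.0%}")
```

With these rounded inputs the fractions come out at about 20% and 26%, in line with the approximate 20 to 25% range quoted for the two strains.<br />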
Integration sites. tRNA gene <strong>in</strong>tegration events <strong>in</strong> Sulfolobus<br />
genomes predom<strong>in</strong>antly <strong>in</strong>volve conjugative plasmids and<br />
fuselloviruses, and <strong>the</strong>se were also <strong>the</strong> genetic elements most<br />
commonly isolated from acidic hot spr<strong>in</strong>gs <strong>in</strong> Iceland (63).<br />
Most <strong>in</strong>tegration events occur via an archaea-specific mechanism,<br />
whereby a viral/plasmid <strong>in</strong>tegrase gene recomb<strong>in</strong>es <strong>in</strong>to a<br />
host tRNA gene and partitions (32). The capture of a genetic<br />
element <strong>in</strong> a chromosome leaves a trace because <strong>the</strong> <strong>in</strong>tN<br />
fragment overlapp<strong>in</strong>g <strong>the</strong> tRNA gene is generally ma<strong>in</strong>ta<strong>in</strong>ed,<br />
even if <strong>the</strong> rema<strong>in</strong>der of <strong>the</strong> genetic element degenerates or is<br />
deleted (51, 52) (Table 2).<br />
For stra<strong>in</strong>s REY15A and HVE10/4, remnants of <strong>in</strong>tegrated<br />
elements adjo<strong>in</strong> eight and five tRNA genes, respectively<br />
(Table 2). Most of <strong>the</strong> <strong>in</strong>tegrated genes derive from conjugative<br />
plasmids, and fuselloviral genes were detected only at<br />
tRNA Thr [GGT] <strong>in</strong> each stra<strong>in</strong>, with an <strong>in</strong>tegrated region of<br />
unknown orig<strong>in</strong> at tRNA Met [CAT] <strong>in</strong> stra<strong>in</strong> REY15A. All of<br />
<strong>the</strong> <strong>in</strong>tegrated elements are highly degenerate, with IS elements<br />
or m<strong>in</strong>iature <strong>in</strong>verted-repeat transposable elements (MITEs) <strong>in</strong>serted<br />
downstream from <strong>the</strong> tRNA genes (Table 2). Given <strong>the</strong><br />
possibility of multiple <strong>in</strong>tegrations of genetic elements occurr<strong>in</strong>g<br />
at a given tRNA gene, it is difficult to analyze unambiguously <strong>the</strong><br />
orig<strong>in</strong>s of residual <strong>in</strong>tegrated genes (42).<br />
In contrast to <strong>the</strong> two Icelandic stra<strong>in</strong>s, <strong>the</strong> o<strong>the</strong>r S. solfataricus<br />
and S. islandicus genomes carry <strong>in</strong>tact genetic elements<br />
bordered by <strong>in</strong>tN and <strong>in</strong>tC fragments that are all potentially<br />
excisable (44, 52). They each show evidence of 2 to 7 tRNA<br />
gene <strong>in</strong>tegration events, <strong>in</strong> which <strong>the</strong> most conserved sites are<br />
tRNA Pro [GGG] and tRNA Ala [GGC], with less common events<br />
at tRNA Leu [GAG] and different alleles of tRNA Arg (Table 2).<br />
tRNA | Intron present | Integration event a, REY15A | Integration event a, HVE10/4 | Conserved?<br />
Val-TAC | No | SiRe1242-1247 conj plasmid | No insert | No<br />
Phe-GAA | No | SiRe1321-1323 conj plasmid | SiH1399-1402 conj plasmid | Yes<br />
Met-CAT | Yes | SiRe1465-1479, 12 kb IS, pNOB8 integrase, unknown | No insert | No<br />
Glu-TTC | No | SiRe1484-1490 conj plasmid | SiH1561-1574 conj plasmid | Yes<br />
Ala-GGC | No | intN fragment | intN fragment | Yes<br />
Thr-GGT | No | SiRe2413-2417 IS/MITEs SSV | SiH2464-24672 SSV | Partly<br />
Pro-GGG | Yes | intN fragment | intN fragment | Yes<br />
His-GTG | No | SiRe1787-1792, 7 kb IS | No insert | No<br />
a SSV, spindle-shaped fusellovirus; conj, conjugative; int, integrase.<br />
For <strong>the</strong> <strong>in</strong>tegrated tRNA genes of <strong>the</strong> Icelandic stra<strong>in</strong>s, <strong>the</strong>re<br />
was no significant correlation between <strong>the</strong> identity of <strong>the</strong><br />
tRNA anticodon and <strong>the</strong> frequency of codon usage or between<br />
<strong>the</strong> encoded am<strong>in</strong>o acid and <strong>the</strong> average number of am<strong>in</strong>o<br />
acids <strong>in</strong> <strong>the</strong> genome-encoded prote<strong>in</strong>s.<br />
Anti-<strong>in</strong>tegration role for tRNA <strong>in</strong>trons. Each genome carries<br />
45 tRNA genes and 2 to 3 pseudo-tRNA genes all located <strong>in</strong><br />
conserved regions. Sixteen of <strong>the</strong> tRNA genes conta<strong>in</strong> <strong>in</strong>trons<br />
immediately 3′ to the anticodon, varying in size from 12 to 65<br />
bp, and <strong>in</strong> contrast to many archaeal tRNA genes, none were<br />
detected at o<strong>the</strong>r sites (29), although putatively degenerate<br />
<strong>in</strong>trons, lack<strong>in</strong>g <strong>the</strong> capacity to form splic<strong>in</strong>g sites, occur <strong>in</strong><br />
D-loop regions of tRNA Glu [CTC] and tRNA Glu [TTC]. Moreover,<br />
<strong>the</strong> tRNA genes and <strong>in</strong>trons are highly conserved <strong>in</strong><br />
sequence between <strong>the</strong> two genomes, and also with <strong>the</strong> o<strong>the</strong>r six<br />
S. islandicus genomes, with very few base changes occurr<strong>in</strong>g<br />
between <strong>the</strong> <strong>in</strong>trons of a given tRNA. This high level of tRNA<br />
and <strong>in</strong>tron sequence conservation extends to S. solfataricus P2,<br />
with only very m<strong>in</strong>or differences observed for about one-third<br />
of <strong>the</strong> genes, and it re<strong>in</strong>forces <strong>the</strong> concept that <strong>the</strong> RNA<br />
<strong>in</strong>trons are functionally important (5).<br />
A possible function for the tRNA introns, suggested by<br />
<strong>the</strong> above-described analyses, is that <strong>the</strong>y provide protection<br />
aga<strong>in</strong>st <strong>in</strong>tegration of genetic elements <strong>in</strong>to tRNA genes.<br />
Integration can be disadvantageous <strong>in</strong> that pre-tRNA transcription<br />
can be impaired. Only two <strong>in</strong>tron-carry<strong>in</strong>g tRNA<br />
genes showed evidence of <strong>in</strong>tegration events (Table 2). For<br />
<strong>the</strong> tRNA Met [CAT] gene copies, an <strong>in</strong>tact <strong>in</strong>tegrase gene is<br />
located downstream from <strong>the</strong> tRNA gene, while for <strong>the</strong><br />
tRNA Pro [GGG], an overlapp<strong>in</strong>g <strong>in</strong>tN fragment is present,<br />
but <strong>the</strong> overlapp<strong>in</strong>g sequence does not extend to <strong>the</strong> <strong>in</strong>tron,<br />
suggest<strong>in</strong>g that <strong>the</strong> <strong>in</strong>tron entered after <strong>the</strong> <strong>in</strong>tegration<br />
event. This is consistent with <strong>the</strong> latter <strong>in</strong>tegration event<br />
be<strong>in</strong>g <strong>the</strong> most conserved, and probably <strong>the</strong> most ancient,<br />
among Sulfolobus species.<br />
IS elements and <strong>the</strong> versatile orfB element. Each genome<br />
carries a limited range of IS element types, with some <strong>in</strong> multiple<br />
copies (Table 3). The IS elements are clustered <strong>in</strong> <strong>the</strong><br />
variable genomic region and also downstream from tRNA<br />
genes that have undergone <strong>in</strong>tegration events (Fig. 1A). Many<br />
of <strong>the</strong>se elements appear to be <strong>in</strong>tact, carry<strong>in</strong>g <strong>the</strong> <strong>in</strong>verted<br />
term<strong>in</strong>al repeats (ITRs) required for transposition, but exhibit<br />
fragmented transposase genes, which are unlikely to be restored<br />
by programmed translational frameshift<strong>in</strong>g, as was observed<br />
for some bacterial transposases of the IS1 and IS3 families (28).<br />
TABLE 3. Properties of the IS elements, transposases, and MITEs in the Icelandic genomes a (columns: element; family; numbers of ORFs, REY15A copies, intact TPases in REY15A, HVE10/4 copies, intact TPases in HVE10/4, and conserved genome positions)<br />
Although some of these elements may be mobilizable<br />
by transposases act<strong>in</strong>g <strong>in</strong> trans, for over one-third of <strong>the</strong><br />
IS families present, <strong>the</strong>re is no encoded transposase (Table 3).<br />
Potentially, <strong>the</strong> most active elements are ISC1200 and ISC1234<br />
<strong>in</strong> both genomes and ISC1229 <strong>in</strong> stra<strong>in</strong> HVE10/4 (Table 3).<br />
The two Icelandic S. islandicus stra<strong>in</strong>s, toge<strong>the</strong>r with those<br />
from Kamchatka, Russia, carry <strong>the</strong> lowest number of IS elements<br />
(Table 1), many of which are <strong>in</strong>active.<br />
orfB elements of family IS605, toge<strong>the</strong>r with elements of <strong>the</strong><br />
IS6 family (Table 3), are considered to represent <strong>the</strong> few<br />
classes of transposable elements that are ancestral to <strong>the</strong> archaeal<br />
doma<strong>in</strong> (16). orfB occurs alone, or toge<strong>the</strong>r with a<br />
transposase gene, orfA, <strong>in</strong> <strong>the</strong> IS200/605 family of transposable<br />
elements. They lack ITRs, and both element types occur<br />
commonly <strong>in</strong> viruses and conjugative plasmids of <strong>the</strong> Sulfolobales<br />
(18, 40) (Table 3). Exceptionally, stra<strong>in</strong> REY15A and<br />
HVE10/4 genomes carry 11 and 16 nearly identical copies of<br />
<strong>the</strong> s<strong>in</strong>gle orfB elements <strong>in</strong> unconserved genomic positions,<br />
respectively. This is consistent with <strong>the</strong>se be<strong>in</strong>g <strong>the</strong> most active<br />
transposable elements <strong>in</strong> each genome (Table 3), although it<br />
rema<strong>in</strong>s uncerta<strong>in</strong> whe<strong>the</strong>r <strong>the</strong>y are autonomous or require an<br />
OrfA <strong>in</strong> trans for mobility (16). In addition, <strong>the</strong> orfB elements<br />
are exceptionally adaptable, because a fur<strong>the</strong>r 8 and 2 copies<br />
are physically coupled to copies of ISC1200 for stra<strong>in</strong>s<br />
REY15A and HVE10/4, respectively (Table 3), and are potentially<br />
cotransposable.<br />
Sulfolobus MITEs. Only two MITE types were detected <strong>in</strong><br />
multiple copies <strong>in</strong> each genome, SMN1 (320 bp) and SM3A<br />
(164 bp) (Table 3), both of which are capable of nonautonomous<br />
transposition <strong>in</strong> different S. islandicus stra<strong>in</strong>s, facilitated<br />
by transposases of ISC1733 and ISC1058, respectively<br />
(2, 4, 43). All SMN1 copies are located immediately downstream<br />
from <strong>the</strong> sequence TTTAA, but none occur at conserved<br />
positions within the two genomes.<br />
Element | Family | No. of ORFs | REY15A copies | Intact TPases in REY15A | HVE10/4 copies | Intact TPases in HVE10/4 | Conserved genome positions<br />
ISC796 | IS1 | 1 | 5 | 0 | 4 | 1 | 1<br />
ISC1043 | ISL3 | 1 | 1 | 0 | 1 | 0 | 1<br />
ISC1048 | IS630 | 1 | 10 | 0 | 12 | 0 | 3<br />
ISC1058 | IS5 | 1 | 2 | 0 | 1 | 0 | 1<br />
ISC1078 | IS630 | 1 | 1 | 0 | 1 | 0 | 0<br />
ISC1190 | IS110 | 1 | 3 | 1 | 1 | 0 | 0<br />
ISC1200 | ISH3 | 1 | 22 | 11 | 7 | 3 | 0<br />
ISC1205 | ISCNY | 2 | 3 | 0 | 4 | 2 | 1<br />
ISC1229 | IS110 | 1 | 4 | 2 | 10 | 9 | 1<br />
ISC1234 | IS5 | 1 | 8 | 6 | 7 | 5 | 1<br />
ISC1332 | IS256 | 2 | 1 | 1 | 1 | 1 | 0<br />
ISC1395 | IS630 | 1/2 | 2 | 1 | 0 | 0 | 0<br />
ISC1733 | IS200/IS605 | 2 | 8 | 8 | 2 | 2 | 1<br />
ISC1921 | IS607 | 2 | 1 | 1 | 0 | 0 | 0<br />
ISSis1 (pARN4) | IS6 | 1 | 4 | 2 | 5 | 4 | 0<br />
ISSto2 | IS6 | 1 | 2 | 1 | 2 | 2 | 0<br />
OrfB | IS605 | 1 | 19 (8) | 18 | 18 (2) | 17 | 0<br />
SM3A | | 0 | 2 | 0 | 2 | 0 | 2<br />
SMN1 | | 0 | 7 | 7 | 9 | 9 | 0<br />
a The nomenclature used for IS elements and MITEs follows that which was used previously (2, 6, 16). For the OrfB elements, the numbers in parentheses indicate<br />
the numbers of copies that are physically linked to ISC1200 elements. TPase, transposase.<br />
Clearly, the SMN1 MITEs are active in both of the genomes, as is ISC1733, which<br />
encodes <strong>the</strong> mobiliz<strong>in</strong>g transposase (Table 3), and <strong>the</strong>y appear<br />
to be cleanly excised when mobilized, <strong>in</strong> agreement with <strong>the</strong><br />
results of an earlier <strong>in</strong>duced excision <strong>in</strong> <strong>the</strong> S. islandicus stra<strong>in</strong><br />
REN1H1 (2). Although most SMN1 copies lie <strong>in</strong> <strong>in</strong>tergenic<br />
regions, and may or may not affect regulatory signals, some<br />
appear to <strong>in</strong>activate or alter genes. Thus, <strong>in</strong> stra<strong>in</strong> REY15A,<br />
an AAA ATPase (SiRe0883) and a hypo<strong>the</strong>tical gene<br />
(SiRe0925) have <strong>in</strong>curred <strong>in</strong>sertions <strong>in</strong> <strong>the</strong>ir promoters, and <strong>in</strong><br />
stra<strong>in</strong> HVE10/4, SMN1 copies partially overlap with two genes<br />
(SiH0773/2472), generat<strong>in</strong>g altered ORF sequences.<br />
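The observation that every SMN1 copy lies immediately downstream of the sequence TTTAA is the kind of claim that can be checked directly once element start coordinates are known. The sketch below shows one way to do so, assuming 0-based start coordinates on the same strand as the element; the function names and the toy sequence are hypothetical and do not come from either genome.<br />

```python
def upstream_motif(genome: str, element_start: int, motif_len: int = 5) -> str:
    """Return the motif_len bases immediately upstream of element_start."""
    if element_start < motif_len:
        raise ValueError("element starts too close to the sequence start")
    return genome[element_start - motif_len:element_start]


def follows_motif(genome: str, element_start: int, motif: str = "TTTAA") -> bool:
    """True if the element starts directly after the given motif."""
    return upstream_motif(genome, element_start, len(motif)) == motif


# Toy example: a made-up flank ending in TTTAA, then a made-up element body.
toy_genome = "GGCATTTAA" + "CCGGAACCGG" + "TTGACA"
toy_element_start = 9  # first base after the TTTAA

print(follows_motif(toy_genome, toy_element_start))
```

Applied to each annotated SMN1 copy, a check of this kind would confirm or refute the target-site preference; elements on the reverse strand would need the reverse complement of the flanking sequence.<br />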
In contrast, <strong>the</strong> two SM3A copies are conserved <strong>in</strong> position<br />
<strong>in</strong> each genome, consistent with <strong>the</strong> mobiliz<strong>in</strong>g transposase<br />
encoded <strong>in</strong> ISC1058 be<strong>in</strong>g degenerate <strong>in</strong> both genomes. Never<strong>the</strong>less,<br />
each SM3A copy reta<strong>in</strong>s <strong>the</strong> conserved 8-bp <strong>in</strong>verted<br />
term<strong>in</strong>al repeat of <strong>the</strong> ISC1058 element (and unconserved<br />
9-bp direct repeats result<strong>in</strong>g from <strong>the</strong> transposition event) and<br />
can potentially be mobilized if a transposase-encod<strong>in</strong>g<br />
ISC1058 element enters <strong>the</strong> cell. Their ma<strong>in</strong>tenance as <strong>in</strong>tact<br />
elements may result from one SM3A copy overlapp<strong>in</strong>g with <strong>the</strong><br />
start of a conserved C/D box RNA gene (3), which may alter its<br />
transcriptional properties, while <strong>the</strong> o<strong>the</strong>r lies between promoters<br />
of two conserved prote<strong>in</strong> genes and may <strong>in</strong>fluence <strong>the</strong>ir<br />
relative transcriptional levels. SM3A occurs <strong>in</strong> a few copies <strong>in</strong><br />
each of <strong>the</strong> sequenced S. islandicus genomes, whereas SMN1 is<br />
limited to <strong>the</strong> Icelandic and three Kamchatka stra<strong>in</strong>s, where it<br />
occurs <strong>in</strong> 1 to 5 copies (Table 1).<br />
Stra<strong>in</strong>-specific metabolic pathways. Each Icelandic stra<strong>in</strong><br />
shows a few specific metabolic properties. Thus, the REY15A<br />
stra<strong>in</strong> carries an operon (SiRe0441-0445) encod<strong>in</strong>g enzymes<br />
implicated <strong>in</strong> nitrate reduction and nitrite extrusion, suggest<strong>in</strong>g<br />
that it can use nitrate as a term<strong>in</strong>al electron acceptor for<br />
anaerobic respiration. The operon is located <strong>in</strong> <strong>the</strong> variable<br />
region and has been observed previously only for two o<strong>the</strong>r
archaea, S. islandicus stra<strong>in</strong>s M.14.25 and M.16.27. The larger<br />
genome of stra<strong>in</strong> HVE10/4 exclusively carries a urease operon<br />
(SiH0978-0983) predicted to encode enzymes <strong>in</strong>volved <strong>in</strong> <strong>the</strong><br />
hydrolysis of urea to NH4 and CO2 and previously found only<br />
<strong>in</strong> <strong>the</strong> archaea Sulfolobus tokodaii, Metallosphaera sedula, and<br />
Cenarchaeum symbiosum. Moreover, uniquely for a Sulfolobus<br />
species, stra<strong>in</strong> HVE10/4 also carries several genes predicted to<br />
encode hydrogenases and hydrogenase maturation enzymes<br />
(SiH0883-0892) <strong>in</strong> <strong>the</strong> variable region, which suggests that <strong>the</strong><br />
stra<strong>in</strong> may be able to grow anaerobically.<br />
A 50-kb region of stra<strong>in</strong> HVE10/4 <strong>in</strong> <strong>the</strong> variable region<br />
(SiH0447-0489) is bordered by IS elements and carries 15<br />
predicted glycosyl transferase genes (group 1 and family 2),<br />
constitut<strong>in</strong>g about half of <strong>the</strong> genome copies, <strong>in</strong>terspersed almost<br />
exclusively with genes of unknown function and a gene<br />
encod<strong>in</strong>g a predicted polysaccharide biosyn<strong>the</strong>sis enzyme. It is<br />
well established that Sulfolobus S-layer prote<strong>in</strong>s SlaA and SlaB<br />
(SiRe1612/1 and SiH1691/0, respectively) are heavily glycosylated<br />
(36), but <strong>the</strong> relatively low GC content of <strong>the</strong> region<br />
suggests that it has been <strong>in</strong>serted and has an alternative unknown<br />
function. The genome region is absent from stra<strong>in</strong><br />
REY15A and from some of <strong>the</strong> o<strong>the</strong>r S. islandicus stra<strong>in</strong>s<br />
(Table 1).<br />
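The argument that the 50-kb region was acquired rests on its GC content being lower than that of its surroundings. A sliding-window GC calculation is a common way to look for such regions; the sketch below is a generic illustration under that assumption, not the analysis performed in the study, and the window size, step, and toy sequence are arbitrary.<br />

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int, step: int) -> list:
    """GC content of successive windows as (start, gc_fraction) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (hypothetical): a GC-rich flank, an AT-rich insert, a GC-rich flank.
toy_seq = "GC" * 10 + "AT" * 10 + "GC" * 10
toy_profile = windowed_gc(toy_seq, window=20, step=20)
```

On real data, windows whose GC fraction falls well below the genome-wide average would mark candidate inserted regions for closer inspection.<br />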
Transporters. Sulfolobus stra<strong>in</strong>s utilize different sugars and<br />
carbohydrates as carbon and energy sources (19), consistent<br />
with <strong>the</strong>ir cod<strong>in</strong>g capacity for solute ABC transporters. A total<br />
of 15 different ABC transporters were identified, of which<br />
stra<strong>in</strong> REY15A carries 12 and stra<strong>in</strong> HVE10/4 conta<strong>in</strong>s 14. Of<br />
<strong>the</strong>se, 11 ABC transporters are present <strong>in</strong> S. solfataricus P2<br />
(53), 6 <strong>in</strong> S. tokodaii (23), but only 3 <strong>in</strong> S. acidocaldarius (9).<br />
The o<strong>the</strong>r S. islandicus genomes each carry 10 to 14 ABC<br />
transporters (44) (Table 1). In both of <strong>the</strong> Icelandic genomes,<br />
many ABC transporter genes are located <strong>in</strong> <strong>the</strong> variable region<br />
(Fig. 1A) and are often flanked by transposons, consistent with<br />
<strong>the</strong>ir be<strong>in</strong>g subjected to loss or ga<strong>in</strong> events.<br />
The ABC transporters are diverse, and some of <strong>the</strong>ir solute<br />
specificities have been identified for o<strong>the</strong>r Sulfolobus stra<strong>in</strong>s<br />
(15, 24). Cellobiose, maltose, and arab<strong>in</strong>ose transporters are<br />
present <strong>in</strong> both of <strong>the</strong> Icelandic genomes and most o<strong>the</strong>r sequenced<br />
S. solfataricus and S. islandicus genomes, although a<br />
few S. islandicus stra<strong>in</strong>s lack one of <strong>the</strong> <strong>system</strong>s, as follows: <strong>the</strong><br />
arab<strong>in</strong>ose <strong>system</strong> is absent from stra<strong>in</strong> YG5714, while <strong>the</strong> maltose<br />
<strong>system</strong> is not present <strong>in</strong> stra<strong>in</strong>s YN1551 and LD215. Strik<strong>in</strong>gly,<br />
<strong>the</strong> transporter of glucose, <strong>the</strong> preferred carbon source<br />
for many microbes, is present only <strong>in</strong> <strong>the</strong> Icelandic stra<strong>in</strong>s, S.<br />
islandicus stra<strong>in</strong>s M1415 and YG5714, and <strong>in</strong> S. solfataricus P2.<br />
The lack of specific ABC transporters suggests ei<strong>the</strong>r that<br />
glucose is an uncommon nutrient <strong>in</strong> hot environments or that<br />
ano<strong>the</strong>r ABC transporter can facilitate glucose transport. One<br />
ABC transporter encoded <strong>in</strong> <strong>the</strong> variable region of stra<strong>in</strong><br />
HVE10/4 (SiH0899-0903), flanked by IS elements, appears to<br />
be unique <strong>in</strong> public sequence databases.<br />
Tox<strong>in</strong>-antitox<strong>in</strong> <strong>system</strong>s. Four of <strong>the</strong> eight families of antitox<strong>in</strong>-tox<strong>in</strong><br />
complexes characterized for free-liv<strong>in</strong>g bacteria also<br />
occur <strong>in</strong> archaea, of which <strong>the</strong> VapBC family is by far <strong>the</strong> most<br />
abundant (34) and is <strong>the</strong> ma<strong>in</strong> antitox<strong>in</strong>-tox<strong>in</strong> family that we<br />
detected <strong>in</strong> <strong>the</strong> Sulfolobus stra<strong>in</strong>s. The Icelandic stra<strong>in</strong>s REY15A<br />
and HVE10/4 carry 17 and 18 vapBC gene pairs, respectively<br />
(Table 1), as well as 2 vapC-like gene copies coupled to o<strong>the</strong>r<br />
genes. They are distributed throughout <strong>the</strong> genomes, with several<br />
located <strong>in</strong> <strong>the</strong> variable region, and only five gene pairs are conserved<br />
<strong>in</strong> sequence and gene contexts <strong>in</strong> both stra<strong>in</strong>s (SiRe0698/SiH0636,<br />
SiRe2073/SiH2137, SiRe2171/SiH2227, SiRe2294/SiH2344,<br />
and SiRe2626/SiH2689). Sequence alignments and tree-build<strong>in</strong>g<br />
exercises demonstrated that <strong>the</strong> sequences of both<br />
antitox<strong>in</strong>s and tox<strong>in</strong>s with<strong>in</strong> each genome are very diverse and can<br />
be classified <strong>in</strong>to subtypes (data not shown), consistent with <strong>the</strong>ir<br />
functional diversity and target<strong>in</strong>g of different cellular sites. These<br />
data also <strong>in</strong>dicate, for given gene pairs, that <strong>the</strong> subtypes of VapB<br />
and VapC do not always correspond, imply<strong>in</strong>g that some gene<br />
pairs may have exchanged partners.<br />
TABLE 4. Summary of Sulfolobus conserved noncod<strong>in</strong>g RNA genes located <strong>in</strong> <strong>the</strong> two Icelandic genomes<br />
RNA | Function/modification | No. of RNAs <strong>in</strong> REY15A | No. of RNAs <strong>in</strong> HVE10/4<br />
C/D box | rRNA | 18 | 16<br />
C/D box | tRNA | 4 | 4<br />
C/D box | Unknown | 3 | 3<br />
H/ACA box | tRNA | 2 | 2<br />
Noncod<strong>in</strong>g | Unknown | 31 | 27<br />
Total | | 58 | 53<br />
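The inference that some vapBC gene pairs have exchanged partners rests on comparing the subtype assigned to each antitoxin with the subtype assigned to its toxin.<br />
The following is a minimal sketch of that comparison, assuming each gene pair has already been given a VapB and a VapC subtype label from the trees; the pair identifiers and subtype labels are hypothetical, and the rule used (flag a pair whose VapC subtype differs from the most common partner of its VapB subtype) is one simple choice, not the procedure used in the study.<br />

```python
from collections import Counter, defaultdict


def find_swapped_pairs(pairs):
    """Flag gene pairs whose toxin subtype is unusual for their antitoxin subtype.

    pairs: list of (pair_id, vapB_subtype, vapC_subtype).
    Returns the ids of pairs whose VapC subtype differs from the VapC subtype
    most often found together with that VapB subtype.
    """
    partners = defaultdict(Counter)
    for _, vapb, vapc in pairs:
        partners[vapb][vapc] += 1

    swapped = []
    for pair_id, vapb, vapc in pairs:
        usual_vapc, _ = partners[vapb].most_common(1)[0]
        if vapc != usual_vapc:
            swapped.append(pair_id)
    return swapped


# Hypothetical subtype assignments for five gene pairs.
toy_pairs = [
    ("pair1", "B1", "C1"),
    ("pair2", "B1", "C1"),
    ("pair3", "B1", "C2"),  # B1 antitoxin paired with a C2 toxin
    ("pair4", "B2", "C2"),
    ("pair5", "B2", "C2"),
]
swapped_ids = find_swapped_pairs(toy_pairs)
print(swapped_ids)
```

With ties between partner subtypes the rule becomes arbitrary, so real data would need a more careful criterion, for example comparing tree topologies directly.<br />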
Read<strong>in</strong>g frame shifts and mRNA <strong>in</strong>tron splic<strong>in</strong>g. Examples<br />
of translational read<strong>in</strong>g frame shifts yield<strong>in</strong>g s<strong>in</strong>gle polypeptides<br />
have been demonstrated experimentally for S. solfataricus<br />
P2 (10). For two of <strong>the</strong>se, a predicted transketolase<br />
(SiRe1696/8 and SiH1776/8) and a putative O-sialoglycoprote<strong>in</strong><br />
endopeptidase (SiRe1569/70 and SiH1648/9), <strong>the</strong> S. islandicus<br />
genes overlap <strong>in</strong> a similar way and are likely to undergo<br />
read<strong>in</strong>g frame shifts. In contrast to S. solfataricus P2, α-fucosidase<br />
(SiRe2185 and SiH2241) is a s<strong>in</strong>gle gene, as is <strong>the</strong><br />
predicted dihydrolipoamide acyltransferase gene (SiH0582),<br />
located only <strong>in</strong> stra<strong>in</strong> HVE10/4. Very few transposase genes<br />
present <strong>in</strong> IS elements (Table 3) carry a s<strong>in</strong>gle read<strong>in</strong>g frame<br />
shift that could be expressed as a s<strong>in</strong>gle prote<strong>in</strong> via translational<br />
read<strong>in</strong>g frame shifts (28).<br />
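Candidate frame-shifted genes of the kind described above appear in an annotation as adjacent, overlapping open reading frames on the same strand but in different frames.<br />
A minimal sketch of such a screen is given below, assuming gene coordinates are available as simple records; the class, gene names, and coordinates are hypothetical, and a positive result only marks a candidate that still needs experimental support.<br />

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def frameshift_candidates(genes):
    """Return (left, right, frame_offset) for adjacent same-strand genes
    that overlap and lie in different reading frames."""
    ordered = sorted(genes, key=lambda g: g.start)
    candidates = []
    for left, right in zip(ordered, ordered[1:]):
        if left.strand != right.strand or right.start > left.end:
            continue  # different strands, or no overlap
        # On "+" the frame is set by the start coordinate, on "-" by the end.
        if left.strand == "+":
            offset = (right.start - left.start) % 3
        else:
            offset = (right.end - left.end) % 3
        if offset != 0:
            candidates.append((left.name, right.name, offset))
    return candidates


# Hypothetical annotation: only geneA/geneB overlap on the same strand.
toy_genes = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 390, 900, "+"),
    Gene("geneC", 950, 1300, "+"),
    Gene("geneD", 1290, 1600, "-"),
]
hits = frameshift_candidates(toy_genes)
print(hits)
```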
Transcripts of <strong>the</strong> <strong>in</strong>tron-carry<strong>in</strong>g cbf5 genes (SiRe1607/8<br />
and SiH1686/7) have been demonstrated to be spliced by <strong>the</strong><br />
archaeal splic<strong>in</strong>g enzyme at <strong>the</strong> mRNA level <strong>in</strong> some crenarchaea<br />
(60). O<strong>the</strong>r mRNAs, <strong>in</strong>clud<strong>in</strong>g those encod<strong>in</strong>g <strong>the</strong> XPD<br />
helicase (SiRe1685/SiH1765), have been predicted to undergo<br />
splic<strong>in</strong>g, but experimental support is lack<strong>in</strong>g (5).<br />
Noncod<strong>in</strong>g RNAs. Many untranslated RNAs have been<br />
characterized for S. solfataricus and S. acidocaldarius us<strong>in</strong>g a<br />
variety of techniques, <strong>in</strong>clud<strong>in</strong>g prob<strong>in</strong>g cell extracts for RNA<br />
with K-turn b<strong>in</strong>d<strong>in</strong>g motifs and generat<strong>in</strong>g cDNA libraries of<br />
total cellular RNA extracts, as well as numerous antisense<br />
RNAs (33, 55, 59, 61). Most of <strong>the</strong>se RNAs were characterized<br />
for nucleotide length and partial sequence, and several were<br />
detected by more than one experimental approach. We have<br />
reanalyzed all <strong>the</strong>se different RNA entities and have annotated<br />
<strong>the</strong> S. islandicus RNA homologs which are conserved <strong>in</strong> both<br />
sequence and gene contexts. The total number of RNA genes<br />
and <strong>the</strong>ir putative functions are given (Table 4).<br />
FIG. 3. (A) Phylogenetic tree of Cmr2 and its homologues <strong>in</strong> all sequenced archaeal genomes generates 5 families, A to E. The two Icelandic stra<strong>in</strong>s<br />
carry family B Cmr modules, for which <strong>the</strong> gene order is shown. O<strong>the</strong>r sequenced S. islandicus and S. solfataricus stra<strong>in</strong>s also carry Cmr modules of family<br />
B and, less frequently, families D or E, as <strong>in</strong>dicated on <strong>the</strong> tree. (B) Schematic representations of <strong>the</strong> <strong>CRISPR</strong>/Cas cassettes <strong>in</strong> <strong>the</strong> two Icelandic stra<strong>in</strong>s,<br />
toge<strong>the</strong>r with <strong>the</strong> contents of <strong>the</strong>ir <strong>CRISPR</strong> loci. Stra<strong>in</strong> REY15A carries a s<strong>in</strong>gle family I <strong>CRISPR</strong>/Cas cassette (blue), whereas HVE10/4 carries cassettes<br />
from families III and I (orange and blue, respectively). Compositions of <strong>the</strong> <strong>in</strong>dividual <strong>CRISPR</strong> loci are shown, where each triangle represents a<br />
spacer-repeat unit. Significant spacer matches to sequenced viruses and plasmids are color coded (red, rudiviruses; orange, lipothrixviruses; yellow,<br />
fuselloviruses; green, bicaudaviruses; turquoise, turreted icosahedral viruses; blue, conjugative plasmids; and violet, cryptic plasmids).<br />
As for o<strong>the</strong>r archaeal hyper<strong>the</strong>rmophiles, each genome carries<br />
many C/D box RNAs that methylate primarily rRNAs and<br />
tRNAs (Table 4). In stra<strong>in</strong>s REY15A and HVE10/4, 18 and 16<br />
C/D box RNAs target rRNAs, respectively, while 4 modify<br />
tRNAs and a fur<strong>the</strong>r 3 have unknown targets. Two copies of<br />
H/ACA RNA genes are present <strong>in</strong> each genome which, toge<strong>the</strong>r<br />
with <strong>the</strong> aPus7 prote<strong>in</strong> (SiRe1836 and SiH1908), generate pseudourid<strong>in</strong>e-35<br />
<strong>in</strong> pre-tRNA(Tyr) transcripts (31). Each of <strong>the</strong>se C/D<br />
box and H/ACA box RNA genes can be detected <strong>in</strong> <strong>the</strong> o<strong>the</strong>r<br />
available S. islandicus genomes, which underl<strong>in</strong>es <strong>the</strong>ir functional<br />
importance. Of <strong>the</strong>se, only three RNA genes characterized for<br />
o<strong>the</strong>r Sulfolobus stra<strong>in</strong>s, Sso-sR4, Sso-sR8, and Sso-92, were not<br />
located <strong>in</strong> any S. islandicus genomes (33, 55). For <strong>the</strong> numerous<br />
noncod<strong>in</strong>g RNAs of unknown function, similar contents were<br />
found for <strong>the</strong> two Icelandic stra<strong>in</strong>s (Table 4) and for <strong>the</strong> o<strong>the</strong>r S.<br />
islandicus stra<strong>in</strong>s, with only a few variations (Table 1), <strong>the</strong>reby<br />
underl<strong>in</strong><strong>in</strong>g <strong>the</strong>ir functional importance.<br />
Diversity of <strong>the</strong> <strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s. The<br />
<strong>CRISPR</strong>/Cas and Cmr modules all lie with<strong>in</strong> <strong>the</strong> large variable<br />
regions. They show marked heterogeneity <strong>in</strong> <strong>the</strong> number and<br />
family (25, 48) and are not conserved <strong>in</strong> position between <strong>the</strong><br />
genomes (Fig. 1A). Whereas REY15A carries one paired<br />
<strong>CRISPR</strong>/Cas module of <strong>the</strong> family I type and two family B<br />
Cmr modules, HVE10/4 conta<strong>in</strong>s two paired <strong>CRISPR</strong>/Cas<br />
modules of family I and III types and a s<strong>in</strong>gle family B Cmr<br />
module (48) (Fig. 3A and B). This diversity of <strong>CRISPR</strong>-based<br />
<strong>system</strong>s also extends to <strong>the</strong> o<strong>the</strong>r S. solfataricus and S. islandicus<br />
genomes (Table 1). Although <strong>the</strong> gene content and organization<br />
of <strong>the</strong> paired family I <strong>CRISPR</strong>/Cas modules are quite<br />
conserved among crenarchaea (48), exceptionally, for stra<strong>in</strong><br />
HVE10/4, <strong>the</strong> <strong>in</strong>ternal group of cas genes located between <strong>the</strong><br />
two leader regions is <strong>in</strong>verted (Fig. 3B), <strong>in</strong>dicative of a rearrangement<br />
hav<strong>in</strong>g occurred with<strong>in</strong> <strong>the</strong> module, possibly via <strong>the</strong><br />
identical <strong>in</strong>verted repeat sequences of <strong>the</strong> border<strong>in</strong>g leader<br />
regions (Fig. 3B).<br />
The <strong>CRISPR</strong> loci of stra<strong>in</strong> REY15A carry 115 and 93 spacer-repeat<br />
units centered at position 733,000, while those of<br />
HVE10/4 conta<strong>in</strong> 116 and 101 spacer-repeat units and 35 and<br />
14 spacer-repeat units centered at positions 364,000 and<br />
745,000, respectively (Fig. 1A). No spacer sequence identity<br />
was detected with<strong>in</strong>, or between, <strong>the</strong> two Icelandic stra<strong>in</strong>s or<br />
with <strong>the</strong> o<strong>the</strong>r S. solfataricus and S. islandicus genomes. None<br />
of <strong>the</strong> available fully sequenced S. islandicus genomes (Table<br />
1) have any spacers <strong>in</strong> common, <strong>in</strong> contrast to <strong>the</strong> S. solfataricus<br />
stra<strong>in</strong>s P1, P2, and 98/2, which all share many identical<br />
spacers (17, 25) despite <strong>the</strong>ir be<strong>in</strong>g as distant from one ano<strong>the</strong>r,<br />
phylogenetically, as <strong>the</strong> S. islandicus stra<strong>in</strong>s (Fig. 2).<br />
Thus, it seems that diversification of genomic <strong>CRISPR</strong> loci can<br />
occur ei<strong>the</strong>r by simple spacer turnover or by horizontal transfer<br />
of whole or partial <strong>CRISPR</strong>/cas cassettes. There is <strong>in</strong>creas<strong>in</strong>g<br />
evidence for <strong>the</strong> latter mechanism be<strong>in</strong>g <strong>the</strong> most common one<br />
<strong>in</strong> S. islandicus stra<strong>in</strong>s (17, 21).<br />
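The comparison of spacer contents between strains described above reduces to an all-against-all search for identical spacers.<br />
A minimal sketch follows, assuming the spacers of each strain have been extracted as DNA strings; the strain labels and sequences are hypothetical, and treating a spacer and its reverse complement as the same entry is an assumption of this sketch rather than a stated step of the study.<br />

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical(seq):
    """Orientation-independent form: the smaller of seq and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def shared_spacers(spacers_by_strain):
    """Map each unordered strain pair to the set of spacers both strains carry."""
    canon = {strain: {canonical(s) for s in spacers}
             for strain, spacers in spacers_by_strain.items()}
    return {(a, b): canon[a] & canon[b]
            for a, b in combinations(sorted(canon), 2)}


# Hypothetical spacer sets; strains X and Y carry the same spacer
# in opposite orientations, strain Z shares nothing.
toy_spacers = {
    "X": ["AAACCCGGGT", "TTGACCA"],
    "Y": ["ACCCGGGTTT", "GGGGAAAA"],
    "Z": ["CATCATCAT"],
}
shared = shared_spacers(toy_spacers)
for pair, spacers in shared.items():
    print(pair, len(spacers))
```

An empty set for every pair corresponds to the situation reported for the S. islandicus genomes, whereas many shared entries would resemble the S. solfataricus strains.<br />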
S<strong>in</strong>ce many of <strong>the</strong> characterized viruses and plasmids of Sulfolobus<br />
derive from Iceland, we analyzed <strong>the</strong> degree to which<br />
<strong>CRISPR</strong> spacer sequences of <strong>the</strong> Icelandic stra<strong>in</strong>s yielded significant<br />
matches to genetic element sequences us<strong>in</strong>g an earlier approach<br />
exam<strong>in</strong><strong>in</strong>g nucleotide and translated sequences of <strong>the</strong><br />
spacers (25, 49). Several significant sequence matches were detected<br />
for both of <strong>the</strong> genomes, primarily to rudiviruses, fuselloviruses,<br />
and conjugative plasmids, all of which are abundant <strong>in</strong><br />
Icelandic hot spr<strong>in</strong>gs (63), and <strong>in</strong> smaller numbers<br />
to o<strong>the</strong>r viruses and cryptic plasmids (Fig. 3B).<br />
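Spacer matching at both the nucleotide and the translated level, as described above, can be illustrated with a small example.<br />
The sketch below assumes a simple mismatch count for the nucleotide search and exact peptide containment for the translated search, on forward strands only; these scoring rules, the minimum peptide length, and the toy sequences are assumptions made for illustration and do not reproduce the significance criteria of the earlier approach (25, 49).<br />

```python
from itertools import product

# Standard codon assignments, built in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa
               for (a, b, c), aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(seq):
    """Translate complete codons of seq in frame 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def best_nucleotide_hit(spacer, target):
    """Return (mismatches, position) of the best ungapped placement, or None."""
    n = len(spacer)
    if n == 0 or n > len(target):
        return None
    best = None
    for pos in range(len(target) - n + 1):
        mismatches = sum(a != b for a, b in zip(spacer, target[pos:pos + n]))
        if best is None or mismatches < best[0]:
            best = (mismatches, pos)
    return best


def translated_match(spacer, target, min_peptide=4):
    """True if any forward frame of the spacer encodes a peptide found
    in any forward frame of the target (stop-containing peptides are skipped)."""
    target_proteins = [translate(target[f:]) for f in range(3)]
    for frame in range(3):
        peptide = translate(spacer[frame:])
        if len(peptide) < min_peptide or "*" in peptide:
            continue
        if any(peptide in protein for protein in target_proteins):
            return True
    return False


# Toy example: the spacer differs from the target at synonymous positions.
toy_target = "ATGGCTAGCAAAGGTGAATTG"   # encodes M A S K G E L
toy_spacer = "GCCTCTAAGGGCGAG"         # encodes A S K G E
print(best_nucleotide_hit(toy_spacer, toy_target))
print(translated_match(toy_spacer, toy_target))
```

In this toy case the nucleotide search finds no exact placement while the translated search succeeds, which is the reason for examining both levels.<br />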
DISCUSSION<br />
The genome analyses underl<strong>in</strong>e <strong>the</strong> potential importance of<br />
S. islandicus stra<strong>in</strong> REY15A as a model organism for molecular<br />
genetic studies of <strong>the</strong> Sulfolobales, and crenarchaea <strong>in</strong><br />
general, for a variety of reasons. The genome size of 2.5 Mb is<br />
m<strong>in</strong>imal for a Sulfolobus species; moreover, <strong>the</strong> <strong>in</strong>cidence of<br />
mobile elements is relatively low (Table 1), and stable deletion<br />
mutants can be readily isolated (14, 20). Fur<strong>the</strong>rmore, <strong>the</strong> high<br />
<strong>in</strong>cidence of diverse ABC transporter <strong>system</strong>s (Table 1) may<br />
expla<strong>in</strong> why S. islandicus (and S. solfataricus) is most commonly<br />
isolated from enrichment cultures obta<strong>in</strong>ed from terrestrial<br />
acidic hot spr<strong>in</strong>gs, which is <strong>in</strong> contrast to, for example, S.<br />
acidocaldarius, which carries only three ABC transporters (9,<br />
44, 63).<br />
The relatively high <strong>in</strong>cidence of deletion mutants obta<strong>in</strong>ed<br />
from stra<strong>in</strong> REY15A occurs despite <strong>the</strong> presence of several<br />
transposable elements. However, <strong>in</strong> both of <strong>the</strong> Icelandic<br />
stra<strong>in</strong>s, many of <strong>the</strong> IS elements are degenerate or carry disrupted<br />
transposase genes (Table 3), consistent with <strong>the</strong> “copy-and-paste”<br />
transpositional mechanism of most classes of Sulfolobus<br />
IS elements and <strong>the</strong>ir undetectably low reversibility<br />
rate (4, 41). The <strong>in</strong>ability to remove <strong>the</strong> elements by spontaneous<br />
deletion, which does occur <strong>in</strong> many bacteria (16), may<br />
also expla<strong>in</strong> <strong>the</strong> presence of antisense RNAs <strong>in</strong> Sulfolobus<br />
species to regulate transposase activity (55). The Icelandic<br />
stra<strong>in</strong>s do, however, carry many copies of orphan orfB elements<br />
and SMN1 MITEs, which are mobilized by a “cut-and-paste”<br />
mechanism presumably through OrfA encoded<br />
<strong>in</strong> IS element ISC1733 (2, 16). The SMN1 MITEs appear to<br />
be specific to <strong>the</strong> Icelandic and Kamchatka stra<strong>in</strong>s (Table 1),<br />
and <strong>the</strong>y can generate genetic novelty, reversibly, by extend<strong>in</strong>g<br />
open read<strong>in</strong>g frames, <strong>in</strong> contrast to <strong>the</strong> o<strong>the</strong>r Sulfolobus<br />
MITEs, which carry many potential stop codons <strong>in</strong> all read<strong>in</strong>g<br />
frames (43). The absence of most of <strong>the</strong> known Sulfolobus<br />
MITEs, except SM3A, probably reflects <strong>the</strong> much lower<br />
diversity of <strong>the</strong> mobiliz<strong>in</strong>g transposases present (Table 3).<br />
Many of <strong>the</strong>se elements are located <strong>in</strong> <strong>the</strong> large variable<br />
region where genetic diversification occurs, <strong>in</strong>clud<strong>in</strong>g <strong>the</strong><br />
uptake and loss of operons and gene cassettes and rearrangements<br />
of ma<strong>in</strong>ly nonessential genes. A similar variable<br />
genetic region <strong>in</strong> many genetic elements of Sulfolobus has<br />
also been observed (e.g., see reference 18).<br />
Many questions concern<strong>in</strong>g <strong>the</strong> exceptional molecular and<br />
cellular properties of crenarchaeal organisms rema<strong>in</strong> to be<br />
resolved. They <strong>in</strong>clude <strong>the</strong> functions of <strong>the</strong> multiple and highly<br />
diverse gene pairs encod<strong>in</strong>g VapBC antitox<strong>in</strong>-tox<strong>in</strong>s. For hyper<strong>the</strong>rmophilic<br />
Sulfolobus species, <strong>in</strong> particular, <strong>the</strong>ir presence<br />
and variety could be a prerequisite for adaptation to life<br />
under extreme, and sometimes rapidly vary<strong>in</strong>g, temperature<br />
and pH conditions, as well as to survival <strong>in</strong> nutrient-poor environments<br />
possibly by optimiz<strong>in</strong>g <strong>the</strong> quality control of gene<br />
expression (12, 34). They may also be related to <strong>the</strong> sulfolobic<strong>in</strong>s<br />
implicated <strong>in</strong> kill<strong>in</strong>g competitor Sulfolobus cells (39).<br />
The crystal structure of a VapC tox<strong>in</strong> from <strong>the</strong> crenarchaeal<br />
hyper<strong>the</strong>rmophile Pyrobaculum aerophilum implicated <strong>the</strong> prote<strong>in</strong><br />
<strong>in</strong> exonuclease activity (1), but <strong>the</strong> multiplicity and wide<br />
sequence diversity of <strong>the</strong> vapBC genes suggest that <strong>the</strong> tox<strong>in</strong>s<br />
target different cellular or molecular sites.<br />
Stra<strong>in</strong> HVE10/4 has been used as a host for a variety of<br />
genetic elements, ma<strong>in</strong>ly from Iceland, which were likely to be<br />
genetically close to <strong>the</strong> Icelandic host (63). The genome analyses<br />
provide few <strong>in</strong>sights <strong>in</strong>to why it is a good host, especially<br />
s<strong>in</strong>ce it appears to carry a type 1 restriction-modification system<br />
(SiH1435 to SiH1437). Moreover, <strong>the</strong> <strong>CRISPR</strong>/Cas and<br />
<strong>CRISPR</strong>/Cmr modules of stra<strong>in</strong> HVE10/4 are relatively complex,<br />
as <strong>the</strong>y also are for stra<strong>in</strong> REY15A and o<strong>the</strong>r Sulfolobus<br />
stra<strong>in</strong>s. Their activities have also been demonstrated, at least<br />
for stra<strong>in</strong> REY15A, by challeng<strong>in</strong>g <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>s<br />
with vector-borne match<strong>in</strong>g protospacers ma<strong>in</strong>ta<strong>in</strong>ed under<br />
selection, which produced deletions of <strong>the</strong> match<strong>in</strong>g spacers<br />
(20). The puzzle rema<strong>in</strong>s as to why <strong>the</strong> Sulfolobus <strong>CRISPR</strong>-based<br />
<strong>system</strong>s are so complex, given that many of <strong>the</strong> viruses<br />
and plasmids coexist at low copy numbers and are nonlytic.<br />
One possibility is that <strong>the</strong> <strong>CRISPR</strong>/Cmr <strong>system</strong> primarily has a<br />
regulatory role, with antisense crRNAs (<strong>CRISPR</strong> RNAs) target<strong>in</strong>g<br />
viral mRNAs. Whatever <strong>the</strong> reason, <strong>the</strong> genetic closeness<br />
of stra<strong>in</strong>s REY15A and HVE10/4 suggests that <strong>the</strong> former<br />
may also be a broad host for viruses and plasmids, with <strong>the</strong><br />
added advantage that genetic manipulation <strong>system</strong>s are now<br />
available, and our prelim<strong>in</strong>ary studies with fuselloviruses and<br />
conjugative plasmids support this supposition.<br />
ACKNOWLEDGMENTS<br />
This research was supported by grants from <strong>the</strong> National Natural<br />
Science Foundation of Ch<strong>in</strong>a (grants 306210165, 30730003, and<br />
30870058) to L.H., a grant from <strong>the</strong> Danish Research Council for<br />
Technology and Production (grant 09-062932) to Q.S., and grants from<br />
<strong>the</strong> Danish Natural Science Research Council (grant 272-08-0391) and<br />
Danish National Research Foundation to R.A.G.<br />
REFERENCES<br />
1. Arcus, V. L., K. Bäckbro, A. Roost, E. L. Daniel, and E. N. Baker. 2004.<br />
Distant structural homology leads to <strong>the</strong> functional characterisation of an<br />
archaeal PIN doma<strong>in</strong> as an exonuclease. J. Biol. Chem. 279:16471–16478.<br />
2. Berkner, S., and G. Lipps. 2007. An active nonautonomous mobile element<br />
<strong>in</strong> Sulfolobus islandicus REN1H1. J. Bacteriol. 189:2145–2149.<br />
3. Bize, A., et al. 2009. A unique virus release mechanism <strong>in</strong> archaea. Proc. Natl.<br />
Acad. Sci. U. S. A. 106:11306–11311.<br />
4. Blount, Z. D., and D. W. Grogan. 2005. New <strong>in</strong>sertion sequences of Sulfolobus:<br />
functional properties and implications for genome evolution <strong>in</strong> hyper<strong>the</strong>rmophilic<br />
archaea. Mol. Microbiol. 55:312–325.<br />
5. Brügger, K., X. Peng, and R. A. Garrett. 2007. Sulfolobus genomes: mechanisms<br />
of rearrangement and change, p. 95–104. In R. A. Garrett and H.-P.<br />
Klenk (ed.), <strong>Archaea</strong>: evolution, physiology, and molecular biology. Blackwell<br />
Publish<strong>in</strong>g, Oxford, United K<strong>in</strong>gdom.<br />
6. Brügger, K., et al. 2002. Mobile elements <strong>in</strong> archaeal genomes. FEMS<br />
Microbiol. Lett. 206:131–141.<br />
7. Brügger, K., E. Torar<strong>in</strong>sson, P. Redder, L. Chen, and R. A. Garrett. 2004.<br />
Shuffl<strong>in</strong>g of Sulfolobus genomes by autonomous and non-autonomous mobile<br />
elements. Biochem. Soc. Trans. 32:179–183.<br />
8. Brumfield, S. K., et al. 2009. Particle assembly and ultrastructural features<br />
associated with <strong>the</strong> replication of <strong>the</strong> lytic archaeal virus Sulfolobus turreted<br />
icosahedral virus. J. Virol. 83:5964–5970.<br />
9. Chen, L., et al. 2005. The genome of Sulfolobus acidocaldarius, a model<br />
organism of <strong>the</strong> Crenarchaeota. J. Bacteriol. 187:4992–4999.<br />
10. Cobucci-Ponzano, B., et al. 2010. Functional characterisation and high-throughput<br />
proteomic analysis of <strong>in</strong>terrupted genes <strong>in</strong> <strong>the</strong> archaeon Sulfolobus<br />
solfataricus. J. Proteome Res. 9:2496–2507.<br />
11. Contursi, P., et al. 2006. Characterisation of <strong>the</strong> Sulfolobus host-SSV2 virus<br />
<strong>in</strong>teraction. Extremophiles 10:615–627.<br />
12. Cooper, C. R., A. J. Daugherty, S. Tachdjian, P. H. Blum, and R. M. Kelly.<br />
2009. Role of vapBC tox<strong>in</strong>-antitox<strong>in</strong> loci <strong>in</strong> <strong>the</strong> <strong>the</strong>rmal stress response of<br />
Sulfolobus solfataricus. Biochem. Soc. Trans. 37:123–126.<br />
13. Delcher, A. L., K. A. Bratke, E. C. Powers, and S. L. Salzberg. 2007. Identify<strong>in</strong>g<br />
bacterial genes and endosymbiont DNA with Glimmer. Bio<strong>in</strong>formatics<br />
23:673–679.<br />
14. Deng, L., H. Zhu, Z. Chen, Y. X. Liang, and Q. She. 2009. Unmarked gene<br />
deletion and host-vector <strong>system</strong> for <strong>the</strong> hyper<strong>the</strong>rmophilic crenarchaeon<br />
Sulfolobus islandicus. Extremophiles 13:735–746.<br />
15. Elfer<strong>in</strong>k, M. G., S. V. Albers, W. N. Kon<strong>in</strong>gs, and A. J. Driessen. 2001. Sugar<br />
transport <strong>in</strong> Sulfolobus solfataricus is mediated by two families of b<strong>in</strong>d<strong>in</strong>g<br />
prote<strong>in</strong>-dependent ABC transporters. Mol. Microbiol. 39:1494–1503.<br />
16. Filée, J., P. Siguier, and M. Chandler. 2007. Insertion sequence diversity <strong>in</strong><br />
archaea. Microbiol. Mol. Biol. Rev. 71:121–157.<br />
17. Garrett, R. A., et al. 2011. <strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s of <strong>the</strong> Sulfolobales:<br />
complexity and diversity. Biochem. Soc. Trans. 39:51–57.
18. Greve, B., S. Jensen, K. Brügger, W. Zillig, and R. A. Garrett. 2004. Genomic<br />
comparison of archaeal conjugative plasmids from Sulfolobus. <strong>Archaea</strong><br />
1:231–239.<br />
19. Grogan, D. W. 1989. Phenotypic characterization of <strong>the</strong> archaebacterial<br />
genus Sulfolobus: comparison of five wild-type stra<strong>in</strong>s. J. Bacteriol. 171:6710–<br />
6719.<br />
20. Gudbergsdottir, S., et al. 2011. Dynamic properties of <strong>the</strong> Sulfolobus<br />
<strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s when challenged with vector-borne<br />
viral and plasmid genes and protospacers. Mol. Microbiol. 79:35–49.<br />
21. Held, N. L., A. Herrera, H. Cadillo-Quiroz, and R. J. Whitaker. 2010.<br />
<strong>CRISPR</strong> associated diversity with<strong>in</strong> a population of Sulfolobus islandicus.<br />
PLoS One 5:e12988.<br />
22. Jonuscheit, M., E. Martusewitsch, K. M. Stedman, and C. Schleper. 2003. A<br />
reporter gene <strong>system</strong> for <strong>the</strong> hyper<strong>the</strong>rmophilic archaeon Sulfolobus solfataricus<br />
based on a selectable and <strong>in</strong>tegrative shuttle vector. Mol. Microbiol.<br />
48:1241–1252.<br />
23. Kawarabayasi, Y., et al. 2001. Complete genome sequence of an aerobic<br />
<strong>the</strong>rmoacidophilic crenarchaeon, Sulfolobus tokodaii stra<strong>in</strong> 7. DNA Res.<br />
8:123–140.<br />
24. Kon<strong>in</strong>g, S. M., S. V. Albers, W. N. Kon<strong>in</strong>gs, and A. J. Driessen. 2002. Sugar<br />
transport <strong>in</strong> (hyper)<strong>the</strong>rmophilic archaea. Res. Microbiol. 153:61–67.<br />
25. Lillestøl, R. K., et al. 2009. <strong>CRISPR</strong> families of <strong>the</strong> crenarchaeal genus<br />
Sulfolobus: bidirectional transcription and dynamic properties. Mol. Microbiol.<br />
72:259–272.<br />
26. Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for improved<br />
detection of transfer RNA genes <strong>in</strong> genomic sequence. Nucleic Acids Res.<br />
25:955–964.<br />
27. Lundgren, M., A. Andersson, L. Chen, P. Nilsson, and R. Bernander. 2004.<br />
Three replication orig<strong>in</strong>s <strong>in</strong> Sulfolobus species: synchronous <strong>in</strong>itiation of<br />
chromosome replication and asynchronous term<strong>in</strong>ation. Proc. Natl. Acad.<br />
Sci. U. S. A. 101:7046–7051.<br />
28. Mahillon, J., and M. Chandler. 1998. Insertion sequences. Microbiol. Mol.<br />
Biol. Rev. 62:725–774.<br />
29. Marck, C., and H. Grosjean. 2003. Identification of BHB splic<strong>in</strong>g motifs <strong>in</strong><br />
<strong>in</strong>tron-conta<strong>in</strong><strong>in</strong>g tRNAs from 18 archaea: evolutionary implications. RNA<br />
9:1516–1531.<br />
30. Martusewitsch, E., C. W. Sensen, and C. Schleper. 2000. High spontaneous<br />
mutation rate <strong>in</strong> <strong>the</strong> hyper<strong>the</strong>rmophilic archaeon Sulfolobus solfataricus is<br />
mediated by transposable elements. J. Bacteriol. 182:2574–2581.<br />
31. Muller, S., et al. 2009. Deficiency of <strong>the</strong> tRNA(Tyr):Ψ35-synthase aPus7 <strong>in</strong><br />
<strong>Archaea</strong> of <strong>the</strong> Sulfolobales order might be rescued by <strong>the</strong> H/ACA sRNA-guided<br />
mach<strong>in</strong>ery. Nucleic Acids Res. 37:1308–1322.<br />
32. Muskhelishvili, G., P. Palm, and W. Zillig. 1993. SSV1-encoded site-specific<br />
recomb<strong>in</strong>ation <strong>system</strong> <strong>in</strong> Sulfolobus shibatae. Mol. Gen. Genet. 237:334–342.<br />
33. Omer, A. D., M. Zago, A. Chang, and P. P. Dennis. 2006. Prob<strong>in</strong>g <strong>the</strong><br />
structure and function of an archaeal C/D-box methylation guide sRNA.<br />
RNA 12:1708–1720.<br />
34. Pandey, D. P., and K. Gerdes. 2005. Tox<strong>in</strong>-antitox<strong>in</strong> loci are highly abundant<br />
<strong>in</strong> free-liv<strong>in</strong>g but lost from host-associated prokaryotes. Nucleic Acids Res.<br />
33:966–976.<br />
35. Peng, N., Q. Xia, Z. Chen, Y. X. Liang, and Q. She. 2009. An upstream<br />
activation element exert<strong>in</strong>g differential transcription activation on an<br />
archaeal promoter. Mol. Microbiol. 74:928–939.<br />
36. Peyfoon, E., et al. 2010. The S-layer glycoprote<strong>in</strong> of <strong>the</strong> crenarchaeote Sulfolobus<br />
acidocaldarius is glycosylated at multiple sites with chitobiose-l<strong>in</strong>ked<br />
N-glycans. <strong>Archaea</strong> pii:754101.<br />
37. Prangishvili, D., et al. 1998. Conjugation <strong>in</strong> archaea: frequent occurrence of<br />
conjugative plasmids <strong>in</strong> Sulfolobus. Plasmid 40:190–202.<br />
38. Prangishvili, D., P. P. Forterre, and R. A. Garrett. 2006. Viruses of <strong>the</strong><br />
<strong>Archaea</strong>: a unify<strong>in</strong>g view. Nat. Rev. Microbiol. 4:837–848.<br />
39. Prangishvili, D., et al. 2000. Sulfolobic<strong>in</strong>s, specific prote<strong>in</strong>aceous tox<strong>in</strong>s produced<br />
by stra<strong>in</strong>s of <strong>the</strong> extremely <strong>the</strong>rmophilic archaeal genus Sulfolobus. J.<br />
Bacteriol. 182:2985–2988.<br />
40. Prangishvili, D., et al. 2006. Structural and genomic properties of <strong>the</strong> hyper<strong>the</strong>rmophilic<br />
archaeal virus ATV with an extracellular stage of <strong>the</strong> reproductive<br />
cycle. J. Mol. Biol. 359:1203–1216.<br />
41. Redder, P., and R. A. Garrett. 2006. Mutations and rearrangements <strong>in</strong> <strong>the</strong><br />
genome of Sulfolobus solfataricus P2. J. Bacteriol. 188:4198–4206.<br />
42. Redder, P., et al. 2009. Four newly isolated fuselloviruses from extreme<br />
geo<strong>the</strong>rmal environments reveal unusual morphologies and a possible <strong>in</strong>terviral<br />
recomb<strong>in</strong>ation mechanism. Environ. Microbiol. 11:2849–2862.<br />
43. Redder, P., Q. She, and R. A. Garrett. 2001. Non-autonomous elements <strong>in</strong><br />
<strong>the</strong> crenarchaeon Sulfolobus solfataricus. J. Mol. Biol. 306:1–6.<br />
44. Reno, M. L., N. L. Held, C. J. Fields, P. V. Burke, and R. J. Whitaker. 2009.<br />
Sulfolobus islandicus pan-genome. Proc. Natl. Acad. Sci. U. S. A. 106:8605–<br />
8610. (Erratum, 106:18873.)<br />
45. Rob<strong>in</strong>son, N. P., and S. D. Bell. 2007. Extrachromosomal element capture<br />
and <strong>the</strong> evolution of multiple replication orig<strong>in</strong>s <strong>in</strong> archaeal chromosomes.<br />
Proc. Natl. Acad. Sci. U. S. A. 104:5806–5811.<br />
46. Rob<strong>in</strong>son, N. P., et al. 2004. Identification of two orig<strong>in</strong>s of replication <strong>in</strong> <strong>the</strong><br />
s<strong>in</strong>gle chromosome of <strong>the</strong> archaeon Sulfolobus solfataricus. Cell 116:25–38.<br />
47. Ru<strong>the</strong>rford, K., et al. 2000. Artemis: sequence visualization and annotation.<br />
Bio<strong>in</strong>formatics 16:944–945.<br />
48. Shah, S. A., and R. A. Garrett. 2011. <strong>CRISPR</strong>/Cas and Cmr modules,<br />
mobility and evolution of adaptive <strong>immune</strong> <strong>system</strong>s. Res. Microbiol. 162:<br />
27–38.<br />
49. Shah, S. A., N. R. Hansen, and R. A. Garrett. 2009. Distributions of <strong>CRISPR</strong><br />
spacer matches <strong>in</strong> viruses and plasmids of crenarchaeal acido<strong>the</strong>rmophiles<br />
and implications for their inhibitory mechanism. Biochem. Soc. Trans. 37:<br />
23–28.<br />
50. She, Q., et al. 2008. Host-vector <strong>system</strong>s for hyper<strong>the</strong>rmophilic archaeon<br />
Sulfolobus, p. 151–156. In S.-J. Liu and H. L. Drake (ed.), Microbes and <strong>the</strong><br />
environment: perspective and challenges. Science Press, Beij<strong>in</strong>g, Ch<strong>in</strong>a.<br />
51. She, Q., X. Peng, W. Zillig, and R. A. Garrett. 2001. Gene capture <strong>in</strong> archaeal<br />
chromosomes. Nature 409:478.<br />
52. She, Q., B. Shen, and L. Chen. 2004. <strong>Archaea</strong>l <strong>in</strong>tegrases and mechanisms of<br />
gene capture. Biochem. Soc. Trans. 32:222–226.<br />
53. She, Q., et al. 2001. The complete genome of <strong>the</strong> crenarchaeon Sulfolobus<br />
solfataricus P2. Proc. Natl. Acad. Sci. U. S. A. 98:7835–7840.<br />
54. She, Q., et al. 2009. Genetic analyses <strong>in</strong> <strong>the</strong> hyper<strong>the</strong>rmophilic archaeon<br />
Sulfolobus islandicus. Biochem. Soc. Trans. 37:92–96.<br />
55. Tang, T.-H., et al. 2005. Identification of novel non-cod<strong>in</strong>g RNAs as potential<br />
antisense regulators <strong>in</strong> <strong>the</strong> archaeon Sulfolobus solfataricus. Mol. Microbiol.<br />
55:469–481.<br />
56. Torar<strong>in</strong>sson, E., H.-P. Klenk, and R. A. Garrett. 2005. Divergent transcriptional<br />
and translational signals <strong>in</strong> <strong>Archaea</strong>. Environ. Microbiol. 7:47–54.<br />
57. Wagner, M., et al. 2009. Expand<strong>in</strong>g and understand<strong>in</strong>g <strong>the</strong> genetic toolbox of<br />
<strong>the</strong> hyper<strong>the</strong>rmophilic genus Sulfolobus. Biochem. Soc. Trans. 37:97–101.<br />
58. Worth<strong>in</strong>gton, P., V. Hoang, F. Perez-Pomares, and P. Blum. 2003. Targeted<br />
disruption of <strong>the</strong> alpha-amylase gene <strong>in</strong> <strong>the</strong> hyper<strong>the</strong>rmophilic archaeon<br />
Sulfolobus solfataricus. J. Bacteriol. 185:482–488.<br />
59. Wurtzel, O., et al. 2010. A s<strong>in</strong>gle-base resolution map of an archaeal transcriptome.<br />
Genome Res. 20:133–141.<br />
60. Yokobori, S., et al. 2009. Ga<strong>in</strong> and loss of an <strong>in</strong>tron <strong>in</strong> a prote<strong>in</strong>-cod<strong>in</strong>g gene<br />
<strong>in</strong> <strong>Archaea</strong>: <strong>the</strong> case of an archaeal RNA pseudourid<strong>in</strong>e synthase gene. BMC<br />
Evol. Biol. 9:198.<br />
61. Zago, M. A., P. P. Dennis, and A. D. Omer. 2005. The expand<strong>in</strong>g world of<br />
small RNAs <strong>in</strong> <strong>the</strong> hyper<strong>the</strong>rmophilic archaeon Sulfolobus solfataricus. Mol.<br />
Microbiol. 55:1812–1828.<br />
62. Zhang, C., et al. 2010. Reveal<strong>in</strong>g <strong>the</strong> essentiality of multiple archaeal pcna<br />
genes us<strong>in</strong>g a mutant propagation assay based on an improved knockout<br />
method. Microbiology 156:3386–3397.<br />
63. Zillig, W., et al. 1998. Genetic elements <strong>in</strong> <strong>the</strong> extremely <strong>the</strong>rmophilic<br />
archaeon Sulfolobus. Extremophiles 2:131–140.
Biochemical Society Transactions www.biochemsoctrans.org<br />
<strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s of <strong>the</strong><br />
Sulfolobales: complexity and diversity<br />
Roger A. Garrett 1, Shiraz A. Shah, Gisle Vestergaard, Ling Deng, Soley Gudbergsdottir, Chandra S. Kenchappa,<br />
Susanne Erdmann and Qunxin She<br />
<strong>Archaea</strong> Centre, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200N Copenhagen K, Denmark<br />
Abstract<br />
CRISPR (cluster of regularly interspaced palindromic repeats)/Cas and CRISPR/Cmr systems of Sulfolobus,<br />
targeting DNA and RNA respectively of invading viruses or plasmids, are complex and diverse. We address<br />
<strong>the</strong>ir classification and functional diversity, and <strong>the</strong> wide sequence diversity of RAMP (repeat-associated<br />
mysterious prote<strong>in</strong>)-motif conta<strong>in</strong><strong>in</strong>g prote<strong>in</strong>s encoded <strong>in</strong> Cmr modules. Factors <strong>in</strong>fluenc<strong>in</strong>g ma<strong>in</strong>tenance<br />
of partially impaired <strong>CRISPR</strong>-based <strong>system</strong>s are discussed. The capacity for whole <strong>CRISPR</strong> transcripts to be<br />
generated despite <strong>the</strong> uptake of transcription signals with<strong>in</strong> spacer sequences is considered. Target<strong>in</strong>g of<br />
protospacer regions of invading elements by Cas protein–crRNA (CRISPR RNA) complexes exhibits relatively<br />
low sequence str<strong>in</strong>gency, but <strong>the</strong> <strong>in</strong>tegrity of protospacer-associated motifs appears to be important.<br />
Different mechanisms for circumvent<strong>in</strong>g or <strong>in</strong>activat<strong>in</strong>g <strong>the</strong> <strong>immune</strong> <strong>system</strong>s are presented.<br />
Introduction<br />
The discovery of <strong>the</strong> widespread occurrence of <strong>CRISPR</strong><br />
(cluster of regularly <strong>in</strong>terspaced pal<strong>in</strong>dromic repeat)-based<br />
<strong>immune</strong> <strong>system</strong>s <strong>in</strong> archaea and bacteria has provided<br />
important insights into how hosts can inactivate and/or<br />
regulate invading foreign DNA and, probably, RNA<br />
genetic elements. In addition, <strong>the</strong>se <strong>system</strong>s are likely to<br />
<strong>in</strong>fluence how co-<strong>in</strong>vad<strong>in</strong>g genetic elements can <strong>in</strong>fluence one<br />
ano<strong>the</strong>r [1,2]. The two ma<strong>in</strong> molecular apparatus <strong>in</strong>volved<br />
are structurally complex, partially <strong>in</strong>dependent and have<br />
diversified functionally. Moreover, <strong>the</strong>ir capacity to facilitate<br />
<strong>the</strong> cont<strong>in</strong>ual uptake of foreign DNA <strong>in</strong>to host chromosomes,<br />
and <strong>the</strong>ir propensity for transfer between organisms, has<br />
important implications for cellular evolution.<br />
The genus Sulfolobus provides an important model <strong>system</strong><br />
for study<strong>in</strong>g <strong>the</strong>se <strong>immune</strong> <strong>system</strong>s. Most Sulfolobus species<br />
carry complex and diverse <strong>CRISPR</strong>-based <strong>system</strong>s and appear<br />
to be particularly active <strong>in</strong> <strong>the</strong> uptake of foreign DNA <strong>in</strong>serts<br />
<strong>in</strong>to <strong>the</strong>ir <strong>CRISPR</strong> loci. Fur<strong>the</strong>rmore, a broad collection<br />
of Sulfolobus genetic elements is available that can be used<br />
to challenge <strong>the</strong> <strong>CRISPR</strong>-based <strong>system</strong>s [3]. It <strong>in</strong>cludes<br />
numerous diverse viruses many of which have been classified<br />
<strong>in</strong>to eight new viral families [4,5] as well as a family of<br />
plasmids encod<strong>in</strong>g an archaeal-specific conjugative apparatus<br />
[6,7].<br />
Many insights into the complexity of the CRISPR-based<br />
immune systems, and their mechanistic diversity,<br />
have emerged from detailed experimental studies of CRISPR/Cas<br />
and CRISPR/Cmr systems of the archaeal genera<br />
Sulfolobus and Pyrococcus respectively, and from investigation<br />
of bacterial CRISPR/Cas systems of Streptococcus<br />
thermophilus [8,9], Staphylococcus epidermidis [10,11] and<br />
Escherichia coli [12]. In the present article, we focus primarily<br />
on current knowledge and ideas deriving from, and relating<br />
to, the Sulfolobus immune systems.<br />
Key words: archaeal virus, cluster of regularly interspaced palindromic repeats/Cas module<br />
(CRISPR/Cas module), Cmr module, CRISPR RNA (crRNA), protospacer-associated motif (PAM).<br />
Abbreviations used: CRISPR, cluster of regularly interspaced palindromic repeats; crRNA,<br />
CRISPR RNA; IS, insertion sequence; PAM, protospacer-associated motif; RAMP, repeat-associated<br />
mysterious protein; SIRV1, Sulfolobus islandicus rod-shaped virus 1.<br />
1 To whom correspondence should be addressed (email garrett@bio.ku.dk).<br />
Biochem. Soc. Trans. (2011) 39, 51–57; doi:10.1042/BST0390051<br />
Molecular Biology of Archaea II<br />
<strong>CRISPR</strong>/Cas families: complexity,<br />
classification and versatility<br />
At an early stage, it was clear that <strong>the</strong> <strong>CRISPR</strong>/Cas and<br />
Cmr <strong>system</strong>s were highly complex when approx. 45 different<br />
prote<strong>in</strong>s were implicated <strong>in</strong> <strong>the</strong>ir function [13], and <strong>the</strong><br />
number has cont<strong>in</strong>ued to rise [14]. Genes of <strong>the</strong> two <strong>system</strong>s<br />
are clustered <strong>in</strong>to cas and cmr cassettes which are sometimes<br />
l<strong>in</strong>ked physically. These cassettes encode a few core prote<strong>in</strong>s,<br />
but <strong>the</strong>y also carry different comb<strong>in</strong>ations of o<strong>the</strong>r genes,<br />
some occurr<strong>in</strong>g more commonly than o<strong>the</strong>rs. Thus cassettes<br />
vary markedly <strong>in</strong> <strong>the</strong>ir overall gene contents. To illustrate this,<br />
core gene structures of <strong>the</strong> archaeal cas cassettes are shown<br />
toge<strong>the</strong>r with a more complex family I cas cassette from<br />
Sulfolobus islandicus HVE10/4 (Figures 1A and 1B). The core<br />
cas genes classify <strong>in</strong>to cas group 1, implicated <strong>in</strong> <strong>CRISPR</strong><br />
acquisition of foreign DNA and <strong>in</strong>sertion <strong>in</strong>to <strong>CRISPR</strong> loci,<br />
and cas group 2 associated with crRNA (<strong>CRISPR</strong> RNA)<br />
process<strong>in</strong>g and guidance (Figure 1A).<br />
Families of <strong>CRISPR</strong>/Cas modules have been classified<br />
on <strong>the</strong> basis of gene content and gene order with<strong>in</strong> cas<br />
cassettes, and on <strong>the</strong> basis of conserved sequences of cas genes,<br />
leader regions and repeats with<strong>in</strong> <strong>CRISPR</strong>/Cas modules. For<br />
archaea, about eight families have been proposed, whereas<br />
among <strong>the</strong> Sulfolobales, three are common (I–III) and one<br />
less so (IV) [2,15,16,17].<br />
© The Authors Journal compilation © 2011 Biochemical Society<br />
Biochemical Society Transactions (2011) Volume 39, part 1<br />
Figure 1 Core genes of archaeal cas cassettes<br />
(A) Core genes are divided into putative functional cas groups 1 and 2 (see the text) and the cas6 gene, which encodes<br />
an RNA-processing enzyme [18]. (B) Genetic map of a family I CRISPR/Cas module of S. islandicus strain HVE10/4 carrying<br />
several non-core cas genes.<br />
Cmr modules carry two conserved core genes, cmr2<br />
and cmr5 (Figure 2A), and a variable number of genes<br />
encoding diverse proteins which carry RAMP (repeat-associated<br />
mysterious protein) motifs. The Cmr modules<br />
can be classified <strong>in</strong>to five ma<strong>in</strong> families A, B, C, D and E<br />
for archaea on <strong>the</strong> basis of phylogenetic tree build<strong>in</strong>g us<strong>in</strong>g<br />
sequences of Cmr2 and its homologues Csm1 and Csx11<br />
(Figure 2B), where most Sulfolobus Cmr modules fall with<strong>in</strong><br />
families B or D. Fur<strong>the</strong>r classification is complicated by<br />
<strong>the</strong> presence of multiple diverse copies of genes cod<strong>in</strong>g for<br />
RAMP-motif-conta<strong>in</strong><strong>in</strong>g prote<strong>in</strong>s. Although <strong>the</strong>se prote<strong>in</strong>s<br />
can be classified <strong>in</strong>to families on <strong>the</strong> basis of <strong>the</strong>se motifs,<br />
<strong>the</strong> rema<strong>in</strong>der of <strong>the</strong> prote<strong>in</strong> sequences tend to be highly<br />
divergent, as illustrated for four prote<strong>in</strong>s encoded <strong>in</strong> a Cmr<br />
family B module of Sulfolobus solfataricus P2 (Figure 2C).<br />
Most Sulfolobus species carry multiple <strong>CRISPR</strong>/Cas<br />
and/or Cmr modules and, given <strong>the</strong> high energy cost of<br />
ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g and express<strong>in</strong>g <strong>the</strong>m, <strong>the</strong>y must confer major<br />
advantages on the cell. Clearly, given the molecular<br />
and mechanistic complexities of <strong>the</strong> <strong>system</strong>s, <strong>the</strong>y can be<br />
<strong>in</strong>activated readily by <strong>in</strong>curr<strong>in</strong>g a defect <strong>in</strong> a component or<br />
critical sequence motif. Moreover, the systems are potential<br />
targets for incoming genetic elements, which may attempt<br />
to integrate into essential cas or cmr genes (as has been<br />
observed for a viral integration in a csa3 gene of S. islandicus<br />
strain M.16.4; see below), modify their protein products<br />
or otherwise interfere with transcription or maturation of<br />
crRNAs. Therefore multiple systems will provide added security<br />
aga<strong>in</strong>st unwanted <strong>in</strong>vasion. The pair<strong>in</strong>g of many family<br />
I <strong>CRISPR</strong>/Cas modules may reflect a compromise between<br />
provid<strong>in</strong>g added security and generat<strong>in</strong>g more compact and<br />
efficient <strong>system</strong>s which can potentially be mobilized and<br />
transferred between organisms as s<strong>in</strong>gle units [2].<br />
A fur<strong>the</strong>r advantage may arise from <strong>the</strong> presence<br />
of different families of <strong>CRISPR</strong>/Cas modules which is<br />
commonly observed for Sulfolobus (e.g. S. solfataricus carries<br />
family I and II modules, whereas Sulfolobus acidocaldarius<br />
carries those of family II and III) [16]. Their presence may<br />
<strong>in</strong>crease versatility <strong>in</strong> both <strong>the</strong> uptake of spacers and target<strong>in</strong>g<br />
of protospacers with different PAMs (protospacer-associated<br />
motifs).<br />
The presence of multiple Cmr modules is also likely to<br />
confer functional versatility, although <strong>the</strong>y are subject to <strong>the</strong><br />
constra<strong>in</strong>t that some encoded prote<strong>in</strong>s must be able to<br />
recognize part of <strong>the</strong> repeat sequence of <strong>the</strong> co-<strong>in</strong>habit<strong>in</strong>g<br />
<strong>CRISPR</strong>/Cas module [18,19]. Cmr modules are sometimes<br />
l<strong>in</strong>ked directly to <strong>CRISPR</strong>/Cas modules on chromosomes<br />
and, given <strong>the</strong>ir functional <strong>in</strong>terdependence, <strong>the</strong>re is likely<br />
to have been some co-evolution of <strong>the</strong> coupled <strong>system</strong>s.<br />
Consistent with this view, analysis of <strong>the</strong> Sulfolobales<br />
suggests that Cmr family D modules (Figure 2B) are<br />
commonly, but not exclusively, found toge<strong>the</strong>r with family<br />
II <strong>CRISPR</strong>/Cas modules.<br />
<strong>CRISPR</strong> loci: structural and functional<br />
complexity<br />
<strong>CRISPR</strong> loci consist of regularly spaced direct repeat<br />
sequences with <strong>in</strong>terven<strong>in</strong>g spacers deriv<strong>in</strong>g from <strong>in</strong>vad<strong>in</strong>g<br />
foreign DNA elements. <strong>Archaea</strong>l repeats fall <strong>in</strong> <strong>the</strong> size<br />
range 23–37 bp and most spacers are 25–50 bp long [20].<br />
<strong>CRISPR</strong> loci are preceded by a leader region which varies<br />
<strong>in</strong> size from approx. 150 to 550 bp and shows levels of<br />
sequence conservation which are only considered significant<br />
with<strong>in</strong> specific families of <strong>CRISPR</strong>/Cas modules. <strong>CRISPR</strong><br />
locus sizes can also vary considerably, suggest<strong>in</strong>g that rates<br />
of spacer turnover differ markedly for different <strong>CRISPR</strong><br />
loci with<strong>in</strong> a given archaeon. But <strong>the</strong>re is no support for<br />
differences occurr<strong>in</strong>g between <strong>the</strong> <strong>CRISPR</strong>/Cas families of<br />
<strong>the</strong> Sulfolobales, s<strong>in</strong>ce large and small clusters exist for <strong>the</strong><br />
most common families I, II and III.<br />
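The repeat-spacer layout described above can be sketched in code. The following Python sketch is illustrative only and is not the authors' method: it assumes a CRISPR locus is available as a plain string in which an exactly conserved repeat alternates with spacers.<br />
The function names (split_spacers, within_range) and the toy sequences are hypothetical, and real repeats can carry mismatches that this exact-match approach would miss.<br />

```python
def split_spacers(locus: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`."""
    locus, repeat = locus.upper(), repeat.upper()
    pieces = locus.split(repeat)
    # pieces[0] lies before the first repeat (leader side) and pieces[-1] after
    # the last repeat; only interior pieces are flanked by a repeat on both sides.
    return [piece for piece in pieces[1:-1] if piece]


def within_range(lengths, low, high):
    """Flag which lengths fall inside the inclusive range [low, high]."""
    return [low <= n <= high for n in lengths]


# Toy data (hypothetical, not taken from any genome).
TOY_REPEAT = "CTTTCAATTCTATAAGAGATTATC"           # 24 bp
TOY_SPACERS = ["A" * 10 + "G" * 10 + "C" * 10,   # 30 bp
               "T" * 14 + "G" * 14]              # 28 bp
TOY_LOCUS = ("ACGTACGT" + TOY_REPEAT + TOY_SPACERS[0]
             + TOY_REPEAT + TOY_SPACERS[1] + TOY_REPEAT + "TT")

if __name__ == "__main__":
    spacers = split_spacers(TOY_LOCUS, TOY_REPEAT)
    print([len(s) for s in spacers])
    # Size ranges quoted in the text: repeats 23-37 bp, most spacers 25-50 bp.
    print(within_range([len(TOY_REPEAT)], 23, 37))
    print(within_range([len(s) for s in spacers], 25, 50))
```

Counting the spacers returned for each locus gives the locus size in spacer-repeat units, which is the quantity compared between loci in the paragraph above.<br />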
In organisms carry<strong>in</strong>g several <strong>CRISPR</strong>/Cas modules,<br />
<strong>in</strong>clud<strong>in</strong>g S. solfataricus stra<strong>in</strong>s P1 and P2 with six, and S.<br />
acidocaldarius with five, <strong>the</strong>y may not all be fully functional.<br />
The <strong>CRISPR</strong>/Cas <strong>system</strong> exhibits two partially <strong>in</strong>dependent<br />
functions with one group of Cas prote<strong>in</strong>s responsible for<br />
uptake of <strong>in</strong>vader DNA <strong>in</strong>to <strong>CRISPR</strong> loci and <strong>the</strong> o<strong>the</strong>r<br />
for generat<strong>in</strong>g crRNAs and guid<strong>in</strong>g <strong>the</strong>m to <strong>the</strong> <strong>in</strong>vad<strong>in</strong>g<br />
genetic element (Figure 1). Only <strong>the</strong> latter prote<strong>in</strong>s are<br />
essential for the CRISPR/Cas system to function. Thus non-extending<br />
CRISPR loci may still be useful to cells as long as<br />
crRNAs are generated. S. acidocaldarius carries two large<br />
loci and three smaller ones of 11, five and two spacer-repeat<br />
units. All five clusters were transcribed and processed<br />
to mature crRNAs [16], but possibly <strong>the</strong> spacer addition<br />
functions are defective for <strong>the</strong> small clusters. Similarly, for<br />
S. solfataricus P1 and P2, of <strong>the</strong> six <strong>CRISPR</strong> loci, only four<br />
appear to be active <strong>in</strong> elongation. Of <strong>the</strong> o<strong>the</strong>r two, <strong>the</strong><br />
smallest (locus E) carries six spacer-repeat units with a leader<br />
and no cas genes [16] and does not appear to be transcribed<br />
[21]. It carries spacers match<strong>in</strong>g rudiviruses and a conjugative<br />
plasmid and is conserved <strong>in</strong> three S. solfataricus stra<strong>in</strong>s (two
Figure 2 Classification of archaeal Cmr modules<br />
(A) Gene map of an archaeal Cmr module showing the conserved core proteins Cmr2 and Cmr5; the grey boxes represent<br />
genes encoding different proteins which carry RAMP motifs. (B) Phylogenetic tree for archaeal Cmr modules based on the<br />
Cmr2 prote<strong>in</strong> sequence show<strong>in</strong>g five ma<strong>in</strong> families: A, B, C, D and E. The total number of different prote<strong>in</strong>s <strong>in</strong> each family<br />
carry<strong>in</strong>g RAMP motifs is given <strong>in</strong> paren<strong>the</strong>ses. Trees were prepared us<strong>in</strong>g <strong>the</strong> MUSCLE and ClustalW programs as described<br />
previously [17]. (C) Maps of four RAMP-motif-containing proteins within a single Cmr family B module of S. solfataricus P2.<br />
They illustrate <strong>the</strong> diverse locations of <strong>the</strong> two conserved am<strong>in</strong>o acid sequence regions (1 and 2), determ<strong>in</strong>ed us<strong>in</strong>g <strong>the</strong><br />
MEME program [45]. The rema<strong>in</strong><strong>in</strong>g sequence regions show very low levels of sequence similarity.<br />
Figure 3 A map of <strong>the</strong> <strong>CRISPR</strong> locus E<br />
Locus E is found <strong>in</strong> S. solfataricus stra<strong>in</strong>s P1, P2 and 98/2 and <strong>the</strong><br />
S. islandicus stra<strong>in</strong> L.D.8.5 [22]. Triangles represent spacer-repeat units<br />
that are colour-coded for match<strong>in</strong>g sequences: red, rudivirus and blue,<br />
conjugative plasmid. The shaded spacer-repeat units carry identical<br />
sequences. L represents <strong>the</strong> leader region. The 36 kb genomic region<br />
flank<strong>in</strong>g <strong>the</strong> locus (grey region) is conserved at >99% sequence identity<br />
<strong>in</strong> all four stra<strong>in</strong>s.<br />
from Naples, Italy) with only <strong>the</strong> f<strong>in</strong>al downstream spacer<br />
differ<strong>in</strong>g between <strong>the</strong> P1/P2 stra<strong>in</strong>s and stra<strong>in</strong> 98/2 (Figure 3).<br />
Moreover, it is also found on a highly conserved 36 kb<br />
chromosomal fragment (99% sequence identity) <strong>in</strong> <strong>the</strong> S.<br />
islandicus stra<strong>in</strong> L.D.8.5 (from Lassen, CA, U.S.A.) [22],<br />
with an almost identical leader region (one mismatch) and<br />
identical repeat sequence but different spacers (Figure 3). The<br />
ma<strong>in</strong>tenance and spread<strong>in</strong>g of locus E, lack<strong>in</strong>g a cas cassette,<br />
would suggest that <strong>the</strong> <strong>CRISPR</strong> module can be activated and<br />
generate crRNAs. The <strong>in</strong>ference that Cas prote<strong>in</strong>s encoded<br />
<strong>in</strong> one <strong>CRISPR</strong>/Cas module can activate o<strong>the</strong>r <strong>CRISPR</strong> loci<br />
would also be consistent with <strong>the</strong> <strong>in</strong>ference that <strong>the</strong> group<br />
1 cas genes (Figure 1A) can exchange between <strong>CRISPR</strong>/Cas<br />
modules [2].<br />
The large inactive locus F, with 88 spacer-repeat units,<br />
is completely conserved in sequence between S. solfataricus<br />
strains P1 and P2, but it lacks a leader region and, although<br />
transcription occurs internally within the CRISPR locus,<br />
mature crRNAs are not generated [21,23]. Thus locus F,<br />
which has been lost from S. solfataricus strain 98/2, may be<br />
of little use when a viral infection occurs.<br />
Generally for Sulfolobus species, loss of mobile DNA<br />
elements is difficult; thus IS (insertion sequence) elements<br />
tend to degenerate rather than be deleted [24]. This<br />
may also apply to CRISPR/Cas and Cmr modules,<br />
and explain the maintenance of defective CRISPR systems<br />
over long periods. However, in a variant strain of S. solfataricus<br />
P2 (P2A), four physically linked CRISPR/Cas modules (A–D)<br />
were apparently lost via a single recombination event<br />
between bordering IS elements [25].<br />
Transcription of <strong>CRISPR</strong> loci and process<strong>in</strong>g<br />
Processed <strong>CRISPR</strong> transcripts were first observed for<br />
<strong>the</strong> euryarchaeon Archaeoglobus fulgidus and crenarchaeon<br />
S. solfataricus, and <strong>the</strong>se studies revealed <strong>the</strong> regular pattern<br />
of <strong>the</strong> RNA process<strong>in</strong>g, us<strong>in</strong>g probes specific for repeat<br />
sequences [26,27]. Subsequently, <strong>the</strong> smallest Sulfolobus<br />
RNA product of approx. 40 nt was identified covering<br />
primarily a s<strong>in</strong>gle spacer sequence [20]. S. acidocaldarius<br />
<strong>CRISPR</strong> loci are transcribed upstream from <strong>the</strong> first repeat<br />
with<strong>in</strong> <strong>the</strong> leader region and term<strong>in</strong>ation occurs downstream<br />
from the final repeat. Even for the locus carrying 78 spacer-repeat<br />
units (4930 bp), a substantial proportion of transcripts<br />
were approx. 5000 nt long with ano<strong>the</strong>r large portion <strong>in</strong> <strong>the</strong><br />
size range 3000–3500 nt [16].<br />
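The figures quoted above can be related to one another with simple arithmetic. The Python sketch below is illustrative only: it assumes that all spacer-repeat units in the locus have the same length and it ignores any leader-derived or downstream sequence in the transcripts; the function names are hypothetical.<br />

```python
def mean_unit_length(locus_bp: int, units: int) -> float:
    """Average length of one spacer-repeat unit in bp."""
    return locus_bp / units


def units_spanned(transcript_nt: int, unit_bp: float) -> float:
    """Approximate number of spacer-repeat units covered by a transcript."""
    return transcript_nt / unit_bp


if __name__ == "__main__":
    unit = mean_unit_length(4930, 78)   # locus size and unit count quoted in the text
    print(f"mean unit length: {unit:.1f} bp")
    for nt in (3000, 3500, 5000):       # transcript sizes quoted in the text
        print(f"{nt} nt spans about {units_spanned(nt, unit):.0f} units")
```

Under these assumptions the mean unit is roughly 63 bp, so transcripts of 3000–3500 nt would cover roughly 47–55 of the 78 units, whereas transcripts of approx. 5000 nt are slightly longer than the 4930 bp locus, as expected if transcription starts within the leader and ends downstream from the final repeat.<br />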
This raised an important question as to how transcription<br />
cont<strong>in</strong>ues throughout <strong>CRISPR</strong> loci apparently unimpeded by<br />
<strong>the</strong> presence of spacers carry<strong>in</strong>g archaea-specific promoter or<br />
term<strong>in</strong>ator motifs, given that <strong>the</strong> DNA uptake mechanism<br />
is essentially statistically random [15]. A compilation of<br />
potential promoter and term<strong>in</strong>ator motifs on <strong>the</strong> leader<br />
(crRNA) strand of <strong>the</strong> available Sulfolobus genomes revealed,<br />
for a total of 4505 spacers, 2560 carry<strong>in</strong>g archaeal-type<br />
hexameric TATA boxes (at least six consecutive A and Ts<br />
with at least two As) and 725 with T-rich pyrimid<strong>in</strong>e motifs<br />
(at least six consecutive T and Cs with at least five Ts) [28,29].<br />
Although many of <strong>the</strong>se may at best be weakly effective,<br />
never<strong>the</strong>less, given <strong>the</strong> high gene density <strong>in</strong> <strong>the</strong> Sulfolobus<br />
viral and plasmid genomes and <strong>the</strong> low frequency of operon<br />
structures, <strong>the</strong> probability of tak<strong>in</strong>g up such active motifs is<br />
significant. The conclusion that transcripts do not normally<br />
start with<strong>in</strong> <strong>CRISPR</strong> loci is also supported by exam<strong>in</strong>ation<br />
of <strong>CRISPR</strong> transcripts from S. solfataricus P2 transcriptome<br />
data [21], which indicate that most of the detectable 5′-ends<br />
are attributable to process<strong>in</strong>g sites with<strong>in</strong> repeats [21]. A<br />
possible explanation for <strong>the</strong> unimpeded transcription through<br />
the CRISPR loci could be the presence of the CRISPR-binding<br />
protein of Sulfolobus and other crenarchaea [30]; it<br />
could act as a transcription factor <strong>in</strong>hibit<strong>in</strong>g transcriptional<br />
starts and stops with<strong>in</strong> <strong>the</strong> spacer sequences, and repeats.<br />
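The motif criteria quoted above for the spacer compilation can be expressed as a simple scan. The Python sketch below is a minimal illustration and not the procedure used for the published counts: it assumes that spacers are supplied as strings on the leader (crRNA) strand, and it interprets each criterion as applying to a single 6-nt window (six consecutive A/T positions containing at least two A, or six consecutive T/C positions containing at least five T).<br />
The function names and the toy spacers are hypothetical.<br />

```python
def has_window(seq: str, allowed: set, key_base: str, min_key: int,
               width: int = 6) -> bool:
    """True if some `width`-nt window uses only `allowed` bases and
    contains at least `min_key` copies of `key_base`."""
    seq = seq.upper()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) <= allowed and window.count(key_base) >= min_key:
            return True
    return False


def has_tata_like(spacer: str) -> bool:
    """Six consecutive A/T positions with at least two A (assumed reading)."""
    return has_window(spacer, {"A", "T"}, "A", 2)


def has_t_rich(spacer: str) -> bool:
    """Six consecutive T/C positions with at least five T (assumed reading)."""
    return has_window(spacer, {"T", "C"}, "T", 5)


def count_motifs(spacers):
    """Return (number with a TATA-like motif, number with a T-rich motif)."""
    return (sum(has_tata_like(s) for s in spacers),
            sum(has_t_rich(s) for s in spacers))


if __name__ == "__main__":
    toy = ["GGCTTTATAAGGC", "GGATTTTTTGGA", "GACGCGCAGGC"]  # hypothetical spacers
    print(count_motifs(toy))
```

A different reading of the criteria, for example requiring the two A residues anywhere in a longer A/T run rather than within one hexamer, would change the counts, so the sketch should not be expected to reproduce the published totals.<br />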
Full-length transcripts are also produced from <strong>the</strong> opposite<br />
DNA strand of <strong>CRISPR</strong> loci of S. acidocaldarius which yield<br />
discrete 50–60 nt fragments carrying spacer sequences, albeit<br />
at lower molar levels than for <strong>the</strong> crRNAs [16], and antisense<br />
RNA transcripts also were detected for <strong>CRISPR</strong> loci of<br />
S. solfataricus P2 [21]. Failure to detect similar transcripts<br />
<strong>in</strong> <strong>the</strong> euryarchaeon Pyrococcus and bacterium E. coli [12,19]<br />
suggests that this may be a specific property of Sulfolobus<br />
or crenarchaea. Analyses of cDNA libraries of S. solfataricus<br />
demonstrated previously that antisense RNAs are commonly<br />
produced especially aga<strong>in</strong>st transposase mRNAs [27], and<br />
several o<strong>the</strong>r antisense RNAs have been detected for this<br />
organism [21]. Given that mature crRNAs are produced <strong>in</strong> <strong>the</strong><br />
absence of <strong>in</strong>fect<strong>in</strong>g genetic elements <strong>in</strong> different Sulfolobus<br />
species [16,20,23], one possible explanation is that <strong>the</strong>se<br />
antisense RNAs protect at least a fraction of <strong>the</strong> crRNAs<br />
aga<strong>in</strong>st degradation before <strong>the</strong>ir activation.<br />
Maturation of crRNAs and str<strong>in</strong>gency of<br />
target<strong>in</strong>g mechanisms<br />
Details of <strong>the</strong> RNA-process<strong>in</strong>g mechanism have been elucidated<br />
for a euryarchaeal <strong>CRISPR</strong>/Cmr <strong>system</strong> and an E. coli<br />
<strong>CRISPR</strong>/Cas <strong>system</strong> where Cas6 homologues cut <strong>in</strong> <strong>the</strong><br />
repeat, 8 nt 5′ from <strong>the</strong> start of <strong>the</strong> spacer sequence, whereas<br />
<strong>the</strong> 3′-process<strong>in</strong>g sites differ [12,18]. For S. solfataricus, many<br />
5′-ends, and putative process<strong>in</strong>g sites, are detectable 6–8 nt<br />
from <strong>the</strong> spacer start [21], suggest<strong>in</strong>g that a similar mechanism<br />
operates. Process<strong>in</strong>g at <strong>the</strong> 3′-end of <strong>the</strong> crRNA is less clearly<br />
def<strong>in</strong>ed, but for <strong>the</strong> <strong>CRISPR</strong>/Cmr <strong>system</strong> of Pyrococcus, a<br />
14 nt ruler mechanism enables <strong>the</strong> process<strong>in</strong>g ribonuclease to<br />
generate dual cuts at 5 and 11 nt <strong>in</strong>to <strong>the</strong> spacer sequence<br />
[31]. Presumably, crRNA-b<strong>in</strong>d<strong>in</strong>g Cas and Cmr prote<strong>in</strong>s<br />
dist<strong>in</strong>guish between <strong>the</strong> different crRNA products before<br />
target<strong>in</strong>g <strong>the</strong> foreign DNA or RNA respectively.<br />
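The 5′-process<strong>in</strong>g step described above can be sketched <strong>in</strong> code. The example below is<br />
only an illustration under stated assumptions: <strong>the</strong> repeat, spacers and leader are<br />
<strong>in</strong>vented toy sequences, <strong>the</strong> function name is hypo<strong>the</strong>tical, and only <strong>the</strong> cut 8 nt 5′ of<br />
each spacer start is modelled. The 3′-trimm<strong>in</strong>g, which differs between <strong>system</strong>s, is not<br />
represented, so each unit reta<strong>in</strong>s repeat sequence at its 3′-end.<br />

```python
def cut_at_5prime_sites(transcript: str, repeat: str, handle: int = 8) -> list[str]:
    """Cut a repeat-spacer transcript `handle` nt upstream of each spacer start."""
    cut_sites = []
    pos = transcript.find(repeat)
    while pos != -1:
        # The cut lies inside the repeat, `handle` nt before its 3' end,
        # i.e. `handle` nt 5' of the first spacer nucleotide.
        cut_sites.append(pos + len(repeat) - handle)
        pos = transcript.find(repeat, pos + len(repeat))
    # Each unit runs from one cut site to the next:
    # handle + spacer + remainder of the following repeat.
    return [transcript[a:b] for a, b in zip(cut_sites, cut_sites[1:])]


# Toy sequences, invented for illustration (DNA alphabet used for readability).
REPEAT = "AAAACCCCGGGGTTTT"
SPACERS = ["ACGTACGTAC", "TTGGCCAATT"]
TRANSCRIPT = "CATCAT" + "".join(REPEAT + s for s in SPACERS) + REPEAT

units = cut_at_5prime_sites(TRANSCRIPT, REPEAT)
for unit in units:
    print(unit)
```

Each pr<strong>in</strong>ted unit starts with <strong>the</strong> last 8 nt of <strong>the</strong> upstream repeat followed by one spacer.<br />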
Until recently, attention focused on target<strong>in</strong>g of double-stranded<br />
DNA elements, but probably s<strong>in</strong>gle-stranded DNA<br />
will also be targeted by <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>. It rema<strong>in</strong>s<br />
an open question whe<strong>the</strong>r <strong>the</strong> <strong>CRISPR</strong>/Cmr <strong>system</strong> targets<br />
both mRNA and viral RNA, and <strong>in</strong>corporation of viral RNA<br />
<strong>in</strong>to <strong>CRISPR</strong> loci would require reverse transcriptase activity.<br />
Never<strong>the</strong>less, all evidence suggests that <strong>the</strong> primary targets<br />
of <strong>the</strong> Sulfolobus <strong>immune</strong> <strong>system</strong>s are viruses and plasmids<br />
and, probably, <strong>the</strong>ir mRNAs. There is no support for a<br />
general target<strong>in</strong>g of transposable elements. Spacers match<strong>in</strong>g<br />
transposase genes are occasionally found <strong>in</strong> <strong>CRISPR</strong> loci<br />
[16,20,32], but <strong>the</strong>y can generally be attributed to transposase<br />
genes present <strong>in</strong> viruses or plasmids, <strong>in</strong> particular orphan orfB<br />
elements (family IS605/200) for Sulfolobus [2,15].<br />
Effective target<strong>in</strong>g of genetic elements requires that <strong>the</strong><br />
mature crRNA anneals to <strong>the</strong> protospacer DNA region.<br />
Although, for <strong>the</strong> bacterium S. <strong>the</strong>rmophilus, a perfect<br />
sequence match was required to elicit a response from <strong>the</strong><br />
<strong>CRISPR</strong>/Cas <strong>system</strong> [9], studies on different Sulfolobus stra<strong>in</strong>s<br />
have shown that a less str<strong>in</strong>gent recognition <strong>system</strong> prevails.<br />
Challeng<strong>in</strong>g Sulfolobus cells with viral genes carry<strong>in</strong>g one<br />
to three mismatches still produced a strong response from<br />
<strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong> [23]. Ano<strong>the</strong>r important factor is<br />
<strong>the</strong> motif known as PAM. Targeted genetic elements carry<br />
this short sequence motif which creates a mismatch with<br />
<strong>the</strong> 5′-end of <strong>the</strong> crRNA [16,33,34]. For Sulfolobus, this was<br />
def<strong>in</strong>ed as a family-specific d<strong>in</strong>ucleotide, displaced 1 nt from<br />
<strong>the</strong> spacer sequence [15,16]. Potentially, this can be <strong>in</strong>volved<br />
<strong>in</strong> both selection of protospacers for excision by Cas prote<strong>in</strong>s<br />
and crRNA target<strong>in</strong>g. Whereas a study of <strong>the</strong> bacterium<br />
S. epidermidis concluded that <strong>the</strong> PAM was not important for<br />
protospacer target<strong>in</strong>g and that any mismatched base pair<strong>in</strong>g<br />
would suffice [11], for S. islandicus stra<strong>in</strong> REY15A, alter<strong>in</strong>g<br />
<strong>the</strong> PAM led to a loss of crRNA target<strong>in</strong>g [23].<br />
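The two target<strong>in</strong>g requirements discussed above, a tolerated number of spacer mismatches<br />
and an <strong>in</strong>tact PAM, can be comb<strong>in</strong>ed <strong>in</strong> a m<strong>in</strong>imal sketch. Everyth<strong>in</strong>g specific here is an<br />
assumption made for illustration: <strong>the</strong> sequences, <strong>the</strong> "CC" motif, <strong>the</strong> function names, <strong>the</strong><br />
placement of <strong>the</strong> PAM upstream of <strong>the</strong> protospacer with a 1 nt gap, and <strong>the</strong> threshold of<br />
three mismatches. It is not a model of <strong>the</strong> actual Cas prote<strong>in</strong> recognition process.<br />

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_targets(spacer: str, target: str, pam: str,
                 max_mismatches: int = 3, pam_gap: int = 1) -> list[tuple[int, int]]:
    """Return (start, mismatches) for every candidate protospacer in `target`.

    A candidate is accepted only if the PAM sits `pam_gap` nt upstream of it
    (an assumed geometry) and it differs from the spacer at no more than
    `max_mismatches` positions.
    """
    hits = []
    n = len(spacer)
    for start in range(len(target) - n + 1):
        pam_start = start - pam_gap - len(pam)
        if pam_start < 0:
            continue  # no room for the motif upstream of this position
        if target[pam_start:pam_start + len(pam)] != pam:
            continue  # motif absent or altered: no targeting
        mismatches = count_mismatches(spacer, target[start:start + n])
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical 20 nt spacer and a protospacer carrying two mismatches.
SPACER = "ACGTTGCAAGCTTAGGCATC"
PROTOSPACER = "ACGTTGCTAGCTTAGGCTTC"

with_pam = "GG" + "CC" + "T" + PROTOSPACER + "AA"
altered_pam = "GG" + "TC" + "T" + PROTOSPACER + "AA"

print(find_targets(SPACER, with_pam, pam="CC"))
print(find_targets(SPACER, altered_pam, pam="CC"))
```

With <strong>the</strong> motif present, <strong>the</strong> mismatched protospacer is still reported; with <strong>the</strong> motif<br />
altered, no target is found, which mirrors <strong>the</strong> contrast described for S. islandicus REY15A.<br />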
Anti-immune <strong>system</strong>s<br />
Although a few archaeal viruses have been shown to be<br />
lytic and to elicit strong <strong>immune</strong> responses, many Sulfolobus<br />
viruses and plasmids coexist <strong>in</strong> a stable relationship, at low<br />
copy numbers, over longer periods. Although <strong>the</strong>se genetic<br />
elements do not appear to be targeted by <strong>the</strong> host <strong>CRISPR</strong><br />
<strong>system</strong>s, <strong>the</strong> latter could never<strong>the</strong>less have a regulatory role<br />
possibly by target<strong>in</strong>g mRNAs.<br />
Ano<strong>the</strong>r special feature of archaeal genetic elements is<br />
that <strong>the</strong>y often carry an <strong>in</strong>tegrase gene which partitions<br />
on chromosomal <strong>in</strong>tegration. Consequently, <strong>the</strong> <strong>in</strong>tegrated<br />
element can only be excised when <strong>the</strong> free element is<br />
present to generate an <strong>in</strong>tact <strong>in</strong>tegrase/excision enzyme [35].<br />
Molecular Biology of <strong>Archaea</strong> II 55<br />
Thus target<strong>in</strong>g and degradation of <strong>the</strong> free genetic element<br />
by <strong>the</strong> host <strong>CRISPR</strong>/Cas <strong>system</strong> could actually favour<br />
entrapment of <strong>the</strong> <strong>in</strong>tegrated element, and such a process<br />
could enhance viral and plasmid evolution <strong>in</strong> archaea. The<br />
Redder Model [36] for archaeal viral evolution hypo<strong>the</strong>sized<br />
that, s<strong>in</strong>ce more than one type of fusellovirus can <strong>in</strong>tegrate<br />
at a given att site with<strong>in</strong> a tRNA gene, <strong>the</strong> encaptured<br />
concatenated viruses would tend to recomb<strong>in</strong>e <strong>the</strong>reby<br />
generat<strong>in</strong>g, and subsequently releas<strong>in</strong>g, hybrid fuselloviruses<br />
[36]. A similar process may occur for Sulfolobus-specific<br />
conjugative plasmids. They are also <strong>in</strong>tegrative, and <strong>the</strong>ir<br />
DNA is regularly <strong>in</strong>corporated <strong>in</strong>to <strong>CRISPR</strong> loci as<br />
spacers [16,20]. Moreover, this could expla<strong>in</strong> why some of<br />
<strong>the</strong> different Icelandic conjugative plasmids cultivated <strong>in</strong><br />
Wolfram Zillig’s laboratory [37] often carry large regions of<br />
almost identical nucleotide sequence [6,7]. Thus, <strong>in</strong>directly,<br />
<strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>s could be fuell<strong>in</strong>g production of<br />
new viral and plasmid variants which <strong>the</strong>y may subsequently<br />
be required to <strong>in</strong>activate.<br />
Some <strong>in</strong>sights <strong>in</strong>to how genetic elements underm<strong>in</strong>e or<br />
avoid <strong>the</strong> <strong>CRISPR</strong> <strong>immune</strong> <strong>system</strong>s were ga<strong>in</strong>ed by pass<strong>in</strong>g<br />
<strong>the</strong> rudivirus SIRV1 (Sulfolobus islandicus rod-shaped virus<br />
1) through a series of closely related S. islandicus stra<strong>in</strong>s.<br />
This generated many sequence changes <strong>in</strong> <strong>the</strong> viral genes,<br />
but strik<strong>in</strong>g was <strong>the</strong> frequent occurrence of genes that were<br />
altered by 12 bp <strong>in</strong>dels, probably deletions [38]. When similar<br />
12 bp <strong>in</strong>dels were observed among related lipothrixviruses,<br />
it was <strong>in</strong>ferred that <strong>the</strong>se might occur at crRNA-target<strong>in</strong>g<br />
protospacers on <strong>the</strong> viral genomes [39]. In ano<strong>the</strong>r study of a<br />
hyper<strong>the</strong>rmophilic archaeal virus, HAV1 (hyper<strong>the</strong>rmophilic<br />
archaeal virus 1), cultured <strong>in</strong> a bioreactor over a 2-year period,<br />
samples taken at different times showed genome sequence<br />
changes, not unlike those observed earlier for SIRV1, but also<br />
a series of recomb<strong>in</strong>ation sites were detected along <strong>the</strong> l<strong>in</strong>ear<br />
genome at which frequent rearrangements had occurred to<br />
generate viral variants with altered sequences [40].<br />
Although accumulat<strong>in</strong>g specific sequence changes <strong>in</strong><br />
genetic elements is an effective way of avoid<strong>in</strong>g, at least<br />
temporarily, crRNA target<strong>in</strong>g, more direct methods must<br />
also have evolved. Thus, for <strong>the</strong> S. islandicus stra<strong>in</strong> M.16.4,<br />
an M164 provirus 1 has <strong>in</strong>serted <strong>in</strong>to, and disrupted, <strong>the</strong><br />
csa3 gene considered to encode <strong>the</strong> transcriptional regulator<br />
of <strong>the</strong> group 1 cas genes (Figure 1A) associated with new<br />
spacer uptake [17]. This has <strong>the</strong> advantage for <strong>the</strong> virus that<br />
o<strong>the</strong>r <strong>in</strong>fect<strong>in</strong>g viruses will still be attacked by crRNAs if<br />
match<strong>in</strong>g spacers are already present <strong>in</strong> <strong>the</strong> <strong>CRISPR</strong> locus, but<br />
new spacers cannot be generated from M164 provirus itself.<br />
O<strong>the</strong>r possible mechanisms were discerned from a study<br />
<strong>in</strong> which <strong>CRISPR</strong> <strong>system</strong>s of Sulfolobus were challenged<br />
directly by vectors carry<strong>in</strong>g viral genes or protospacers<br />
show<strong>in</strong>g various degrees of match<strong>in</strong>g to host <strong>CRISPR</strong> spacers<br />
which mimicked, to a degree, <strong>the</strong> cont<strong>in</strong>ual <strong>in</strong>fection of a host<br />
cell with a given virus [23]. In many viable transformants,<br />
<strong>CRISPR</strong> locus deletions, <strong>in</strong>clud<strong>in</strong>g <strong>the</strong> match<strong>in</strong>g spacer, had<br />
occurred, whereas <strong>in</strong> o<strong>the</strong>rs, whole <strong>CRISPR</strong>/Cas cassettes<br />
were lost. However, several transformants revealed no<br />
changes <strong>in</strong> ei<strong>the</strong>r <strong>CRISPR</strong>/Cas modules or vector constructs,<br />
C○The Authors Journal compilation C○2011 Biochemical Society
56 Biochemical Society Transactions (2011) Volume 39, part 1<br />
suggest<strong>in</strong>g that o<strong>the</strong>r unknown regulatory mechanisms can<br />
<strong>in</strong>activate <strong>the</strong> <strong>immune</strong> <strong>system</strong> [23].<br />
<strong>CRISPR</strong>/Cas and Cmr module mobility<br />
Sulfolobus <strong>CRISPR</strong>/Cas and Cmr modules generally occur<br />
with<strong>in</strong> variable chromosomal regions where extensive gene<br />
shuffl<strong>in</strong>g has occurred [2,41], often attributable to high levels<br />
of transposable elements. Recomb<strong>in</strong>ation at border<strong>in</strong>g IS<br />
elements can also lead to loss of <strong>CRISPR</strong>/Cas or Cmr<br />
modules [25]. There is also strong evidence <strong>in</strong> support of<br />
<strong>the</strong> transfer of whole modules between organisms based on<br />
comparative studies of <strong>CRISPR</strong>/Cas module families and<br />
<strong>the</strong>ir locations, although <strong>the</strong> transfer mechanisms rema<strong>in</strong><br />
unclear [2]. For bacteria, evidence was provided for transfer<br />
of <strong>the</strong>se modules on large plasmids [42], but many archaeal<br />
<strong>CRISPR</strong>/Cas modules are large, up to 25 kb, and <strong>the</strong><br />
largest conjugative plasmids are only approx. 40 kb [6].<br />
Chromosomal conjugation may provide a vehicle, possibly<br />
facilitated by encaptured Sulfolobus conjugative plasmids<br />
[43,44] or presently unknown mechanisms may operate,<br />
possibly with<strong>in</strong> biofilms. F<strong>in</strong>ally, although phylogenetic<br />
analyses support <strong>the</strong> transfer of <strong>CRISPR</strong>/Cas and Cmr<br />
modules between archaea and bacteria, <strong>the</strong> basic differences<br />
<strong>in</strong> archaeal and bacterial transcriptional and translational<br />
mechanisms and <strong>in</strong> <strong>the</strong> unique cell wall, membrane structures<br />
and conjugative <strong>system</strong> of archaea provide formidable<br />
barriers to transfer between doma<strong>in</strong>s [2].<br />
Fund<strong>in</strong>g<br />
Research was supported by grants from <strong>the</strong> Danish Natural Science<br />
Research Council [grant number 272-08-0391], <strong>the</strong> Danish Research<br />
Council for Technology and Production [grant number 274-07-0116]<br />
and <strong>the</strong> Danish National Research Foundation.<br />
References<br />
1 Karg<strong>in</strong>ov, F.V. and Hannon, G.J. (2010) The <strong>CRISPR</strong> <strong>system</strong>: small<br />
RNA-guided defense <strong>in</strong> bacteria and archaea. Mol. Cell 37, 7–19<br />
2 Shah, S.A. and Garrett, R.A. (2010) <strong>CRISPR</strong>/Cas and Cmr modules,<br />
mobility and evolution of an adaptive <strong>immune</strong> <strong>system</strong>. Res. Microbiol.,<br />
doi:10.1016/j.resmic.2010.09.001<br />
3 Zillig, W., Arnold, H.P., Holz, I., Prangishvili, D., Schweier, A., Stedman, K.,<br />
She, Q., Phan, H., Garrett, R. and Kristjansson, J.K. (1998) Genetic<br />
elements <strong>in</strong> <strong>the</strong> extremely <strong>the</strong>rmophilic archaeon Sulfolobus.<br />
Extremophiles 2, 131–140<br />
4 Prangishvili, D., Forterre, P. and Garrett, R.A. (2006) Viruses of <strong>the</strong><br />
<strong>Archaea</strong>: a unify<strong>in</strong>g view. Nat. Rev. Microbiol. 11, 837–848<br />
5 Lawrence, C.M., Menon, S., Eilers, B.J., Bothner, B., Khayat, R., Douglas, T.<br />
and Young, M.J. (2009) Structural and functional studies of archaeal<br />
viruses. J. Biol. Chem. 284, 12599–12603<br />
6 Greve, B., Jensen, S., Brügger, K., Zillig, W. and Garrett, R.A. (2004)<br />
Genomic comparison of archaeal conjugative plasmids from Sulfolobus.<br />
<strong>Archaea</strong> 1, 231–23<br />
7 Erauso, G., Stedman, K.M., van de Werken, H.J.G., Zillig, W. and van der<br />
Oost, J. (2006) Two novel conjugative plasmids from a s<strong>in</strong>gle stra<strong>in</strong> of<br />
Sulfolobus. Microbiology 152, 1951–1968<br />
8 Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P.,<br />
Mo<strong>in</strong>eau, S., Romero, D.A. and Horvath, P. (2007) <strong>CRISPR</strong> provides<br />
acquired resistance aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes. Science 315,<br />
1709–1712<br />
9 Horvath, P., Romero, D.A., Coûté-Monvois<strong>in</strong>, A.-C., Richards, M.,<br />
Deveau, H., Mo<strong>in</strong>eau, S., Boyaval, P., Fremaux, C. and Barrangou, R.<br />
(2008) Diversity, activity, and evolution of <strong>CRISPR</strong> loci <strong>in</strong> Streptococcus<br />
<strong>the</strong>rmophilus. J. Bacteriol. 190, 1401–1412<br />
10 Marraff<strong>in</strong>i, L.A. and Son<strong>the</strong>imer, E.J. (2008) <strong>CRISPR</strong> <strong>in</strong>terference limits<br />
horizontal gene transfer <strong>in</strong> staphylococci by target<strong>in</strong>g DNA. Science 322,<br />
1843–1845<br />
11 Marraff<strong>in</strong>i, L.A. and Son<strong>the</strong>imer, E.J. (2010) Self versus non-self<br />
discrim<strong>in</strong>ation dur<strong>in</strong>g <strong>CRISPR</strong> RNA-directed immunity. Nature 463,<br />
568–571<br />
12 Brouns, S.J., Jore, M.M., Lundgren, M., Westra, E.R., Slijkhuis, R.J.,<br />
Snijders, A.P., Dickman, M.J., Makarova, K.S., Koon<strong>in</strong>, E.V. and van der<br />
Oost, J. (2008) Small <strong>CRISPR</strong> RNAs guide antiviral defense <strong>in</strong> prokaryotes.<br />
Science 321, 960–964<br />
13 Haft, D.H., Selengut, J., Mongod<strong>in</strong>, E.F. and Nelson, K.E. (2005) A guild of<br />
45 <strong>CRISPR</strong>-associated (Cas) prote<strong>in</strong> families and multiple <strong>CRISPR</strong>/Cas<br />
subtypes exist <strong>in</strong> prokaryotic genomes. PLoS Comput. Biol. 1, 474–483<br />
14 Makarova, K.S., Grish<strong>in</strong>, N.V., Shabal<strong>in</strong>a, S.A., Wolf, Y.I. and Koon<strong>in</strong>, E.V.<br />
(2006) A putative RNA-<strong>in</strong>terference-based <strong>immune</strong> <strong>system</strong> <strong>in</strong><br />
prokaryotes: computational analysis of <strong>the</strong> predicted enzymatic<br />
mach<strong>in</strong>ery, functional analogies with eukaryotic RNAi, and hypo<strong>the</strong>tical<br />
mechanisms of action. Biol. Direct 1, 7<br />
15 Shah, S.A., Hansen, N.R. and Garrett, R.A. (2009) Distributions of <strong>CRISPR</strong><br />
spacer matches <strong>in</strong> viruses and plasmids of crenarchaeal<br />
acido<strong>the</strong>rmophiles and implications for <strong>the</strong>ir <strong>in</strong>hibitory mechanism.<br />
Biochem. Soc. Trans. 37, 23–28<br />
16 Lillestøl, R.K., Shah, S.A., Brügger, K., Redder, P., Phan, H., Christiansen, J.<br />
and Garrett, R.A. (2009) <strong>CRISPR</strong> families of <strong>the</strong> crenarchaeal genus<br />
Sulfolobus: bidirectional transcription and dynamic properties. Mol.<br />
Microbiol. 72, 259–272<br />
17 Shah, S.A., Vestergaard, G. and Garrett, R.A. (2011) <strong>CRISPR</strong>/Cas and<br />
<strong>CRISPR</strong>/Cmr <strong>immune</strong> <strong>system</strong>s of archaea. In Regulatory RNAs <strong>in</strong><br />
Prokaryotes (Marchfelder, A. and Hess, W., eds), Spr<strong>in</strong>ger, Berl<strong>in</strong>,<br />
<strong>in</strong> <strong>the</strong> press<br />
18 Carte, J., Wang, R., Li, H., Terns, R.M. and Terns, M.P. (2008) Cas6 is an<br />
endoribonuclease that generates guide RNAs for <strong>in</strong>vader defense <strong>in</strong><br />
prokaryotes. Genes Dev. 22, 3489–3496<br />
19 Hale, C., Kleppe, K., Terns, R.M. and Terns, M.P. (2008) Prokaryotic<br />
silenc<strong>in</strong>g (psi)RNAs <strong>in</strong> Pyrococcus furiosus. RNA 14, 1–8<br />
20 Lillestøl, R.K., Redder, P., Garrett, R.A. and Brügger, K. (2006) A putative<br />
viral defence mechanism <strong>in</strong> archaeal cells. <strong>Archaea</strong> 2, 59–72<br />
21 Wurtzel, O., Sapra, R., Chen, F., Zhu, Z.Y., Simmons, B.A. and Sorek, R.<br />
(2010) A s<strong>in</strong>gle-base resolution map of an archaeal transcriptome.<br />
Genome Res. 20, 133–141<br />
22 Reno, M.L., Held, N.L., Fields, C.J., Burke, P.V. and Whitaker, R.J. (2009)<br />
Biogeography of <strong>the</strong> Sulfolobus islandicus pan-genome. Proc. Natl. Acad.<br />
Sci. U.S.A. 106, 8605–8610<br />
23 Gudbergsdottir, S., Deng, L., Chen, Z., Jensen, J.V.K., Jensen, L.R., She, Q.<br />
and Garrett, R.A. (2011) Dynamic properties of <strong>the</strong> Sulfolobus<br />
<strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s when challenged with<br />
vector-borne viral and plasmid genes and protospacers. Mol. Microbiol.<br />
79, 35–49<br />
24 Blount, Z.D. and Grogan, D.W. (2005) New <strong>in</strong>sertion sequences of<br />
Sulfolobus: functional properties and implications for genome evolution<br />
<strong>in</strong> hyper<strong>the</strong>rmophilic archaea. Mol. Microbiol. 55, 312–325<br />
25 Redder, P. and Garrett, R.A. (2006) Mutations and rearrangements <strong>in</strong> <strong>the</strong><br />
genome of Sulfolobus solfataricus P2. J. Bacteriol. 188, 4198–4206<br />
26 Tang, T.-H., Bachellerie, J.-P., Rozhdestvensky, T., Bortol<strong>in</strong>, M.-L.,<br />
Huber, H., Drungowski, M., Elge, T., Brosius, J. and Hüttenhofer, A. (2002)<br />
Identification of 86 candidates for small non-messenger RNAs from <strong>the</strong><br />
archaeon Archaeoglobus fulgidus. Proc. Natl. Acad. Sci. U.S.A. 99,<br />
7536–7541<br />
27 Tang, T.-H., Polacek, N., Zywicki, M., Huber, H., Brügger, K., Garrett, R.A.,<br />
Bachellerie, J.P. and Hüttenhofer, A. (2005) Identification of novel<br />
non-cod<strong>in</strong>g RNAs as potential antisense regulators <strong>in</strong> <strong>the</strong> archaeon<br />
Sulfolobus solfataricus. Mol. Microbiol. 55, 469–481<br />
28 Torar<strong>in</strong>sson, E., Klenk, H.P. and Garrett, R.A. (2005) Divergent<br />
transcriptional and translational signals <strong>in</strong> <strong>Archaea</strong>. Environ. Microbiol. 7,<br />
47–54<br />
29 Santangelo, T.J., Cubonová, L., Sk<strong>in</strong>ner, K.M. and Reeve, J.N. (2009)<br />
<strong>Archaea</strong>l <strong>in</strong>tr<strong>in</strong>sic transcription term<strong>in</strong>ation <strong>in</strong> vivo. J. Bacteriol. 191,<br />
7102–7108<br />
30 Peng, X., Brügger, K., Shen, B., Chen, L., She, Q. and Garrett, R.A. (2003)<br />
Genus-specific prote<strong>in</strong> b<strong>in</strong>d<strong>in</strong>g to <strong>the</strong> large clusters of DNA repeats (short<br />
regularly spaced repeats) present <strong>in</strong> Sulfolobus genomes. J. Bacteriol.<br />
185, 2410–2417<br />
31 Hale, C.R., Zhao, P., Olson, S., Duff, M.O., Graveley, B.R., Wells, L., Terns,<br />
R.M. and Terns, M.P. (2009) RNA-guided RNA cleavage by a <strong>CRISPR</strong><br />
RNA–Cas prote<strong>in</strong> complex. Cell 139, 945–956<br />
32 Held, N.L. and Whitaker, R.J. (2009) Viral biogeography revealed by<br />
signatures <strong>in</strong> Sulfolobus islandicus genomes. Environ. Microbiol. 11,<br />
457–466<br />
33 Deveau, H., Barrangou, R., Garneau, J.E., Labonté, J., Fremaux, C.,<br />
Boyaval, P., Romero, D.A., Horvath, P. and Mo<strong>in</strong>eau, S. (2008) Phage<br />
response to <strong>CRISPR</strong>-encoded resistance <strong>in</strong> Streptococcus <strong>the</strong>rmophilus.<br />
J. Bacteriol. 190, 1390–1400<br />
34 Mojica, F.J., Diez-Villasenor, C., Garcia-Mart<strong>in</strong>ez, J. and Almendros, C.<br />
(2009) Short motif sequences determ<strong>in</strong>e <strong>the</strong> targets of <strong>the</strong> prokaryotic<br />
<strong>CRISPR</strong> <strong>system</strong>. Microbiology 155, 733–740<br />
35 She, Q., Peng, X., Zillig, W. and Garrett, R.A. (2001) Gene capture events<br />
<strong>in</strong> archaeal chromosomes. Nature 409, 478<br />
36 Redder, P., Peng, X., Brügger, K., Shah, S.A., Roesch, F., Greve, B.,<br />
She, Q., Schleper, C., Forterre, P., Garrett, R.A. and Prangishvili, D. (2009)<br />
Four newly isolated fuselloviruses from extreme geo<strong>the</strong>rmal<br />
environments reveal unusual morphologies and a possible <strong>in</strong>terviral<br />
recomb<strong>in</strong>ation mechanism. Environ. Microbiol. 11, 2849–2862<br />
37 Prangishvili, D., Albers, S.V., Holz, I., Arnold, H.P., Stedman, K., Kle<strong>in</strong>, T.,<br />
S<strong>in</strong>gh, H., Hiort, J., Schweier, A., Kristjansson, J.K. and Zillig, W. (1998)<br />
Conjugation <strong>in</strong> archaea: frequent occurrence of conjugative plasmids <strong>in</strong><br />
Sulfolobus. Plasmid 40, 190–202<br />
38 Peng, X., Kessler, A., Phan, H., Garrett, R.A. and Prangishvili, D. (2004)<br />
Multiple variants of <strong>the</strong> archaeal DNA rudivirus SIRV1 <strong>in</strong> a s<strong>in</strong>gle host<br />
and a novel mechanism of genomic variation. Mol. Microbiol. 54,<br />
366–375<br />
39 Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter, M., Phan, H.,<br />
Briegel, A., Rachel, R., Garrett, R.A. and Prangishvili, D. (2008) SRV, a<br />
new rudiviral isolate from Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal<br />
rudiviruses with <strong>the</strong> host viral-defence <strong>CRISPR</strong> <strong>system</strong>. J. Bacteriol. 190,<br />
6837–6845<br />
40 Garrett, R.A., Prangishvili, D., Shah, S.A., Reuter, M., Stetter, K. and Peng,<br />
X. (2010) Metagenomic analyses of novel viruses, plasmids, and <strong>the</strong>ir<br />
variants, from an environmental sample of hyper<strong>the</strong>rmophilic<br />
neutrophiles cultured <strong>in</strong> a bioreactor. Environ. Microbiol. 12, 2918–2930<br />
41 Brügger, K., Torar<strong>in</strong>sson, E., Chen, L. and Garrett, R.A. (2004) Shuffl<strong>in</strong>g of<br />
Sulfolobus genomes by autonomous and non-autonomous mobile<br />
elements. Biochem. Soc. Trans. 32, 179–183<br />
42 Godde, J.S. and Bickerton, A. (2006) The repetitive DNA elements called<br />
<strong>CRISPR</strong>s and <strong>the</strong>ir associated genes: evidence of horizontal transfer<br />
among prokaryotes. J. Mol. Evol. 62, 718–729<br />
43 Aagaard, C., Dalgaard, J. and Garrett, R.A. (1995) Inter-cellular mobility<br />
and hom<strong>in</strong>g of an archaeal rDNA <strong>in</strong>tron confers selective advantage over<br />
<strong>in</strong>tron-cells of Sulfolobus acidocaldarius. Proc. Natl. Acad. Sci. U.S.A. 92,<br />
12285–12289<br />
44 Grogan, D.W. (1996) Exchange of genetic markers at extremely high<br />
temperatures <strong>in</strong> <strong>the</strong> archaeon Sulfolobus acidocaldarius. J. Bacteriol.<br />
178, 3207–3211<br />
45 Bailey, T.L., Williams, N., Misleh, C. and Li, W.W. (2006) MEME:<br />
discover<strong>in</strong>g and analyz<strong>in</strong>g DNA and prote<strong>in</strong> sequence motifs. Nucleic<br />
Acids Res. 34, 369–373<br />
Received 30 September 2010<br />
doi:10.1042/BST0390051<br />
Extremophiles (2011) 15:487–497<br />
DOI 10.1007/s00792-011-0379-y<br />
ORIGINAL PAPER<br />
Genomic analysis of Acidianus hospitalis W1 a host for study<strong>in</strong>g<br />
crenarchaeal virus and plasmid life cycles<br />
Xiao-Yan You • Chao Liu • Sheng-Yue Wang • Cheng-Y<strong>in</strong>g Jiang • Shiraz A. Shah •<br />
David Prangishvili • Qunx<strong>in</strong> She • Shuang-Jiang Liu • Roger A. Garrett<br />
Received: 4 March 2011 / Accepted: 26 April 2011 / Published onl<strong>in</strong>e: 24 May 2011<br />
© The Author(s) 2011. This article is published with open access at Spr<strong>in</strong>gerl<strong>in</strong>k.com<br />
Abstract The Acidianus hospitalis W1 genome consists<br />
of a m<strong>in</strong>imally sized chromosome of about 2.13 Mb and a<br />
conjugative plasmid pAH1 and it is a host for <strong>the</strong> model<br />
filamentous lipothrixvirus AFV1. The chromosome carries<br />
three putative replication orig<strong>in</strong>s <strong>in</strong> conserved genomic<br />
regions and two large regions where non-essential genes are<br />
clustered. With<strong>in</strong> <strong>the</strong>se variable regions, a few orphan orfB<br />
and o<strong>the</strong>r elements of <strong>the</strong> IS200/607/605 family are concentrated<br />
with a novel class of MITE-like repeat elements.<br />
There are also 26 highly diverse vapBC antitox<strong>in</strong>–tox<strong>in</strong> gene<br />
pairs proposed to facilitate ma<strong>in</strong>tenance of local chromosomal<br />
regions and to m<strong>in</strong>imise <strong>the</strong> impact of environmental<br />
stress. Complex and partially defective <strong>CRISPR</strong>/Cas/Cmr<br />
<strong>immune</strong> <strong>system</strong>s are present and <strong>in</strong>terspersed with five<br />
vapBC gene pairs. Remnants of <strong>in</strong>tegrated viral genomes<br />
and plasmids are located at five <strong>in</strong>tron-less tRNA genes and<br />
several non-cod<strong>in</strong>g RNA genes are predicted that are conserved<br />
<strong>in</strong> o<strong>the</strong>r Sulfolobus genomes. The putative metabolic<br />
pathways for sulphur metabolism show some significant<br />
differences from those proposed for o<strong>the</strong>r Acidianus and<br />
Sulfolobus species. The small and relatively stable genome<br />
of A. hospitalis W1 renders it a promis<strong>in</strong>g candidate for<br />
develop<strong>in</strong>g <strong>the</strong> first Acidianus genetic <strong>system</strong>s.<br />
Keywords Tox<strong>in</strong>–antitox<strong>in</strong> • VapBC • <strong>CRISPR</strong> •<br />
Sulphur metabolism • OrfB element • MITE<br />
Communicated by L. Huang.<br />
X.-Y. You and C. Liu contributed equally to this work.<br />
X.-Y. You • C.-Y. Jiang • S.-J. Liu (✉)<br />
State Key Laboratory of Microbial Resources and Center<br />
for Environmental Microbiology, Institute of Microbiology,<br />
Ch<strong>in</strong>ese Academy of Sciences,<br />
Bei-Chen-Xi-Lu No. 1 Chao-Yang District,<br />
Beij<strong>in</strong>g 100101, People’s Republic of Ch<strong>in</strong>a<br />
e-mail: liusj@sun.im.ac.cn<br />
C. Liu • S. A. Shah • Q. She • R. A. Garrett (✉)<br />
<strong>Archaea</strong> Centre, Department of Biology,<br />
Copenhagen University, Ole Maaløes Vej 5,<br />
2200 N Copenhagen, Denmark<br />
e-mail: garrett@bio.ku.dk<br />
S.-Y. Wang<br />
Shanghai-MOST Key Laboratory of Health and Disease<br />
Genomics, Ch<strong>in</strong>ese National Human Genome Center,<br />
Shanghai, People’s Republic of Ch<strong>in</strong>a<br />
D. Prangishvili<br />
Molecular Biology of <strong>the</strong> Gene <strong>in</strong> Extremophiles Unit,<br />
Institut Pasteur, rue Dr Roux 25, 75724 Paris Cedex, France<br />
Introduction<br />
The Acidianus genus belongs to <strong>the</strong> order Sulfolobales and<br />
consists of acido<strong>the</strong>rmophiles which grow optimally and slowly<br />
<strong>in</strong> <strong>the</strong> temperature range 65–95°C and at pH 2–4.<br />
Acidianus species are chemolithoautotrophic and<br />
facultatively anaerobic and are generally versatile physiologically.<br />
Depend<strong>in</strong>g on <strong>the</strong> cultur<strong>in</strong>g conditions, <strong>the</strong>y can<br />
ei<strong>the</strong>r reduce S⁰ to H₂S, catalysed by a sulphur reductase<br />
and hydrogenase, or oxidise S⁰ to H₂SO₄ utilis<strong>in</strong>g <strong>the</strong><br />
sulphur oxygenase-reductase holoenzyme (Kletz<strong>in</strong> 1992,<br />
2007). In contrast to several Sulfolobus species, <strong>the</strong> genomic<br />
properties of an Acidianus species have not been<br />
analysed. The Sulfolobales have been a rich source of<br />
genetic elements, <strong>in</strong>clud<strong>in</strong>g novel conjugative plasmids<br />
(Prangishvili et al. 1998; Greve et al. 2004) and several<br />
exceptional and diverse viruses many of which have now<br />
been classified <strong>in</strong>to eight new viral families (Rachel et al.<br />
2002; Prangishvili et al. 2006; Lawrence et al. 2009).<br />
488 Extremophiles (2011) 15:487–497<br />
Acidianus hospitalis W1 is <strong>the</strong> first Acidianus stra<strong>in</strong> to be<br />
isolated carry<strong>in</strong>g a conjugative plasmid pAH1 which is a<br />
member of <strong>the</strong> plasmid family predicted to generate an<br />
archaea-specific conjugative apparatus (Greve et al. 2004;<br />
Basta et al. 2009). These plasmids are also <strong>in</strong>tegrative<br />
elements and <strong>in</strong> an encaptured state have been implicated <strong>in</strong><br />
facilitat<strong>in</strong>g chromosomal DNA conjugation for some Sulfolobus<br />
species (Chen et al. 2005b). A. hospitalis is also a<br />
viable host for <strong>the</strong> model Acidianus alpha lipothrixvirus<br />
AFV1, a filamentous virus carry<strong>in</strong>g exceptional claw-like<br />
structures at its term<strong>in</strong>i which is currently <strong>the</strong> subject of<br />
detailed structural studies (Bettstetter et al. 2003; Goulet<br />
et al. 2009). Infection of A. hospitalis with AFV1 was shown<br />
to lead to a loss of <strong>the</strong> plasmid pAH1 and this contrasts with<br />
observations <strong>in</strong> bacteria where endogenous plasmids tend to<br />
determ<strong>in</strong>e <strong>the</strong> fate of an <strong>in</strong>com<strong>in</strong>g phage (Basta et al. 2009).<br />
In order to study fur<strong>the</strong>r <strong>the</strong> metabolic capability of an<br />
Acidianus species and to exam<strong>in</strong>e <strong>the</strong> molecular mechanisms<br />
<strong>in</strong>volved <strong>in</strong> virus–plasmid–host <strong>in</strong>teractions, it was<br />
important to sequence and annotate <strong>the</strong> A. hospitalis genome.<br />
To date, most genomic studies of <strong>the</strong> Sulfolobales<br />
have concentrated on Sulfolobus species that have revealed<br />
relatively large genomes generally exhibit<strong>in</strong>g high levels of<br />
transposable and <strong>in</strong>tegrated genetic elements, as well as<br />
considerable genetic diversity (Guo et al. 2011). Analysis<br />
of <strong>the</strong> A. hospitalis genome revealed a m<strong>in</strong>imally sized<br />
chromosome that appeared relatively stable with few<br />
transposable elements and no evidence of recent <strong>in</strong>tegration<br />
events, apart from <strong>the</strong> reversible <strong>in</strong>tegration of pAH1<br />
into a tRNA-Arg gene (Basta et al. 2009). Potentially,<br />
<strong>the</strong>refore, A. hospitalis W1 could provide a suitable host<br />
for develop<strong>in</strong>g genetic <strong>system</strong>s for <strong>the</strong> Acidianus genus.<br />
Materials and methods<br />
Genome sequenc<strong>in</strong>g and gap closure<br />
Genomic DNA of A. hospitalis was sequenced us<strong>in</strong>g a<br />
Roche 454 Genome Sequencer FLX <strong>in</strong>strument (Titanium)<br />
with an average 19-fold coverage. All useful reads were<br />
initially assembled into seven contigs (>500 bp) using the<br />
Newbler assembler software (http://www.454.com/). Gaps<br />
were closed by a Multiplex PCR strategy and PCR products<br />
were gel purified and sequenced us<strong>in</strong>g an ABI3730 DNA<br />
sequenator. Raw sequence data were assembled <strong>in</strong>to contigs<br />
us<strong>in</strong>g phred/phrap/consed software and <strong>the</strong> f<strong>in</strong>al consensus<br />
quality for each base was above 30 (http://www.phrap.org).<br />
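The consensus quality threshold above can be read as an error probability, because a phred score Q is defined as Q = -10 log10(P). The sketch below only illustrates that conversion; the function names are hypothetical and are not part of the phred/phrap/consed tools.<br />

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a phred quality score Q into a per-base error probability P."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Convert a per-base error probability P back into a phred score Q."""
    return -10 * math.log10(p)


# A consensus quality of 30 corresponds to roughly one expected error per 1,000 bases.
print(phred_to_error_probability(30))
```

Under this definition, each additional 10 quality units lowers the expected error rate tenfold.<br />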
Sequence analysis and gene annotation<br />
Initially, ORFs were predicted us<strong>in</strong>g <strong>the</strong> programmes<br />
Glimmer and FgeneSB and prote<strong>in</strong> function predictions<br />
were obta<strong>in</strong>ed from <strong>the</strong> follow<strong>in</strong>g searches: (1) homology<br />
searches <strong>in</strong> <strong>the</strong> GenBank (http://www.ncbi.nlm.nih.gov/)<br />
and UniProt prote<strong>in</strong> (http://www.ebi.ac.uk/uniprot/) databases,<br />
(2) function assignment searches <strong>in</strong> <strong>the</strong> Sulfolobus<br />
database (http://www.Sulfolobus.org/), and (3) doma<strong>in</strong> or<br />
motif searches in the local CDD database<br />
(http://www.ncbi.nlm.nih.gov/cdd/), the InterPro and the Pfam databases.<br />
The KEGG database (http://www.genome.jp/kegg/)<br />
was used to reconstruct metabolic pathways <strong>in</strong> silico.<br />
Membrane prote<strong>in</strong>s were predicted by Phobius, TMHMM<br />
and ConPred II programmes. Secretory prote<strong>in</strong>s were<br />
divided into two groups: those with a signal peptide were<br />
predicted using SignalP 3.0 (http://www.cbs.dtu.dk/services/SignalP/),<br />
and non-classical secretory proteins,<br />
lacking a signal peptide, were predicted by the SecretomeP<br />
2.0 programme (http://www.cbs.dtu.dk/services/SecretomeP/).<br />
Transporters were predicted by search<strong>in</strong>g <strong>the</strong> TCDB database<br />
(http://www.tcdb.org) using BLASTP with E values<br />
lower than 1e-05. Insertion sequence (IS) elements and<br />
transposases were identified by BLASTN searches aga<strong>in</strong>st<br />
<strong>the</strong> IS F<strong>in</strong>der database (http://www-is.biotoul.fr/). The<br />
MITE-like elements were detected us<strong>in</strong>g <strong>the</strong> programme<br />
LUNA (Brügger K, unpublished). Potential frameshifts<br />
were checked by sequenc<strong>in</strong>g after manual annotation and<br />
any rema<strong>in</strong><strong>in</strong>g frameshifts were considered to be au<strong>the</strong>ntic.<br />
tRNA genes and <strong>the</strong>ir <strong>in</strong>trons were identified us<strong>in</strong>g<br />
tRNAScan-SE (Lowe and Eddy 1997). All annotations<br />
were manually curated us<strong>in</strong>g Artemis software (Ru<strong>the</strong>rford<br />
et al. 2000). Start codons for s<strong>in</strong>gle genes and first genes of<br />
Sulfolobus operons were generally located 25–30 bp<br />
downstream from <strong>the</strong> archaeal hexameric TATA-like box.<br />
Only genes with<strong>in</strong> operons were preceded by Sh<strong>in</strong>e–<br />
Dalgarno motifs, where GGUG dom<strong>in</strong>ated (Torar<strong>in</strong>sson<br />
et al. 2005). Where alternative start codons occur, a<br />
selection was made on <strong>the</strong> basis of experimental data when<br />
available or on its location relative to a putative promoter<br />
and/or Sh<strong>in</strong>e–Dalgarno motif. The genome sequence<br />
accession number at Genbank/EMBL is CP002535.<br />
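The transporter search described above kept BLASTP hits with E values lower than 1e-05. A minimal sketch of such a filter is given below, assuming the hits were saved in the standard 12-column BLAST tabular format, where the E value is the 11th column. The function name and the sequence identifiers in the example are made up for illustration and do not come from the study.<br />

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue) for hits whose E value is below the cutoff.

    Assumes 12-column BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue < cutoff:
            hits.append((row[0], row[1], evalue))
    return hits


# Two invented hits: the first passes the cutoff, the second does not.
example = (
    "Ahos_demo1\tTCDB_demoA\t45.2\t310\t160\t4\t1\t305\t12\t318\t3e-40\t150\n"
    "Ahos_demo2\tTCDB_demoB\t28.0\t90\t60\t2\t5\t92\t40\t128\t0.002\t35.1\n"
)
print(filter_hits(example))
```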
Results<br />
Genomic properties<br />
The A. hospitalis genome consists of a circular chromosome<br />
of 2,137,654 bp and a circular conjugative plasmid<br />
pAH1 of 28,644 bp. The chromosome has a GC content of<br />
34.2% and carries 2,389 predicted open read<strong>in</strong>g frames<br />
(ORFs), of which about half are assigned putative functions<br />
with many of <strong>the</strong> conserved hypo<strong>the</strong>tical prote<strong>in</strong>s be<strong>in</strong>g<br />
archaea-specific or specific to <strong>the</strong> Sulfolobales. About 320<br />
of <strong>the</strong> encoded prote<strong>in</strong>s are putative membrane prote<strong>in</strong>s<br />
and a fur<strong>the</strong>r 182 are predicted to be secretory prote<strong>in</strong>s.
The plasmid sequence is identical to that of <strong>the</strong> conjugative<br />
plasmid pAH1 isolated earlier from <strong>the</strong> A. hospitalis stra<strong>in</strong><br />
W1, except that it is 4 bp shorter (Basta et al. 2009).<br />
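The GC content quoted above is the percentage of G and C bases in the sequence. The sketch below shows one common way to compute it, assuming ambiguous bases such as N are excluded from the denominator; the function name and the toy sequence are illustrative only and are not taken from the genome.<br />

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


toy = "ATGCATTTAAGC"  # toy sequence for illustration
print(round(gc_content(toy), 1))
```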
Comparison of <strong>the</strong> A. hospitalis genome with those of<br />
o<strong>the</strong>r members of <strong>the</strong> Sulfolobales provided no evidence of<br />
extensive conservation of gene synteny, <strong>in</strong> contrast to that<br />
observed for large regions of several Sulfolobus genomes<br />
(Guo et al. 2011), and consistent with A. hospitalis be<strong>in</strong>g<br />
relatively distant phylogenetically from <strong>the</strong>se stra<strong>in</strong>s (Basta<br />
et al. 2009). Never<strong>the</strong>less, <strong>the</strong> genome carries two major<br />
regions that are predicted to be relatively labile. They<br />
extend approximately from positions 75,000–444,500 and<br />
from 1,300,000–1,870,000 and carry most of <strong>the</strong> transposable<br />
elements, all of <strong>the</strong> <strong>CRISPR</strong> loci and cas and cmr<br />
family genes, most of <strong>the</strong> vapBC tox<strong>in</strong>–antitox<strong>in</strong> gene<br />
pairs, and many genes <strong>in</strong>volved <strong>in</strong> transport-related functions<br />
and metabolism, as well as a degenerate fuselloviral<br />
genome (Fig. 1). These two regions lack genes essential for<br />
<strong>in</strong>formational processes <strong>in</strong>clud<strong>in</strong>g DNA replication, transcription<br />
and translation and <strong>the</strong>y appear to constitute sites<br />
where non-essential genes are collected, <strong>in</strong>terchanged,<br />
exchanged <strong>in</strong>tercellularly and where genetic <strong>in</strong>novation<br />
may occur, similarly to a s<strong>in</strong>gle variable region observed <strong>in</strong><br />
several Sulfolobus genomes (Guo et al. 2011).<br />
Three orig<strong>in</strong>s of chromosomal replication, demonstrated<br />
experimentally for Sulfolobus species (Rob<strong>in</strong>son et al. 2004;<br />
Lundgren et al. 2004), were also predicted to occur <strong>in</strong> <strong>the</strong><br />
Acidianus genome. The Y component of a Z curve analysis<br />
(Zhang and Zhang 2003) revealed two major peaks correspond<strong>in</strong>g<br />
to <strong>the</strong> cdc6-3 gene (Ahos0001), and <strong>the</strong> whiP/cdt1<br />
gene (Ahos1370) and a broader peak co<strong>in</strong>cid<strong>in</strong>g with <strong>the</strong><br />
cdc6-1 gene (Ahos0780) (Fig. 1), where <strong>the</strong> three genes<br />
encode putative replication <strong>in</strong>itiators (Rob<strong>in</strong>son and Bell<br />
2007). The sequences of <strong>the</strong> cdc6 genes and whiP gene<br />
are quite conserved relative to <strong>the</strong> S. solfataricus and<br />
S. islandicus genomes, as is the synteny of the flanking genes<br />
except for the region immediately downstream from cdc6-3.<br />
Fig. 1 The Y component of a Z curve plot for the A. hospitalis<br />
chromosome showing the three putative replication origins. The<br />
positions of the cdc6-3 gene (origin 2), cdc6-1 gene (origin 3) and<br />
the whiP/cdt1 gene (origin 1) are indicated as well as locations of the<br />
ribosomal RNA genes, the CRISPR-based systems, transposable<br />
elements of the IS200/605/607 family, and vapBC antitoxin–toxin<br />
gene pairs<br />
(Plot axes: y-axis "y-component", ticks from -4K to 10K; x-axis "genome length", ticks from 0.5M to 2M. Legend: Family I CRISPRs, Family II CRISPRs, transposable elements, toxin/antitoxin systems.)<br />
Integrated genetic elements<br />
Integration of genetic elements, generally fuselloviruses or<br />
conjugative plasmids at tRNA genes, occurs commonly for<br />
genomes of the Sulfolobales (She et al. 1998; Guo et al.<br />
2011). Most integration events occur via a reversible<br />
archaea-specific mechanism whereby the integrase gene<br />
partitions into two sections which border the integrated<br />
element and the N-terminal-encoding region carrying the<br />
intN sequence overlaps with the tRNA gene (Muskhelishvili<br />
et al. 1993). Elements that become encaptured within the<br />
chromosome subsequently degenerate and are gradually<br />
lost, but will nevertheless leave a trace because the intN<br />
fragment overlapping the tRNA gene is generally retained<br />
(She et al. 1998) (Table 1).<br />
Earlier plasmid pAH1 was sequenced and shown to<br />
integrate reversibly into a tRNA-Arg gene (Basta et al.<br />
2009). Genome sequencing of A. hospitalis revealed that a<br />
low fraction of reads matched to the junctions of the<br />
integrated plasmid whilst the majority matched the<br />
unpartitioned integrase gene of pAH1, consistent with both<br />
integrated and free forms being present in the culture. The<br />
integration site of pAH1 was located at genome positions<br />
1,075,876–1,075,946 bp within the gene of tRNA-Arg<br />
[TCG] (Table 1). In addition, the chromosome carries<br />
remnants of integrated elements adjoining another five<br />
intron-less tRNA genes, each consisting of a few genes or<br />
pseudogenes (Table 1). Three derive from fuselloviruses,<br />
one from a pDL10-like plasmid of the pRN family of<br />
cryptic plasmids (Kletzin et al. 1999) and another originates<br />
from an unknown element (Table 1). Whether these<br />
all derive from single integration events remains unclear<br />
because, in principle, successive integrations can occur at a<br />
given tRNA gene (Redder et al. 2009). An additional 8–10<br />
genes and pseudogenes, most of which are fusellovirus-related,<br />
are clustered distantly from a tRNA gene and they<br />
may have become displaced from one of the three tRNA-integrated<br />
elements.<br />
Table 1 Integration events at tRNA genes showing the numbers of<br />
residual integrated genes<br />
tRNA | Intron | Ahos W1<br />
Arg–TCG | No | pAH1<br />
Pro–TGG | No | intN fragment<br />
Glu–CTC | No | 0986a–0988, fusellovirus<br />
Arg–TCT | No | 1232–1238, unknown element<br />
Cys–GCA | No | 1550–1558, plasmid pDL10<br />
Leu–GAG | No | 2147–2151, fusellovirus ASV1<br />
1604–1609 kb (no tRNA) | – | 1778–1786, fusellovirus SSV<br />
ASV1 Acidianus spindle-shaped virus, SSV Sulfolobus spindle-shaped virus<br />
Transposable elements<br />
The A. hospitalis genome carries five IS elements<br />
belonging to the IS200/607 family, only three of which<br />
carry intact transposase genes, and there are 11 copies of<br />
orphan orfB elements of the IS605 family, 10 of which<br />
carry intact orfB genes. None of these elements carry<br />
inverted terminal repeats and they all appear to be transposed<br />
by “cut-and-paste” mechanisms, with the orfB elements,<br />
at least, transposing via circular single stranded<br />
intermediates and inserting after TTAC sequences (Filée<br />
et al. 2007; Ton-Hoang et al. 2010).<br />
Sulfolobus genomes generally carry IS elements from a<br />
wide variety of families most of which carry inverted terminal<br />
repeats and are mobilised by “copy-and-paste”<br />
mechanisms, and tend to be lost by gradual degeneration<br />
and not by deletion (Blount and Grogan 2005; Redder and<br />
Garrett 2006). None of these IS element classes were<br />
detected in the A. hospitalis genome and this suggests that<br />
the genome has rarely, if ever, taken up any of these IS<br />
element classes.<br />
A new class of MITE-like elements<br />
Although none of <strong>the</strong> MITE elements that are common to<br />
o<strong>the</strong>r Sulfolobus genomes were detected (Redder et al.<br />
2001; Guo et al. 2011), <strong>the</strong> A. hospitalis genome carries 10<br />
copies of a repeat sequence resembl<strong>in</strong>g a MITE-like element<br />
(Fig. 3). At one end, it carries a short open read<strong>in</strong>g<br />
frame correspond<strong>in</strong>g <strong>in</strong> am<strong>in</strong>o acid sequence to <strong>the</strong><br />
downstream end of an OrfB prote<strong>in</strong> (Fig. 3). The conserved<br />
terminal sequence and the internal similarity to the orfB<br />
element suggest that it could be a transposable element.<br />
This supposition is re<strong>in</strong>forced by <strong>the</strong> presence of 10 full<br />
copies <strong>in</strong> <strong>the</strong> genome (and a few degenerate copies), and<br />
also by <strong>the</strong> presence of multiple copies <strong>in</strong> some Sulfolobus<br />
and o<strong>the</strong>r crenarchaeal genomes (unpublished data).<br />
Non-cod<strong>in</strong>g RNAs<br />
Many untranslated RNAs have been characterised experimentally<br />
for different Sulfolobus species us<strong>in</strong>g a variety of<br />
techniques <strong>in</strong>clud<strong>in</strong>g prob<strong>in</strong>g cellular RNA extracts for<br />
K-turn-b<strong>in</strong>d<strong>in</strong>g motifs and generat<strong>in</strong>g cDNA libraries of<br />
total cellular RNA extracts, as well as numerous antisense<br />
RNAs (Tang et al. 2005; Omer et al. 2006; Wurtzel et al.<br />
2010). Most of <strong>the</strong>se RNAs were characterised for partial<br />
sequence and nucleotide length, and several were detected<br />
by more than one experimental approach. Based on <strong>the</strong><br />
genome sequence comparisons and gene contexts, 23<br />
putative conserved non-cod<strong>in</strong>g RNAs were annotated <strong>in</strong> <strong>the</strong><br />
A. hospitalis genome. Genes for 12 C/D box RNAs were<br />
localised of which 7 were predicted to modify rRNAs, 2 to<br />
target tRNAs and a fur<strong>the</strong>r 2 to modify unknown RNAs. In<br />
addition, a s<strong>in</strong>gle copy of a gene for an H/ACA box RNA<br />
was located which toge<strong>the</strong>r with aPus7 should generate<br />
pseudouridine-35 in Sulfolobus pre-tRNA-Tyr transcripts<br />
(Muller et al. 2009). However, <strong>in</strong> A. hospitalis, <strong>the</strong> aPus7<br />
gene (Ahos0631) is degenerate. A fur<strong>the</strong>r 10 genes were<br />
assigned to encode RNAs of unknown function. The relatively<br />
high conservation of sequence and gene synteny for<br />
<strong>the</strong>se RNAs between Sulfolobus and Acidianus species<br />
underl<strong>in</strong>es <strong>the</strong>ir potential functional importance.<br />
Read<strong>in</strong>g frame shifts and mRNA <strong>in</strong>tron splic<strong>in</strong>g<br />
Examples of translational read<strong>in</strong>g frame shifts yield<strong>in</strong>g<br />
s<strong>in</strong>gle polypeptides have been demonstrated experimentally<br />
for S. solfataricus P2 (Cobucci-Ponzano et al. 2010).<br />
For two of <strong>the</strong>se, a transketolase (Ahos1219/1218) and a<br />
putative O-sialoglycoprote<strong>in</strong> endopeptidase (Ahos0695/<br />
0696), <strong>the</strong> A. hospitalis genes overlap <strong>in</strong> a similar way, and<br />
are likely to undergo translational frame shifts. Moreover,<br />
transcripts of <strong>the</strong> <strong>in</strong>tron-carry<strong>in</strong>g cbf5 gene (Ahos0734/<br />
0735) are likely to undergo splic<strong>in</strong>g at <strong>the</strong> mRNA level by<br />
<strong>the</strong> archaeal splic<strong>in</strong>g enzyme complex (Ahos0689/0798/<br />
1417) as has been demonstrated experimentally for different<br />
crenarchaea (Yokobori et al. 2009).
Metabolic pathways<br />
Genome analyses <strong>in</strong>dicate <strong>the</strong> presence of versatile metabolic<br />
pathways <strong>in</strong> A. hospitalis. They suggest that it can<br />
grow autotrophically by fix<strong>in</strong>g CO2 or heterotrophically<br />
us<strong>in</strong>g yeast extract, as has been demonstrated experimentally<br />
(Basta et al. 2009). Genome analyses also revealed<br />
genes encod<strong>in</strong>g sugar transporters and glycosidases suggest<strong>in</strong>g<br />
that A. hospitalis can assimilate carbohydrates,<br />
such as starch, glucose, mannose and galactose. Moreover,<br />
enzymes are encoded that are implicated <strong>in</strong> energy generation<br />
from oxidis<strong>in</strong>g elemental sulphur, hydrogen sulphides<br />
and o<strong>the</strong>r reduced <strong>in</strong>organic sulphide compounds, but not<br />
ferrous ions. However, no hydrogenase genes were detected<br />
suggest<strong>in</strong>g that A. hospitalis cannot use H2 as electron<br />
donor for growth.<br />
Enzymes were identified for a complete TCA cycle that<br />
is important for generat<strong>in</strong>g different <strong>in</strong>termediates for <strong>the</strong><br />
biosyn<strong>the</strong>sis of many cellular components, as well as produc<strong>in</strong>g<br />
reduced electron carriers, such as NAD(P)H,<br />
reduced ferredox<strong>in</strong> (FdR) and FADH2. Formation of acetyl-<br />
CoA from pyruvate and <strong>the</strong> formation of succ<strong>in</strong>yl-<br />
CoA from 2-oxoglutarate were predicted to be catalysed,<br />
respectively, by pyruvate ferredox<strong>in</strong> oxidoreductase (Ahos<br />
1949-1952) and 2-oxoglutarate ferredox<strong>in</strong> oxidoreductase<br />
(Ahos0089/0090/0300/0301). Moreover, both enzymes<br />
were predicted to use ferredoxin instead of NAD+ as a<br />
cofactor.<br />
Genes encod<strong>in</strong>g enzymes <strong>in</strong>volved <strong>in</strong> pathways for fix<strong>in</strong>g<br />
atmospheric N2, or reducing nitrate and nitrite, as<br />
nitrogen sources were absent, as observed for o<strong>the</strong>r Acidianus<br />
species, and <strong>the</strong> genome analyses suggest that<br />
ammonium is an exclusive source of nitrogen that is<br />
assimilated via formation of carbamoyl phosphate, glutamine<br />
and glutamate. Genes encoding putative carbamoyl<br />
phosphate synthetase (Ahos1106/1107), glutamine synthetase<br />
(Ahos0460, Ahos1272, Ahos2233) and glutamate<br />
dehydrogenase (Ahos0494) are present.<br />
Fig. 2 Model of pathways for oxidation and reduction of<br />
sulphur in A. hospitalis indicating the predicted<br />
functions of genes in the A. hospitalis genome and<br />
corresponding gene numbers are given for each step. The<br />
following abbreviations are used: OM outer membrane,<br />
IM inner membrane, SQR sulphide:quinone oxidoreductase,<br />
Fcc flavocytochrome c sulphide dehydrogenase,<br />
SOR sulphur oxygenase-reductase, TetH tetrathionate hydrolase,<br />
TQO thiosulphate–quinone oxidoreductase,<br />
SulP sulphate transporter permease, QH2 quinol pool<br />
Sulphur metabolism<br />
A. hospitalis encodes several enzymes <strong>in</strong>volved <strong>in</strong> sulphur<br />
metabolism, <strong>in</strong>clud<strong>in</strong>g <strong>the</strong> oxidation and reduction of sulphur,<br />
<strong>the</strong> thiosulphate–tetrathionate cycle which generates<br />
sulphate, and <strong>the</strong> participation of sulphur <strong>in</strong> electron<br />
transport. However, genes for some sulphur metabolism<br />
enzymes, <strong>in</strong>clud<strong>in</strong>g sulphite-acceptor oxidoreductase,<br />
adenos<strong>in</strong>e phosphosulphate reductase, sulphate adenylyl<br />
transferase and adenylylsulphate phosphate adenyltransferase<br />
were not found which suggested that A. hospitalis<br />
has some pathways differ<strong>in</strong>g from those of o<strong>the</strong>r Acidianus<br />
and Sulfolobus species (Kletz<strong>in</strong> 2007). Therefore, based on<br />
<strong>the</strong> gene annotations, a model is presented for <strong>the</strong> proposed<br />
sulphur oxidation and reduction pathways <strong>in</strong> A. hospitalis<br />
(Fig. 2). Extracellular H2S is oxidised by a secretory-type<br />
sulphide:qu<strong>in</strong>one oxidoreductase (Ahos0513) and flavocytochrome<br />
c sulphide dehydrogenase (Ahos0188) to produce<br />
a surface layer of sulphur on <strong>the</strong> outer cell membrane.<br />
Elemental sulphur is <strong>the</strong>n transported <strong>in</strong>to <strong>the</strong> cell by<br />
putative -SH radical transporter(s) using an unknown<br />
mechanism. Subsequently, sulphur is oxidised by sulphur<br />
oxygenase-reductase (Ahos0131) to yield sulphite, thiosulphate<br />
and hydrogen sulphide. Sulphite and elemental<br />
sulphur convert spontaneously and non-enzymatically to<br />
thiosulphate and elemental sulphur and, consistent with this<br />
mechanism, no candidate gene encod<strong>in</strong>g sulphite:acceptor<br />
oxidoreductase was identified <strong>in</strong> <strong>the</strong> A. hospitalis genome.<br />
Thiosulphate enters <strong>the</strong> putative thiosulphate/tetrathionate<br />
cycle and is f<strong>in</strong>ally oxidised to sulphate. The enzymes<br />
<strong>in</strong>volved <strong>in</strong> this cycle were all annotated: thiosulphate:<br />
qu<strong>in</strong>one oxidoreductase (Ahos0112-0113 and Ahos0238-<br />
0239) and tetrathionate hydrolase (Ahos1670). H2S is<br />
ei<strong>the</strong>r oxidised by <strong>the</strong> sulphide:qu<strong>in</strong>one oxidoreductase<br />
(Ahos1014) <strong>in</strong> <strong>the</strong> cytoplasm with qu<strong>in</strong>one-cytochrome as<br />
electron acceptor or it reacts with tetrathionate spontaneously<br />
under <strong>the</strong> high temperature growth conditions.<br />
F<strong>in</strong>ally, sulphate generated from sulphur oxidation is<br />
effluxed from <strong>the</strong> cell by a putative sulphate transport<br />
permease (Ahos1256). Electrons generated from sulphur<br />
oxidation enter <strong>the</strong> electron transport cha<strong>in</strong> via qu<strong>in</strong>one.<br />
Term<strong>in</strong>al qu<strong>in</strong>ol oxidase receives electrons from qu<strong>in</strong>one<br />
and transfers <strong>the</strong>m to O2 coupled with ATP generation.<br />
Some electrons may be transmitted to <strong>the</strong> NADH complex<br />
to produce NADH for use <strong>in</strong> o<strong>the</strong>r pathways.<br />
Transporters and proteolytic enzymes<br />
Twenty-eight gene products were predicted to be <strong>in</strong>volved <strong>in</strong><br />
<strong>the</strong> transport of am<strong>in</strong>o acids, oligopeptide/dipeptides and<br />
ammonium. Of <strong>the</strong>se, 19 are implicated <strong>in</strong> am<strong>in</strong>o acid<br />
transport, <strong>in</strong>clud<strong>in</strong>g 5 am<strong>in</strong>o acid transporters (Ahos0100/<br />
0163/0197/0986/1721), three am<strong>in</strong>o acid permeases (Ahos<br />
0328/0439/1725) and 11 am<strong>in</strong>o acid permease-like prote<strong>in</strong>s<br />
(Ahos0272/0276/0958/1040/1086/1868/1891/1907/1953/<br />
2065/2251) of unknown specificity for am<strong>in</strong>o acid uptake.<br />
Genes encod<strong>in</strong>g an ammonium transporter (Ahos1467) and<br />
two oligopeptide/dipeptide ABC transporter gene clusters<br />
(Ahos0337-0342 and Ahos0170-0175) are present. In<br />
addition, 21 genes were predicted to encode proteolytic<br />
enzymes, <strong>in</strong>clud<strong>in</strong>g 20 peptidases. Of <strong>the</strong>se, four are<br />
endopeptidases (Ahos0428/0516/0695-6/0800), three are<br />
am<strong>in</strong>opeptidases (Ahos0013/0588/1941), two are peps<strong>in</strong>s<br />
(Ahos1929/2087) and one is a carboxypeptidase (Ahos<br />
0991). Five of <strong>the</strong> proteolytic enzymes are predicted to be<br />
membrane-bound and are designated secretory prote<strong>in</strong>s.<br />
These results suggest that A. hospitalis, like Acidianus<br />
brierleyi (Segerer et al. 1986), Acidianus tengchongensis<br />
(He and Li 2004) and Acidianus manzaensis (Yoshida et al.<br />
2006), can grow on organic compounds, such as yeast<br />
extract, peptone, tryptone and casam<strong>in</strong>o acids.<br />
Tox<strong>in</strong>–antitox<strong>in</strong> <strong>system</strong>s<br />
VapBC complexes constitute <strong>the</strong> ma<strong>in</strong> family of antitox<strong>in</strong>–<br />
tox<strong>in</strong>s that are encoded by members of <strong>the</strong> Sulfolobales<br />
(Pandey and Gerdes 2005; Guo et al. 2011), and <strong>the</strong>y occur<br />
ma<strong>in</strong>ly <strong>in</strong> variable genomic regions where <strong>the</strong>y may<br />
undergo loss or ga<strong>in</strong> events (Guo et al. 2011). The A.<br />
hospitalis genome carries 26 vapBC gene pairs that are<br />
concentrated <strong>in</strong> <strong>the</strong> genomic regions 350–410 and<br />
1,374–1,912 kb with a s<strong>in</strong>gle vapC-like gene ly<strong>in</strong>g <strong>in</strong> an<br />
operon (Fig. 1). The VapB antitox<strong>in</strong>s, <strong>in</strong> contrast to VapC<br />
tox<strong>in</strong>s, could be classified <strong>in</strong>to three families of transcriptional<br />
regulators, AbrB, CcdA/CopG and DUF217 (Fig. 4a), whilst<br />
no subclassification was observed for <strong>the</strong> VapC prote<strong>in</strong>s<br />
(Fig. 4b). Tree build<strong>in</strong>g based on <strong>the</strong> sequence alignments<br />
demonstrated that <strong>the</strong> sequences of <strong>the</strong>se antitox<strong>in</strong>s and<br />
tox<strong>in</strong>s are highly diverse, with sequence identities between<br />
<strong>the</strong>m rarely exceed<strong>in</strong>g 30%, as <strong>in</strong>dicated by all <strong>the</strong> prote<strong>in</strong>s<br />
exhibit<strong>in</strong>g long branches (Fig. 4). This result contrasted<br />
with <strong>the</strong> f<strong>in</strong>d<strong>in</strong>g that VapBC complexes with closely similar<br />
sequences are commonly found when compar<strong>in</strong>g different<br />
genomes of <strong>the</strong> Sulfolobales. For example, 11 of <strong>the</strong> 26<br />
VapBC prote<strong>in</strong> pairs have closely similar homologs encoded<br />
<strong>in</strong> at least 7 of <strong>the</strong> 13 available Sulfolobus genomes<br />
(Fig. 4b). This <strong>in</strong>dicates that <strong>the</strong>re is likely to be a selection<br />
aga<strong>in</strong>st <strong>the</strong> uptake of closely similar vapBC gene pairs <strong>in</strong> a<br />
given genome, despite <strong>the</strong> abundance of such gene pairs <strong>in</strong><br />
<strong>the</strong> environment.<br />
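The conservation figure quoted above can be re-derived from the values printed in Fig. 4b.<br />
The following Python sketch is illustrative only. It assumes that the three columns of the<br />
figure (ORF, antitoxin class, number of other genomes) align row by row, it counts the<br />
VapC-like Ahos0712 as a row, and the names FIG4B and conserved are hypothetical.<br />

```python
from collections import Counter

# (VapC ORF, class of the partner VapB, number of the 13 Sulfolobus genomes
# that encode a closely similar homolog), transcribed from Fig. 4b.
# Assumptions: the three columns align row by row, and the class labels
# printed with inconsistent spelling in the figure all mean DUF217.
FIG4B = [
    ("0712", "N/A", 12), ("1521", "AbrB", 3), ("0210", "AbrB", 1),
    ("0355", "AbrB", 6), ("0353", "CcdA", 8), ("1674", "CcdA", 1),
    ("1737", "CcdA", 12), ("0362", "CcdA", 7), ("1729", "CcdA", 0),
    ("1996", "CcdA", 0), ("0400", "AbrB", 4), ("0265", "AbrB", 13),
    ("0183", "AbrB", 7), ("1583", "DUF217", 7), ("1979", "DUF217", 2),
    ("0375", "AbrB", 4), ("1611", "AbrB", 0), ("0205", "DUF217", 4),
    ("1713", "AbrB", 0), ("2058", "CcdA", 13), ("0395", "AbrB", 7),
    ("0413", "AbrB", 7), ("1645", "CcdA", 0), ("2102", "AbrB", 1),
    ("1586", "CcdA", 8), ("1663", "unknown", 4), ("1525", "DUF217", 0),
]


def conserved(rows, min_genomes=7):
    """Return the ORFs with homologs in at least min_genomes genomes."""
    return [orf for orf, _, n in rows if n >= min_genomes]


widely_conserved = conserved(FIG4B)
class_counts = Counter(cls for _, cls, _ in FIG4B)

print(len(FIG4B), "VapC or VapC-like proteins")
print(len(widely_conserved), "with homologs in at least 7 of 13 genomes")
print(dict(class_counts))
```

With the values as transcribed, 11 rows reach the threshold of 7 genomes, in line with the<br />
number given in the text; note that this tally includes the unpaired Ahos0712.<br />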
The A. hospitalis genome also encodes six copies of<br />
RelE-related tox<strong>in</strong> prote<strong>in</strong>s, <strong>in</strong> common with o<strong>the</strong>r Sulfolobus<br />
genomes (Pandey and Gerdes 2005, unpublished<br />
results). At least three of <strong>the</strong> relE genes occur <strong>in</strong> <strong>in</strong>tegrated<br />
regions carry<strong>in</strong>g degenerated conjugative plasmids, and<br />
<strong>the</strong>y show sequence similarity to prote<strong>in</strong>s encoded <strong>in</strong><br />
Sulfolobus conjugative plasmids pKEF9 (ORF69b), pING1<br />
(ORF98) and pL085 (gene no. 3195) (Greve et al. 2004;<br />
Stedman et al. 2000; Reno et al. 2009). However, none of<br />
<strong>the</strong> putative tox<strong>in</strong> genes are l<strong>in</strong>ked physically to antitox<strong>in</strong><br />
relB genes and <strong>the</strong>ir function rema<strong>in</strong>s unknown.<br />
Diverse <strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s<br />
The <strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s of A. hospitalis can<br />
be classified <strong>in</strong>to two ma<strong>in</strong> types based on analyses of <strong>the</strong>ir<br />
Cas1 prote<strong>in</strong>, leader and repeat sequences (Shah et al.<br />
2009; Lillestøl et al. 2009). In total, <strong>the</strong>re are six <strong>CRISPR</strong><br />
loci, carrying 129 spacer-repeat units, none of which are<br />
identical (Fig. 5). The first three loci in the genome<br />
(Ahos-53, -13 and -9a) are physically linked by cassettes of cmr<br />
and cas family genes, each of which conta<strong>in</strong>s a vapBC<br />
antitox<strong>in</strong>–tox<strong>in</strong> gene pair, and <strong>the</strong>y constitute a family II<br />
<strong>CRISPR</strong>/Cas <strong>system</strong> (Fig. 5a). The last two <strong>CRISPR</strong> loci<br />
(Ahos-9b and -5) are coupled into a typical family I paired<br />
<strong>CRISPR</strong>/Cas module (Fig. 5b) and <strong>the</strong>re is a vapBC gene<br />
pair immediately upstream. Preced<strong>in</strong>g <strong>the</strong> latter <strong>CRISPR</strong>/<br />
Cas module, <strong>the</strong>re is a s<strong>in</strong>gle unclassified locus (Ahos-40)<br />
that lacks both cas genes and a leader region (Fig. 5c)<br />
(Shah and Garrett 2011).<br />
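As a quick consistency check on the totals above, the sketch below sums the numerals in the<br />
six locus designations. It assumes, as the naming suggests but the text does not state<br />
explicitly, that each designation (for example Ahos-53) records the number of spacer-repeat<br />
units in that locus. The dictionary name LOCI is hypothetical.<br />

```python
# Locus designation -> (family assignment from the text, number of units).
# Assumption: the numeral in each locus name is its spacer-repeat unit count.
LOCI = {
    "Ahos-53": ("II", 53),
    "Ahos-13": ("II", 13),
    "Ahos-9a": ("II", 9),
    "Ahos-40": ("unclassified", 40),
    "Ahos-9b": ("I", 9),
    "Ahos-5": ("I", 5),
}

# Total over all six loci.
total_units = sum(units for _, units in LOCI.values())

# Subtotals per CRISPR family.
per_family = {}
for family, units in LOCI.values():
    per_family[family] = per_family.get(family, 0) + units

print(total_units)
print(per_family)
```

Under that assumption the six loci sum to the 129 units reported, with most of them in the<br />
family II module and the unclassified locus.<br />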
We analysed <strong>the</strong> degree to which <strong>CRISPR</strong> spacers<br />
exhibited sequence matches to <strong>the</strong> many diverse genetic<br />
elements available from Acidianus and Sulfolobus species
Extremophiles (2011) 15:487–497 493<br />
us<strong>in</strong>g an earlier approach exam<strong>in</strong><strong>in</strong>g nucleotide and translated<br />
sequences of <strong>the</strong> spacers (Shah et al. 2009; Lillestøl<br />
et al. 2009). Relatively few significant sequence matches<br />
were found and most of <strong>the</strong>se were to conjugative plasmids,<br />
with a few matches to members of five different viral<br />
families (Fig. 5).<br />
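The kind of nucleotide-level comparison described above can be illustrated with a minimal<br />
sketch. This is not the procedure of Shah et al. (2009) or Lillestøl et al. (2009), whose<br />
details are not given here. It is a naive ungapped scan of a spacer along both strands of a<br />
target, and the function names, the 0.9 identity threshold and the toy sequences are all<br />
hypothetical.<br />

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_ungapped_match(query, target):
    """Slide query along target; return (best identity, start position)."""
    best = (0.0, -1)
    for i in range(len(target) - len(query) + 1):
        window = target[i:i + len(query)]
        identity = sum(a == b for a, b in zip(query, window)) / len(query)
        if identity > best[0]:
            best = (identity, i)
    return best


def match_spacer(spacer, target, min_identity=0.9):
    """Scan both strands; return (strand, identity, position) or None."""
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        identity, pos = best_ungapped_match(query, target)
        hits.append((identity, strand, pos))
    identity, strand, pos = max(hits)
    return (strand, identity, pos) if identity >= min_identity else None


# Toy sequences invented for illustration; they are not real spacers.
target = "TTGACCGTAGGCTAACGTTAGCCATGGA"
spacer_fwd = "GTAGGCTAACG"                      # copied from target[6:17]
spacer_rev = reverse_complement("ACGTTAGCCAT")  # target[14:25], other strand

print(match_spacer(spacer_fwd, target))
print(match_spacer(spacer_rev, target))
print(match_spacer("CCCCCCCCCCC", target))
```

A realistic analysis would use a gapped alignment tool and a statistical cut-off, and would<br />
also compare translated sequences, as the text indicates.<br />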
Discussion<br />
At about 2.1 Mbp, <strong>the</strong> genome of A. hospitalis is much<br />
smaller than o<strong>the</strong>r sequenced genomes of members of <strong>the</strong><br />
Sulfolobales. Although this partly reflects <strong>the</strong> presence of<br />
low levels of transposable elements and few genes deriv<strong>in</strong>g<br />
from <strong>in</strong>tegrated elements, it also results from a lower<br />
diversity of metabolic and transporter genes (Guo et al.<br />
2011). The Z curve analysis suggests that <strong>the</strong> chromosome<br />
carries three replication orig<strong>in</strong>s as for Sulfolobus species<br />
(Fig. 1), although <strong>in</strong> contrast to <strong>the</strong> sequenced stra<strong>in</strong>s of S.<br />
solfataricus and S. islandicus, <strong>the</strong> whiP/cdt1 and cdc6-2<br />
genes are widely separated.<br />
Although no <strong>system</strong>atic analysis has been performed<br />
experimentally on <strong>the</strong> metabolic capacity of A. hospitalis,<br />
genome analyses revealed that A. hospitalis possesses <strong>the</strong><br />
capacity to assimilate a broad range of organic compounds,<br />
<strong>in</strong>clud<strong>in</strong>g different am<strong>in</strong>o acids and proteolytic products,<br />
which is similar to some o<strong>the</strong>r Acidianus and Sulfolobus<br />
species (Segerer et al. 1986; Grogan 1989; He et al. 2004;<br />
Yoshida et al. 2006; Plumb et al. 2007). The analyses also<br />
indicate that A. hospitalis can assimilate various carbohydrates,<br />
similarly to several Sulfolobus species (Grogan<br />
1989) but <strong>in</strong> contrast to some Acidianus species (Yoshida<br />
et al. 2006; Plumb et al. 2007).<br />
A. hospitalis, like o<strong>the</strong>r Acidianus and Sulfolobus species,<br />
obta<strong>in</strong>s energy for growth ma<strong>in</strong>ly via oxidation of<br />
reduced inorganic sulphur compounds (RISCs), and the<br />
enzymes <strong>in</strong>volved were predicted from <strong>the</strong> genome analyses<br />
(Fig. 2). A sulphur oxygenase-reductase was identified<br />
show<strong>in</strong>g am<strong>in</strong>o acid sequence similarity to o<strong>the</strong>r<br />
Acidianus and Sulfolobus SORs of 67–99%, and we<br />
<strong>in</strong>ferred that it is important for elemental sulphur oxidation<br />
and reduction, as occurs <strong>in</strong> both Acidianus and Sulfolobus<br />
species (Kletz<strong>in</strong> 1989, 1992; Sun et al. 2003; Chen et al.<br />
2005a). One product of sulphur oxygenase-reductase<br />
catalysis is sulphite. Ow<strong>in</strong>g to <strong>the</strong> apparent lack of <strong>the</strong> four<br />
enzymes, sulphite-acceptor oxidoreductase, adenos<strong>in</strong>e<br />
phosphosulphate reductase, sulphate adenylyl transferase<br />
and adenylylsulphate phosphate adenyltransferase, A.<br />
hospitalis must have adopted a strategy for sulphite oxidation<br />
that differs from <strong>the</strong> currently known pathway<br />
(Kletz<strong>in</strong> 2007). Here, we propose that sulphite is channelled<br />
to thiosulphate <strong>in</strong> A. hospitalis via a spontaneous<br />
reaction with elemental sulphur, but this rema<strong>in</strong>s to be<br />
tested experimentally. Some Acidianus species, such as<br />
A. manzaensis (Yoshida et al. 2006) and A. sulfidivorans<br />
(Plumb et al. 2007) grow chemolithoautotrophically with<br />
oxidation of molecular hydrogen, but this cannot occur <strong>in</strong><br />
A. hospitalis because it apparently lacks an encoded<br />
hydrogen dehydrogenase.<br />
Transposable elements <strong>in</strong>clude a few IS200/607 elements<br />
and several orphan orfB elements which all belong to <strong>the</strong><br />
IS200/605/607 family. They lack inverted terminal repeats<br />
and are mobilised by “cut-and-paste” mechanisms (Filée<br />
et al. 2007; Ton-Hoang et al. 2010). No representatives were<br />
found of the other transposable element families common to<br />
other Sulfolobus genomes, which carry inverted terminal<br />
repeats and are mobilised by “copy-and-paste” mechanisms<br />
(Blount and Grogan 2005; Redder and Garrett 2006). It<br />
rema<strong>in</strong>s uncerta<strong>in</strong> whe<strong>the</strong>r <strong>the</strong> OrfB prote<strong>in</strong> is responsible<br />
for transposition of <strong>the</strong> orfB elements or whe<strong>the</strong>r <strong>the</strong>y are<br />
mobilised <strong>in</strong> trans by <strong>the</strong> TnpA transposase encoded by <strong>the</strong><br />
IS200/607 elements (Filée et al. 2007; Guo et al. 2011). The<br />
IS200/607 and orfB elements have been detected <strong>in</strong> Sulfolobus<br />
conjugative plasmids and orfB elements also occur <strong>in</strong> a<br />
few viruses of <strong>the</strong> Sulfolobales <strong>in</strong>clud<strong>in</strong>g four copies <strong>in</strong> <strong>the</strong><br />
Acidianus two-tailed bicaudavirus ATV (She et al. 1998;<br />
Greve et al. 2004; Prangishvili et al. 2006). Thus, <strong>the</strong>y are<br />
likely to be transmitted <strong>in</strong>tercellularly, and enter chromosomes,<br />
via such genetic elements.<br />
MITEs are common <strong>in</strong> Sulfolobus species and have been<br />
predicted to be mobilised by transposases encoded <strong>in</strong> different<br />
IS element families (Redder et al. 2001). The novel<br />
MITE-like elements <strong>in</strong> <strong>the</strong> A. hospitalis genome (Fig. 3)<br />
may derive from orfB elements and be mobilised by a<br />
similar mechanism but at present we can provide no evidence<br />
for <strong>the</strong>ir mobility. In this respect, <strong>the</strong>y may be<br />
similar to o<strong>the</strong>r Sulfolobus MITEs which show a low level<br />
of transpositional activity (Redder and Garrett 2006). This<br />
is consistent with <strong>the</strong> hypo<strong>the</strong>sis that MITEs drive <strong>the</strong><br />
evolutionary diversification of <strong>the</strong>ir mobilis<strong>in</strong>g transposases<br />
to <strong>the</strong> po<strong>in</strong>t that <strong>the</strong>y are no longer recognised which<br />
leads to <strong>the</strong>ir immobilisation and subsequent degeneration<br />
(Feschotte and Pritham 2007).<br />
All of the integrated elements, except one, could be<br />
identified as originating from fuselloviruses or a pDL10-like<br />
member of the pRN family of cryptic plasmids<br />
(Kletz<strong>in</strong> et al. 1999), and <strong>the</strong> conjugative plasmid pAH1<br />
was already shown to reversibly <strong>in</strong>tegrate at a tRNA Arg<br />
[TCG] gene (Basta et al. 2009). None of <strong>the</strong>se events<br />
occurred with<strong>in</strong> any of <strong>the</strong> 15 tRNA genes carry<strong>in</strong>g <strong>in</strong>trons<br />
and this observation is consistent with <strong>the</strong> hypo<strong>the</strong>sis that<br />
archaeal <strong>in</strong>trons protect tRNA genes aga<strong>in</strong>st <strong>in</strong>tegration<br />
events (Guo et al. 2011).<br />
Fig. 3 Alignment of 10 MITE-like repeat elements present in the genome of A. hospitalis. The shaded area denotes a small open reading<br />
frame corresponding to the downstream part of the OrfB found within transposable orfB elements<br />
Fig. 4 VapBC trees. Phylogenetic trees for a VapB antitoxins and<br />
b VapC toxins. They demonstrate that VapBs, despite their high<br />
sequence diversity, can be classified into three main families AbrB,<br />
CcdA/CopG and DUF217, whereas the VapCs are highly diverse in<br />
their sequences but cannot be classified into major subgroups. The<br />
Ahos gene numbers are given for each protein. Moreover, the class of<br />
the VapB corresponding to each VapC is given in b. The degree of<br />
conservation of the VapC proteins in the available 13 Sulfolobus<br />
genomes is indicated in b where 0 indicates it is absent from all the<br />
genomes whilst 13 indicates that it is present in all. The antitoxin<br />
corresponding to VapC-0183 is not annotated in the genome because<br />
it lacks a start codon but it is included in the figure. The VapC-like<br />
protein (Ahos0712) is part of the operon with a translation-related<br />
protein and lacks a VapB. The Ahos1664/1663 pair are variant ORFs<br />
where both VapB and VapC are longer than usual and the VapB does<br />
not cluster with the families in a<br />
a Antitoxins [VapB], ORFs grouped by class (scale bar 5%)<br />
AbrB: 0374, 2101, 0399, 0394, 0264, 0209, 0412, 1712, 1520, 1610, 0183*, 0356<br />
CcdA/CopG: 0361, 1738, 1728, 0354, 1673, 1997, 1644, 1587, 2059<br />
DUF217: 0206, 1524, 1582, 1978<br />
b Toxins [VapC] (scale bar 5%)<br />
ORF | antitoxin class | other genomes<br />
0712 | N/A | 12<br />
1521 | AbrB | 3<br />
0210 | AbrB | 1<br />
0355 | AbrB | 6<br />
0353 | CcdA | 8<br />
1674 | CcdA | 1<br />
1737 | CcdA | 12<br />
0362 | CcdA | 7<br />
1729 | CcdA | 0<br />
1996 | CcdA | 0<br />
0400 | AbrB | 4<br />
0265 | AbrB | 13<br />
0183 | AbrB | 7<br />
1583 | DUF217 | 7<br />
1979 | DUF217 | 2<br />
0375 | AbrB | 4<br />
1611 | AbrB | 0<br />
0205 | DUF217 | 4<br />
1713 | AbrB | 0<br />
2058 | CcdA | 13<br />
0395 | AbrB | 7<br />
0413 | AbrB | 7<br />
1645 | CcdA | 0<br />
2102 | AbrB | 1<br />
1586 | CcdA | 8<br />
1663 | unknown | 4<br />
1525 | DUF217 | 0<br />
VapBC constitutes the predominant antitoxin–toxin<br />
family found amongst the Sulfolobales and the A. hospitalis<br />
genome carries 26 vapBC gene pairs, more than occur<br />
in more rapidly growing Sulfolobus species (Pandey and<br />
Gerdes 2005; Guo et al. 2011). Moreover, the groups of<br />
VapB and VapC proteins are highly diverse in sequence<br />
(Fig. 4). Antitoxin–toxins were originally shown to<br />
enhance plasmid maintenance as a consequence of the<br />
growth of plasmid-free cells being preferentially inhibited<br />
by free toxins which are inherently more stable than the<br />
antitoxins (Gerdes 2000). By analogy with this mechanism,<br />
it was proposed that chromosomally encoded tox<strong>in</strong>s may<br />
facilitate ma<strong>in</strong>tenance of local DNA regions where vapBC<br />
gene pairs are located that might o<strong>the</strong>rwise be prone to loss<br />
(Magnuson 2007; Van Melderen 2010). This hypo<strong>the</strong>sis is<br />
consistent with <strong>the</strong> observation that most of <strong>the</strong> A. hospitalis<br />
vapBC gene pairs lie with<strong>in</strong> <strong>the</strong> two variable genomic<br />
regions where DNA regions are exchanged (Fig. 1).<br />
Moreover, it receives strong support from both <strong>the</strong> high
diversity, and <strong>the</strong> uniqueness of all <strong>the</strong> VapC prote<strong>in</strong>s<br />
encoded with<strong>in</strong> <strong>the</strong> A. hospitalis chromosome (Fig. 4b),<br />
because any similar VapBC complexes would compensate<br />
for <strong>the</strong> loss of one ano<strong>the</strong>r, <strong>the</strong>reby underm<strong>in</strong><strong>in</strong>g any DNA<br />
ma<strong>in</strong>tenance activity.<br />
In slowly grow<strong>in</strong>g organisms, from nutrient poor environments,<br />
multiple tox<strong>in</strong>s are also assumed to be <strong>in</strong>volved<br />
<strong>in</strong> stress response and/or quality control (Gerdes 2000;<br />
Pandey and Gerdes 2005). Involvement <strong>in</strong> stress response<br />
entails that <strong>the</strong> more stable tox<strong>in</strong>s <strong>in</strong>hibit growth and allow<br />
<strong>the</strong> host to lie <strong>in</strong> a dormant state dur<strong>in</strong>g <strong>the</strong> period of<br />
environmental stress (Gerdes 2000). However, <strong>the</strong>re may<br />
also be a negative effect on host growth due to <strong>the</strong> cont<strong>in</strong>uous<br />
presence of low levels of free tox<strong>in</strong> (Wilbur et al.<br />
2005). Thus, <strong>the</strong> presence of many vapBC gene pairs<br />
<strong>in</strong> A. hospitalis could reflect a compromise between <strong>the</strong><br />
ability to survive different environmental stresses and<br />
ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g an adequate growth rate under normal conditions.<br />
This would be also consistent with <strong>the</strong> presence of<br />
three families of VapB prote<strong>in</strong>s and high sequence diversity<br />
of <strong>the</strong> VapC prote<strong>in</strong>s, s<strong>in</strong>ce functionally overlapp<strong>in</strong>g<br />
<strong>system</strong>s would be redundant for stress responses and <strong>the</strong>y<br />
would confer an unnecessary burden on host growth. The<br />
proposed dual roles of maintaining local chromosomal<br />
DNA regions and providing resistance to stress are not<br />
mutually exclusive.<br />
Fig. 5 Schematic representations of the CRISPR loci of A. hospitalis.<br />
a Family II CRISPR module carrying three CRISPR loci and Cmr and<br />
Cas family gene cassettes which are both interrupted by, or bordered<br />
by, four vapBC gene pairs (orange). b Paired family I CRISPR/Cas<br />
system flanked by one vapBC gene pair, and c an unclassified<br />
CRISPR locus lacking a leader region and adjacent cas genes. csm1 is<br />
a homolog of cmr2, csm2 is a homolog of cmr5 and csm3 is a<br />
homolog of cmr4. The light blue genes each carry two short RAMP<br />
motifs. a–c Structures of the individual CRISPR loci are shown<br />
together with the leader region (L) where each triangle represents a<br />
spacer-repeat unit. Significant spacer matches to sequenced viruses<br />
and plasmids are colour coded: red rudivirus, orange lipothrixvirus,<br />
yellow fusellovirus, green bicaudavirus, turquoise turreted icosahedral<br />
virus, blue conjugative plasmid and violet cryptic plasmid<br />
Gene labels in a (ORFs 353–379): cas4, csx1, vapBC, csm1, csm2, csm3, csa1, vapBC, PaREP, cas2, cas1, cas6; CRISPR loci 53, 13 and 9a<br />
Gene labels in b (ORFs 1737–1752): cas6, csaX, casHD, cas3, cas5, csa2, csa5, csa3, csa1, cas1, cas2, cas4, csa3; CRISPR loci 9b and 5<br />
Numbers shown beside the loci: 52, 12 and 8 (a); 8 and 4 (b); 39 (c)<br />
Although the mechanism of action of VapC toxins<br />
remains unknown (Arcus et al. 2011), in A. hospitalis, a<br />
single vapC-like gene (Ahos0712) is directly coupled to<br />
genes encoding proteins involved in transcription and initiator<br />
tRNA binding to the ribosome, and this gene cassette<br />
is highly conserved <strong>in</strong> gene content and sequence <strong>in</strong> o<strong>the</strong>r<br />
Sulfolobus genomes (Guo et al. 2011). This suggests that<br />
this VapC prote<strong>in</strong>, at least, may also regulate or <strong>in</strong>hibit<br />
translational <strong>in</strong>itiation by b<strong>in</strong>d<strong>in</strong>g at <strong>the</strong> ribosomal A-site,<br />
as demonstrated recently for a RelE type tox<strong>in</strong> (Neubauer<br />
et al. 2009). A similar <strong>in</strong>activation mechanism would be<br />
plausible for <strong>the</strong> VapC tox<strong>in</strong>s, if one assumes that<br />
expression of <strong>the</strong> <strong>in</strong>dividual VapBC complexes is stimulated<br />
by ei<strong>the</strong>r <strong>the</strong> requirement to ma<strong>in</strong>ta<strong>in</strong> different local<br />
regions of chromosomal DNA or different environmental<br />
stresses.<br />
Despite <strong>the</strong> complexity of <strong>the</strong> <strong>CRISPR</strong>-based <strong>immune</strong><br />
<strong>system</strong>s present <strong>in</strong> <strong>the</strong> genome, <strong>the</strong>y appear to be, at best,<br />
only partially functional. Thus, <strong>the</strong> family II <strong>CRISPR</strong>/Cas<br />
<strong>system</strong> is coupled with an archaeal family D Cmr module<br />
<strong>in</strong> A. hospitalis, but is apparently defective, reta<strong>in</strong><strong>in</strong>g only<br />
its putative RNA, but not DNA, target<strong>in</strong>g function. The<br />
<strong>system</strong> lacks <strong>the</strong> group 2 cas genes (cas3, cas5, csa2, csa5,<br />
csaX) which encode prote<strong>in</strong>s implicated <strong>in</strong> target<strong>in</strong>g and<br />
<strong>in</strong>activat<strong>in</strong>g foreign DNA elements (Fig. 5). However, <strong>the</strong><br />
cas group 1 genes (cas1, cas2, cas4, csa1), putatively<br />
involved in integrating new spacers from invading DNA<br />
elements, are present, as is the Cmr module implicated in<br />
RNA targeting (Garrett et al. 2011; Shah<br />
et al. 2011). The family I <strong>system</strong> exhibits small <strong>CRISPR</strong><br />
loci, with <strong>in</strong>tact leader regions and group 2 cas genes.<br />
However, <strong>the</strong> cas2 gene <strong>in</strong> <strong>the</strong> group 1 cas gene cassette is<br />
truncated, hav<strong>in</strong>g <strong>in</strong>curred a po<strong>in</strong>t mutation which produces<br />
a premature stop codon. Thus, this <strong>system</strong> has apparently<br />
lost <strong>the</strong> ability to <strong>in</strong>tegrate new spacers. This suggests that<br />
nei<strong>the</strong>r <strong>CRISPR</strong>-based <strong>system</strong> is fully functional, despite<br />
<strong>the</strong>ir apparent complexity. The presence of five vapBC<br />
gene pairs located ei<strong>the</strong>r with<strong>in</strong> <strong>the</strong> cmr and cas gene<br />
cassettes of <strong>the</strong> family II <strong>CRISPR</strong>/Cas module, or immediately<br />
upstream from <strong>the</strong> modules of both families, may<br />
reflect that <strong>the</strong>y help to ma<strong>in</strong>ta<strong>in</strong> <strong>the</strong>se gene cassettes on <strong>the</strong><br />
chromosome (see above).<br />
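The functional inferences above come down to asking which gene groups each system retains<br />
intact. The sketch below encodes the two groups as listed in the text and applies that<br />
reasoning. The gene inventories are simplified from the description above (only the genes<br />
discussed in the text are tracked, and the truncated cas2 is treated as non-functional), and<br />
the names GROUP1, GROUP2, SYSTEMS and assess are hypothetical. It is a bookkeeping aid under<br />
these assumptions, not a functional prediction.<br />

```python
# Gene groups as listed in the text.
GROUP1 = {"cas1", "cas2", "cas4", "csa1"}          # putative spacer integration
GROUP2 = {"cas3", "cas5", "csa2", "csa5", "csaX"}  # putative DNA targeting

# Simplified inventories of intact genes per system (assumption: a truncated
# gene is excluded from the intact set; "cmr" stands for the Cmr module).
SYSTEMS = {
    "family II": {"intact": GROUP1 | {"cmr"}, "truncated": set()},
    "family I": {"intact": (GROUP1 - {"cas2"}) | GROUP2, "truncated": {"cas2"}},
}


def assess(system):
    """Report which putative functions have a complete set of intact genes."""
    intact = system["intact"]
    return {
        "spacer integration": GROUP1 <= intact,
        "DNA targeting": GROUP2 <= intact,
        "RNA targeting (Cmr)": "cmr" in intact,
    }


for name, system in SYSTEMS.items():
    print(name, assess(system))
```

Under these assumptions the family II system keeps the integration and RNA-targeting gene<br />
sets but not the DNA-targeting set, whereas the family I system keeps the DNA-targeting set<br />
but not a complete integration set, matching the reasoning in the text.<br />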
Although a range of genetic <strong>system</strong>s have been developed<br />
for Sulfolobus species, at present no genetic <strong>system</strong>s<br />
are available for <strong>the</strong> Acidianus genus and A. hospitalis<br />
provides a promising candidate for such studies. Its genome<br />
is small and the relative stability of its chromosome<br />
suggests that it is likely to generate stable deletion mutants.<br />
This, combined with its ability to host different plasmids<br />
and viruses, provides a promising starting point for developing<br />
a genetic system.<br />
Acknowledgments We thank Mery P<strong>in</strong>a and Tamara Basta for help<br />
with <strong>the</strong> DNA preparation. The work was supported by <strong>the</strong> National<br />
Natural Science Foundation of China (30621005) and the Ministry of<br />
Science and Technology (2010CB630903), and by <strong>the</strong> Danish Natural<br />
Science Research Council (Grant no. 272-08-0391) and Danish<br />
National Research Foundation.<br />
Open Access This article is distributed under <strong>the</strong> terms of <strong>the</strong><br />
Creative Commons Attribution Noncommercial License which permits<br />
any noncommercial use, distribution, and reproduction <strong>in</strong> any<br />
medium, provided <strong>the</strong> orig<strong>in</strong>al author(s) and source are credited.<br />
References<br />
Arcus VL, McKenzie JL, Robson J, Cook GM (2011) The PIN-domain<br />
ribonucleases and the prokaryotic VapBC toxin–antitoxin<br />
array. Prot Engin Design Select 24:33–40<br />
Basta T, Smyth J, Forterre P, Prangishvili D, Peng X (2009) Novel<br />
archaeal plasmid pAH1 and its <strong>in</strong>teractions with <strong>the</strong> lipothrixvirus<br />
AFV1. Mol Microbiol 71:23–34<br />
Bettstetter M, Peng X, Garrett RA, Prangishvili D (2003) AFV1, a<br />
novel virus <strong>in</strong>fect<strong>in</strong>g hyper<strong>the</strong>rmophilic archaea of <strong>the</strong> genus<br />
Acidianus. Virology 315:68–79<br />
Blount ZD, Grogan DW (2005) New <strong>in</strong>sertion sequences of<br />
Sulfolobus: functional properties and implications for genome<br />
evolution <strong>in</strong> hyper<strong>the</strong>rmophilic archaea. Mol Microbiol 55:312–<br />
325<br />
Chen Z-W, Jiang C-Y, She Q, Liu S-J, Zhou P-J (2005a) Key role of<br />
cyste<strong>in</strong>e residues <strong>in</strong> catalysis and subcellular localization of<br />
sulfur oxygenase reductase of Acidianus tengchongensis. Appl<br />
Environ Microbiol 71:621–628<br />
Chen L, Brügger K, Skovgaard M, Redder P, She Q, Torar<strong>in</strong>sson E,<br />
Greve B, Awayez M, Zibat A, Klenk HP, Garrett RA (2005b)<br />
The genome of Sulfolobus acidocaldarius, a model organism of<br />
<strong>the</strong> Crenarchaeota. J Bacteriol 187:4992–4999<br />
Cobucci-Ponzano B, Guzz<strong>in</strong>i L, Benelli D, Londei P, Perrodou E,<br />
Lecompte O, Tran D, Sun J, Wei J, Mathur EJ, Rossi M, Moracci<br />
M (2010) Functional characterisation and high-throughput<br />
proteomic analysis of <strong>in</strong>terrupted genes <strong>in</strong> <strong>the</strong> archaeon Sulfolobus<br />
solfataricus. J Proteome Res 9:2496–2507<br />
Feschotte C, Pritham EJ (2007) DNA transposons and <strong>the</strong> evolution<br />
of eukaryotic genomes. Annu Rev Genet 41:331–368<br />
Filée J, Siguier P, Chandler M (2007) Insertion sequence diversity <strong>in</strong><br />
archaea. Microbiol Mol Biol Revs 71:121–157<br />
Garrett RA, Shah SA, Vestergaard G, Deng L, Gudbergsdottir S,<br />
Kenchappa CS, Erdmann S, She Q (2011) <strong>CRISPR</strong>-based<br />
<strong>immune</strong> <strong>system</strong>s of <strong>the</strong> Sulfolobales—complexity and diversity.<br />
Biochem Soc Trans 39:51–57<br />
Gerdes K (2000) Toxin-antitoxin modules may regulate synthesis<br />
of macromolecules during nutritional stress. J Bacteriol 182:561–<br />
572<br />
Goulet A, Blangy S, Redder P, Prangishvili D, Felisberto-Rodrigues<br />
C, Forterre P, Campanacci V, Cambillau C (2009) Acidianus<br />
filamentous virus 1 coat prote<strong>in</strong>s display a helical fold spann<strong>in</strong>g<br />
<strong>the</strong> filamentous archaeal viruses l<strong>in</strong>eage. Proc Natl Acad Sci<br />
USA 106:21155–21160<br />
Greve B, Jensen S, Brügger K, Zillig W, Garrett RA (2004) Genomic<br />
comparison of archaeal conjugative plasmids from Sulfolobus.<br />
<strong>Archaea</strong> 1:231–239<br />
Grogan DW (1989) Phenotypic characterization of <strong>the</strong> archaebacterial<br />
genus Sulfolobus: comparison of five wild-type stra<strong>in</strong>s. J Bacteriol<br />
171:6710–6719<br />
Guo L, Brügger K, Liu C, Shah SA, Zheng H, Zhu Y, Wang S,<br />
Lillestøl RK, Chen L, Frank J, Prangishvili D, Paul<strong>in</strong> L, She Q,<br />
Huang L, Garrett RA (2011) Genome analyses of Icelandic<br />
stra<strong>in</strong>s of Sulfolobus islandicus: model organisms for genetic and<br />
virus-host <strong>in</strong>teraction studies. J Bacteriol 193:1672–1680<br />
He Z-G, Zhong H, Li Y (2004) Acidianus tengchongensis sp. nov., a<br />
new species of acido<strong>the</strong>rmophilic archaeon isolated from an<br />
acido<strong>the</strong>rmal spr<strong>in</strong>g. Curr Microbiol 48:156–193<br />
Kletz<strong>in</strong> A (1989) Coupled enzymatic production of sulfite, thiosulfate,<br />
and hydrogen sulfide from sulfur: purification and properties of a<br />
sulfur oxygenase reductase from <strong>the</strong> facultatively anaerobic<br />
archaebacterium Desulfurolobus ambivalens.JBacteriol171:1638–<br />
1643<br />
Kletz<strong>in</strong> A (1992) Molecular characterization of <strong>the</strong> sor gene, which<br />
encodes <strong>the</strong> sulfur oxygenase/reductase of <strong>the</strong> <strong>the</strong>rmoacidophilic<br />
Archaeum Desulfurolobus ambivalens. J Bacteriol 174:5854–<br />
5859<br />
Kletz<strong>in</strong> A (2007) Oxidation of sulfur and <strong>in</strong>organic sulfur compounds<br />
<strong>in</strong> Acidianus ambivalens. In: Dahl C, Friedrich CG (eds)<br />
Microbial sulfur metabolism. Spr<strong>in</strong>ger, Heidelberg, pp 184–199<br />
Kletz<strong>in</strong> A, Lieke A, Urich T, Charlebois RL, Sensen CW (1999)<br />
Molecular analysis of pDL10 from Acidianus ambivalens reveals<br />
a family of related plasmids from extremely <strong>the</strong>rmophilic and<br />
acidophilic archaea. Genetics 152:1307–1314<br />
Lawrence CM, Menon S, Eilers BJ, Bothner B, Khayat R, Douglas T,<br />
Young MJ (2009) Structural and functional studies of archaeal<br />
viruses. J Biol Chem 284:12599–12603<br />
Lillestøl RK, Shah SA, Brügger K, Redder P, Phan H, Christiansen J,<br />
Garrett RA (2009) <strong>CRISPR</strong> families of <strong>the</strong> crenarchaeal genus<br />
Sulfolobus: bidirectional transcription and dynamic properties.<br />
Mol Microbiol 72:259–272<br />
Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved<br />
detection of transfer RNA genes <strong>in</strong> genomic sequence. Nucleic<br />
Acids Res 25:955–964<br />
Lundgren M, Andersson A, Chen L, Nilsson P, Bernander R (2004)<br />
Three replication orig<strong>in</strong>s <strong>in</strong> Sulfolobus species: synchronous<br />
<strong>in</strong>itiation of chromosome replication and asynchronous term<strong>in</strong>ation.<br />
Proc Natl Acad Sci USA 101:7046–7051<br />
Magnuson RD (2007) Hypo<strong>the</strong>tical functions of tox<strong>in</strong>–antitox<strong>in</strong><br />
<strong>system</strong>s. J Bacteriol 189:6089–6092<br />
Melderen LV (2010) Tox<strong>in</strong>-antitox<strong>in</strong> <strong>system</strong>s: why so many, what<br />
for? Curr Op<strong>in</strong> Microbiol 13:781–785<br />
Muller S, Urban A, Hecker A, Leclerc A, Branlant C, Motor<strong>in</strong> Y<br />
(2009) Deficiency of <strong>the</strong> tRNA Tyr :W35-synthase aPus7 <strong>in</strong><br />
archaea of <strong>the</strong> Sulfolobales order might be rescued by <strong>the</strong>
Extremophiles (2011) 15:487–497 497<br />
H/ACA sRNA-guided mach<strong>in</strong>ery. Nucleic Acids Res 37:1308–<br />
1322<br />
Muskhelishvili G, Palm P, Zillig W (1993) SSV1-encoded sitespecific<br />
recomb<strong>in</strong>ation <strong>system</strong> <strong>in</strong> Sulfolobus shibatae. Mol Gen<br />
Genet 273:334–342<br />
Neubauer C, Gao YG, Andersen KR, Dunham CM, Kelley AC,<br />
Hentschel J, Gerdes K, Ramakrishnan V, Brodersen DE (2009)<br />
The structural basis for mRNA recognition and cleavage by <strong>the</strong><br />
ribosome-dependent endonuclease RelE. Cell 139:1084–1095<br />
Omer AD, Zago M, Chang A, Dennis PP (2006) Prob<strong>in</strong>g <strong>the</strong> structure<br />
and function of an archaeal C/D-box methylation guide sRNA.<br />
RNA 12:1708–1720<br />
Pandey DP, Gerdes K (2005) Tox<strong>in</strong>-antitox<strong>in</strong> loci are highly abundant<br />
<strong>in</strong> free-liv<strong>in</strong>g but lost from host-asscoiated prokaryotes. Nucleic<br />
Acids Res 33:966–976<br />
Plumb JJ, Haddad CM, Gibson JAE, Franzmann PD (2007) Acidianus<br />
sulfidivorans sp nov., an extremely acidophilic, <strong>the</strong>rmophilic<br />
archaeon isolated from a solfatara on Lihir Island, Papua New<br />
Gu<strong>in</strong>ea, and amendation of <strong>the</strong> genus description. Int J Syst Evol<br />
Microbiol 57:1418–1423<br />
Prangishvili D, Albers SV, Holz I, Arnold HP, Stedman K, Kle<strong>in</strong> T,<br />
S<strong>in</strong>gh H, Hiort J, Schweier A, Kristjansson JK, Zillig W (1998)<br />
Conjugation <strong>in</strong> archaea: frequent occurrence of conjugative<br />
plasmids <strong>in</strong> Sulfolobus. Plasmid 40:190–202<br />
Prangishvili D, Forterre P, Garrett RA (2006) Viruses of <strong>the</strong> <strong>Archaea</strong>:<br />
a unify<strong>in</strong>g view. Nat Rev Microbiol 4:837–848<br />
Rachel R, Bettstetter M, Hedlund BP, Här<strong>in</strong>g M, Kessler A, Stetter<br />
KO, Prangishvili D (2002) Arch Virol 147:2419–2429<br />
Redder P, Garrett RA (2006) Mutations and rearrangements <strong>in</strong> <strong>the</strong><br />
genome of Sulfolobus solfataricus P2. J Bacteriol 188:4198–4206<br />
Redder P, She Q, Garrett RA (2001) Non-autonomous elements <strong>in</strong> <strong>the</strong><br />
crenarchaeon Sulfolobus solfataricus. J Mol Biol 306:1–6<br />
Redder P, Peng X, Brügger K, Shah SA, Roesch F, Greve B, She Q,<br />
Schleper C, Forterre P, Garrett RA, Prangishvili D (2009) Four<br />
newly isolated fuselloviruses from extreme geo<strong>the</strong>rmal environments<br />
reveal unusual morphologies and a possible <strong>in</strong>terviral<br />
recomb<strong>in</strong>ation mechanism. Environ Microbiol 11:2849–2862<br />
Reno ML, Held NL, Fields CJ, Burke PV, Whitaker RJ (2009)<br />
Sulfolobus islandicus pan-genome. Proc Natl Acad Sci USA<br />
106:8605–8610<br />
Rob<strong>in</strong>son NP, Bell SD (2007) Extrachromosomal element capture and<br />
<strong>the</strong> evolution of multiple replication orig<strong>in</strong>s <strong>in</strong> archaeal<br />
chromosomes. Proc Natl Acad Sci USA 104:5806–5811<br />
Rob<strong>in</strong>son NP, Dionne I, Lundgren M, Marsh VL, Bernander R, Bell<br />
SD (2004) Identification of two orig<strong>in</strong>s of replication <strong>in</strong> <strong>the</strong><br />
s<strong>in</strong>gle chromosome of <strong>the</strong> archaeon Sulfolobus solfataricus. Cell<br />
116:25–38<br />
Ru<strong>the</strong>rford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream<br />
MA, Barrell B (2000) Artemis: sequence visualization and<br />
annotation. Bio<strong>in</strong>formatics 16:944–945<br />
Segerer A, Neuner A, Kristjansson JK, Stetter KO (1986) Acidanus<br />
<strong>in</strong>fernus gen. nov., sp. nov., and Acidianus brierleyi comb. nov.:<br />
facultatively aerobic, extremely acidophilic <strong>the</strong>rmophilic sulfurmetaboliz<strong>in</strong>g<br />
archaebacteria. Int J Syst Bacteriol 36:559–564<br />
Shah SA, Garrett RA (2011) <strong>CRISPR</strong>/Cas and Cmr modules, mobility<br />
and evolution of adaptive <strong>immune</strong> <strong>system</strong>s. Res Microbiol<br />
162:27–38<br />
Shah SA, Hansen NR, Garrett RA (2009) Distributions of <strong>CRISPR</strong><br />
spacer matches <strong>in</strong> viruses and plasmids of crenarchaeal acido<strong>the</strong>rmophiles<br />
and implications for <strong>the</strong>ir <strong>in</strong>hibitory mechanism.<br />
Trans Biochem Soc 37:23–28<br />
Shah SA, Vestergaard G, Garrett RA (2011) <strong>CRISPR</strong>/Cas and<br />
<strong>CRISPR</strong>/Cmr <strong>immune</strong> <strong>system</strong>s of archaea. In: Marchfelder A,<br />
Hess W (eds) Regulatory RNAs <strong>in</strong> prokaryotes. Spr<strong>in</strong>ger, Berl<strong>in</strong><br />
She Q, Phan H, Garrett RA, Albers S-V, Stedman KM, Zillig W<br />
(1998) Genetic profile of pNOB8 from Sulfolobus: <strong>the</strong> first<br />
conjugative plasmid from an archaeon. Extremophiles 2:417–<br />
425<br />
Stedman KM, She Q, Phan H, Holz I, S<strong>in</strong>gh H, Prangishvili D, Garrett<br />
RA, Zillig W (2000) The pING family of conjugative plasmids<br />
from <strong>the</strong> extremely <strong>the</strong>rmophilic archaeon Sulfolobus islandicus:<br />
<strong>in</strong>sights <strong>in</strong>to recomb<strong>in</strong>ation and conjugation <strong>in</strong> Crenarchaeota.<br />
J Bacteriol 182:7014–7020<br />
Sun CW, Chen ZW, He ZG, Zhou PJ, Liu SJ (2003) Purification and<br />
properties of <strong>the</strong> sulphur oxygenase/reductase from <strong>the</strong> acido<strong>the</strong>rmophilic<br />
archaeon, Acidianus stra<strong>in</strong> S5. Extremophiles<br />
7:131–134<br />
Tang TH, Polacek N, Zywicki M, Huber H, Brügger K, Garrett R,<br />
Bachellerie JP, Hüttenhofer A (2005) Identification of novel<br />
non-cod<strong>in</strong>g RNAs as potential antisense regulators <strong>in</strong> <strong>the</strong><br />
archaeon Sulfolobus solfataricus. Mol Microbiol 55:469–481<br />
Ton-Hoang B, Pasternak C, Siguier P, Guynet C, Hickman AB, Dyda<br />
F, Sommer S, Chandler M (2010) S<strong>in</strong>gle-stranded DNA<br />
transposition is coupled to host replication. Cell 142:398–408<br />
Torar<strong>in</strong>sson E, Klenk H-P, Garrett RA (2005) Divergent transcriptional<br />
and translational signals <strong>in</strong> <strong>Archaea</strong>. Environ Microbiol<br />
7:47–54<br />
Wilbur JS, Chivers PT, Mattison K, Potter L, Brennan RG, So M<br />
(2005) Neisseria gonorrheae FitA <strong>in</strong>teracts with FitB to b<strong>in</strong>d<br />
DNA through its ribbon–helix–helix motif. Biochemistry 44:<br />
12515–12524<br />
Wurtzel O, Sapra R, Chen F, Zhu ZY, Simmons BA, Sorek R (2010)<br />
A s<strong>in</strong>gle-base resolution map of an archaeal transcriptome.<br />
Genome Res 20:133–141<br />
Yokobori S, Itoh T, Yosh<strong>in</strong>ari S, Nomura N, Sako Y, Yamagishi A,<br />
Oshima T, Kita K, Watanabe Y (2009) Ga<strong>in</strong> and loss of an <strong>in</strong>tron<br />
<strong>in</strong> a prote<strong>in</strong>-cod<strong>in</strong>g gene <strong>in</strong> <strong>Archaea</strong>: <strong>the</strong> case of an archaeal<br />
RNA pseudourid<strong>in</strong>e synthase gene. BMC Evol Biol 9:198<br />
Yoshida N, Nakasato M, Ohmura N, Ando A, Saolo J, Ishii M,<br />
Igarashi Y (2006) Acidianus manzaensis sp. nov., a novel<br />
<strong>the</strong>rmoacidophilic Archaeon grow<strong>in</strong>g autotrophicallly by <strong>the</strong><br />
oxidation of H 2 with <strong>the</strong> reduction of Fe 3? . Curr Microbiol<br />
53:406–411<br />
Zhang R, Zhang CT (2003) Multiple replication orig<strong>in</strong>s of <strong>the</strong><br />
archaeon Halobacterium species NRC-1. Biochem Biophys Res<br />
Comm 302:728–734<br />
123
Review<br />
Archaeal CRISPR-based immune systems: exchangeable functional modules<br />
Roger A. Garrett, Gisle Vestergaard and Shiraz A. Shah<br />
Archaea Centre, Department of Biology, Ole Maaløes Vej 5, University of Copenhagen, DK2200 Copenhagen N, Denmark<br />
CRISPR (clustered regularly interspaced short palindromic<br />
repeats)-based immune systems are essentially<br />
modular with three primary functions: the excision and<br />
integration of new spacers, the processing of CRISPR<br />
transcripts to yield mature CRISPR RNAs (crRNAs), and<br />
the targeting and cleavage of foreign nucleic acid. The<br />
primary target appears to be the DNA of foreign genetic<br />
elements, but the CRISPR/Cmr system that is widespread<br />
amongst archaea also specifically targets and<br />
cleaves RNA in vitro. The archaeal CRISPR systems tend<br />
to be both diverse and complex. Here we examine evidence<br />
for exchange of functional modules between archaeal<br />
systems that is likely to contribute to their<br />
diversity, particularly of their nucleic acid targeting<br />
and cleavage functions. The molecular constraints that<br />
limit such exchange are considered. We also summarize<br />
mechanisms underlying the dynamic nature of CRISPR<br />
loci and the evidence for intergenomic exchange of<br />
CRISPR systems.<br />
Archaea and CRISPR immunity<br />
The early evolutionary history of archaea remains unresolved.<br />
Archaea could have descended directly from a<br />
universal common ancestor, undergone a shared period<br />
of descent with eukarya, or been streamlined from a<br />
more complex (and eukaryal-like) ancestor [1,2]. Although<br />
many cellular processes of archaea and eukarya share<br />
common features that are absent from bacteria [1], the<br />
uniqueness of archaea appears to lie in their successful<br />
adaptation to extreme environmental conditions including<br />
high temperature, extremes of pH, high salt, high pressures,<br />
and strictly anaerobic conditions. These environments<br />
tend to be low in sources of energy, consistent with<br />
the hypothesis that some unique archaeal properties were<br />
maintained through adaptation to chronic energy stress<br />
via, for example, their catabolic pathways and mechanisms<br />
of energy conservation facilitated by low-permeability<br />
ether-linked lipid membranes [3].<br />
This exceptional biology is reflected in the properties of<br />
the archaeal viruses. Most of those characterized, especially<br />
from extreme thermophilic and halophilic environments,<br />
show morphotypes and genomic properties distinct from<br />
viruses of bacteria and eukarya [4–6]. There are also<br />
preliminary indications that levels of free viruses, at least<br />
in extreme thermoacidophilic environments, tend to be low<br />
relative to cellular levels, suggesting that these viruses<br />
prefer to remain ‘inside’ cells [7]. Moreover, archaeal<br />
viruses generally exist in stable relationships with their<br />
hosts at low copy-numbers and rarely cause cell lysis<br />
[4,6,8,9].<br />
Corresponding author: Garrett, R.A. (garrett@bio.ku.dk).<br />
CRISPR systems (Box 1) provide immunity against<br />
invasion by viruses and conjugative plasmids, are present<br />
in most studied archaea and in about 40% of bacteria, and<br />
have a common evolutionary origin [10,11]. The CRISPR<br />
systems in many archaea are unusual in that they tend to<br />
be both diverse and complex, suggesting that they have the<br />
potential to be more versatile functionally, with more<br />
possibilities for regulation, than in many bacteria [11,12].<br />
Given the tendency of many archaeal viruses and conjugative<br />
plasmids to maintain stable relationships with their<br />
hosts, and to avoid targeting by the CRISPR system,<br />
different regulatory systems might play an important role<br />
[4,6,13]. For example, the immune response may only be<br />
activated at certain levels of viral DNA replication or<br />
transcription.<br />
All CRISPR systems have three basic functions. First,<br />
protospacer DNA is excised from invading genetic<br />
elements and inserted into CRISPR loci, a process termed<br />
adaptation. Second, transcripts from complete CRISPR<br />
loci are processed to yield crRNAs that are then assembled<br />
into protein complexes. Third, these complexes target and<br />
cleave the DNA or RNA of invading genetic elements, a process<br />
termed interference. These steps are illustrated in<br />
Figure 1 and the main components are defined in Box 1.<br />
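The three functions can be pictured with a deliberately simplified sketch. The Python below is a toy model, not a description of any real Cas protein or software package: the class `CrisprLocus`, the helper names and the example sequences are all hypothetical, sequences are plain strings, and interference is reduced to an exact string match, whereas real targeting involves base-pairing of the crRNA to the complementary strand and tolerates some mismatches.

```python
from dataclasses import dataclass, field


def dna_to_rna(seq: str) -> str:
    """Transcribe a DNA string into RNA by replacing T with U."""
    return seq.replace("T", "U")


@dataclass
class CrisprLocus:
    """Toy CRISPR locus: one repeat sequence and an ordered list of spacers."""
    repeat: str
    spacers: list = field(default_factory=list)  # index 0 is leader-proximal

    def adapt(self, protospacer: str) -> None:
        """Adaptation: insert a new spacer at the leader-proximal end."""
        self.spacers.insert(0, protospacer)

    def process(self) -> list:
        """Processing: one crRNA per spacer (repeat fragments are omitted)."""
        return [dna_to_rna(s) for s in self.spacers]


def interferes(crrnas: list, invader_dna: str) -> bool:
    """Interference: True if any crRNA spacer matches the invader exactly."""
    return any(rna.replace("U", "T") in invader_dna for rna in crrnas)


# Hypothetical invader sequence, used only for illustration.
virus = "ACGTACGTTTGACCATGCAAGGCTTAGCATCGATCGGATTACAGCTAGGCTA"
locus = CrisprLocus(repeat="GATTAATCCCAAAAGGAATTGAAAG")
print(interferes(locus.process(), virus))  # False: the locus has no spacers yet
locus.adapt(virus[8:43])                   # acquire a 35 bp protospacer
print(interferes(locus.process(), virus))  # True: the new spacer matches
```

Keeping the newest spacer at index 0 mirrors the statement that new spacers are inserted at the repeat adjacent to the leader.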
CRISPR-based systems have recently been reclassified<br />
into three main types, of which only types I and III occur<br />
in archaea (Box 2). Protein components of CRISPR systems<br />
are manifold and highly diverse. Several core protein<br />
functions have been predicted from sequence analyses or<br />
crystal structures [14,15] but with few exceptions their<br />
detailed mechanistic roles remain to be determined experimentally<br />
(Box 1). Similarities of essential components and<br />
core mechanisms of archaeal and bacterial CRISPR systems<br />
are consistent with their having a common evolutionary<br />
origin [10,11].<br />
Attempts to classify CRISPR systems phylogenetically<br />
have previously involved sequence alignments of the most<br />
conserved Cas1 protein [14,16]. This protein is almost<br />
ubiquitous and is associated with the adaptation step<br />
(Figure 1). Phylogenetic studies on crenarchaeal systems<br />
provided evidence for coevolution of Cas1 protein and the<br />
leader and repeat sequences, strongly suggesting that<br />
these structural components are functionally interdependent<br />
in adaptation [16,17]. However, when attempts were<br />
made to extend these analyses to conserved crenarchaeal<br />
CRISPR components implicated in RNA processing or<br />
nucleic acid interference, divergent trees were obtained,<br />
suggesting that CRISPR systems are non-integral and that<br />
modular exchange can occur [17,18].<br />
0966-842X/$ – see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.tim.2011.08.002 Trends in Microbiology, November 2011, Vol. 19, No. 11 549<br />
Box 1. Core components of CRISPR systems<br />
Here we summarize the main components of CRISPR systems.<br />
Leaders: all active CRISPR loci to date are preceded by a leader of<br />
about 300–400 bp, carrying some low complexity sequence and<br />
conserved regions, that is likely to be involved in the adaptation step<br />
at or near the first repeat [16,17]. The CRISPR proximal region of the<br />
leader also carries the main promoter for CRISPR transcription [16].<br />
CRISPR loci: these consist of arrays of identical direct repeats of 24–37 bp<br />
in size and, in archaea, often contain up to 100 repeat units.<br />
These are interspaced with similarly sized spacers (35–44 bp) carrying<br />
unique sequences that derive from invading DNA genetic elements.<br />
They are dynamic structures that undergo loss and exchange of<br />
spacer-repeat units, probably via recombination events at repeats<br />
[16,35]. Thus they provide a record, albeit incomplete, of previous<br />
invading genetic elements, although if CRISPR loci have recently<br />
exchanged between related organisms, as occurs for S. islandicus<br />
[17], the record will be erroneous. There is currently no evidence to<br />
indicate whether spacers can originate from RNA viruses.<br />
Protospacer: a segment of the invading DNA genetic element that is<br />
incorporated into a CRISPR locus at or near the first repeat, and in a<br />
direction predetermined by the location of the adjacent protospacer-associated<br />
motif.<br />
Protospacer-associated motif (PAM): this motif is essential for the<br />
immune response [19]. It corresponds to a short sequence, positioned at<br />
approximately –2 to –4 bp from the end of the protospacer that becomes<br />
leader-proximal on insertion into a CRISPR locus. This suggests that the<br />
base-paired motif influences protospacer selection from genetic<br />
elements [16,19,31]. Another proposed function of the PAM motif is<br />
that it ensures the presence of mismatched base-pairs between 5′ ends<br />
of crRNAs and targeted DNA as a prerequisite for avoiding self-interference<br />
of CRISPR loci [46]. The PAM motif may also play a more<br />
specific role in DNA interference, although how it is recognized and the<br />
degree of PAM sequence stringency required remain unknown [35,41].<br />
crRNAs: the final products of processing of pre-CRISPR RNAs, many<br />
of which exhibit short inverted repeats [58]. They are produced for<br />
DNA targeting by introducing single cuts in adjacent repeats, and<br />
each crRNA carries a spacer sequence flanked by repeat sequence<br />
fragments [20,32]. In Cmr-based RNA targeting, crRNAs are further<br />
processed at the 3′ end by an unknown enzyme [21,22]. crRNA<br />
complexes with Cas, Csm or Cmr proteins target invading nucleic<br />
acids by base-pairing to highly similar sequences, where perfect<br />
matching of the 5′ terminal spacer sequence of the crRNA can be<br />
especially important for DNA targeting [35,40,43].<br />
CRISPR-associated proteins (Cas): although many functions have<br />
been predicted bioinformatically for core Cas proteins, few have been<br />
tested experimentally [14,15]. Cas1 and Cas2 are universally involved<br />
in adaptation and the proteins exhibit metal-dependent DNA and RNA<br />
endonuclease activity, respectively [59,60]. Cas4 carries a predicted<br />
RecB nuclease domain and is sometimes fused to Cas1, and is<br />
thereby implicated in adaptation. DNA interference by the CRISPR/Cas<br />
system requires at least three core proteins (Cas5, Cas7, and Cas3),<br />
which carry helicase and single-stranded DNA nuclease activities and<br />
are associated with invader DNA cleavage [61]. A large group of RNA<br />
recognition motif-containing proteins (RAMPs) also carry small<br />
glycine-rich motifs, including the diverse Cas6 proteins involved in<br />
CRISPR RNA processing and many of the proteins making up the Csm<br />
and Cmr protein targeting complexes for DNA and RNA, respectively<br />
[23,46].<br />
CASCADE (CRISPR-associated complex for antiviral defense): first<br />
characterized for the E. coli CRISPR/Cas system, this constitutes a<br />
protein complex of Cas5e, Cas6, Cas7 (six copies) and two subtype<br />
specific proteins Cse1 and Cse2 (two copies) [20,45]. It generates a<br />
seahorse-shaped structure encompassing the crRNA and specifically<br />
targets the complementary strand of protospacer-like DNA (and<br />
unspecifically ssRNA) but does not cleave it. The presence of a Cas6<br />
homolog underlines an additional link to processing [45]. A similar<br />
structure was modeled for a Sulfolobus complex containing Cas5e,<br />
multiple copies of Cas7 and crRNA, that also targeted DNA but only<br />
interacted weakly with Cas6 and other Cas proteins [42]. The similarity<br />
of the two structures suggests that this may be a universal structure<br />
for DNA targeting.<br />
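The layout of a CRISPR locus and the generation of crRNAs described in Box 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mechanism of any specific enzyme: the function names are hypothetical, the repeats are assumed to be perfectly identical, the position of the single cut within each repeat is a free parameter (`cut`, counted from the 3′ end of the repeat), and the additional 3′ trimming reported for Cmr-based RNA targeting is not modelled.

```python
def split_array(array: str, repeat: str) -> list:
    """Return the spacers of a repeat-spacer array that starts and ends with a repeat."""
    parts = array.split(repeat)
    # A well-formed array leaves empty strings before the first and after the last repeat.
    if parts[0] != "" or parts[-1] != "":
        raise ValueError("array must start and end with a complete repeat")
    return parts[1:-1]


def process_transcript(array: str, repeat: str, cut: int) -> list:
    """Cut once inside every repeat, `cut` nt before its 3' end, and return the
    fragments that carry a complete spacer, written as RNA."""
    if not 0 < cut < len(repeat):
        raise ValueError("cut must fall inside the repeat")
    five_prime = repeat[-cut:]   # repeat fragment left upstream of each spacer
    three_prime = repeat[:-cut]  # repeat fragment left downstream of each spacer
    return [
        (five_prime + spacer + three_prime).replace("T", "U")
        for spacer in split_array(array, repeat)
    ]


# Hypothetical short repeat and spacers, chosen only to keep the example readable.
repeat = "AAAACCCC"
array = repeat + "GGT" + repeat + "TTG" + repeat
print(split_array(array, repeat))                # ['GGT', 'TTG']
print(process_transcript(array, repeat, cut=3))  # ['CCCGGUAAAAC', 'CCCUUGAAAAC']
```

Each returned crRNA carries one spacer flanked by the two repeat fragments created by the cuts in the neighbouring repeats, which is the arrangement the box describes.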
In this review we focus primarily on archaeal <strong>CRISPR</strong><br />
<strong>system</strong>s. The degree of functional and structural <strong>in</strong>terdependence<br />
of <strong>the</strong> functional modules is summarized and<br />
evidence is provided for modular exchange. Fur<strong>the</strong>r, molecular<br />
and sequence constra<strong>in</strong>ts that limit <strong>the</strong> capacity for<br />
exchange are considered and it is <strong>in</strong>ferred that advantages<br />
of exchange lie primarily <strong>in</strong> generat<strong>in</strong>g <strong>in</strong>terference diversity.<br />
Fur<strong>the</strong>r, we summarize <strong>the</strong> evidence for <strong>CRISPR</strong><br />
loci be<strong>in</strong>g dynamic structures and describe factors that<br />
Repeat<br />
pCas poly-crRNA<br />
Process<strong>in</strong>g<br />
iCas-crRNA<br />
iCmr-crRNA<br />
Interference<br />
DNA<br />
Viral/plasmid<br />
DNA<br />
Cleavage<br />
Interference<br />
RNA<br />
Cleaved<br />
mRNA<br />
Cleav Cleaved<br />
viral RRNA<br />
TRENDS <strong>in</strong> Microbiology<br />
Figure 1. Scheme for <strong>the</strong> three primary functions of <strong>CRISPR</strong> <strong>system</strong>s. In <strong>the</strong> adaptation step, Cas prote<strong>in</strong>s excise <strong>the</strong> protospacer sequence from a foreign DNA genetic<br />
element and <strong>in</strong>sert it <strong>in</strong>to <strong>the</strong> repeat adjacent to <strong>the</strong> leader of <strong>the</strong> <strong>CRISPR</strong> locus. Pre-<strong>CRISPR</strong> RNAs are <strong>the</strong>n transcribed from with<strong>in</strong> <strong>the</strong> leader and are subsequently<br />
processed <strong>in</strong>to crRNAs each carry<strong>in</strong>g a s<strong>in</strong>gle spacer sequence and part of <strong>the</strong> adjo<strong>in</strong><strong>in</strong>g repeat sequence. At <strong>the</strong> <strong>in</strong>terference stage, crRNAs are assembled <strong>in</strong>to prote<strong>in</strong><br />
target<strong>in</strong>g complexes that anneal to, and cleave, match<strong>in</strong>g spacer sequences on ei<strong>the</strong>r <strong>in</strong>vad<strong>in</strong>g elements or <strong>the</strong>ir transcripts.<br />
550
Review Trends <strong>in</strong> Microbiology November 2011, Vol. 19, No. 11<br />
Box 2. Classification and nomenclature<br />
<strong>CRISPR</strong>-related prote<strong>in</strong>s have been classified <strong>in</strong>to eight types of<br />
<strong>CRISPR</strong> <strong>system</strong>s and up to 45 families of associated prote<strong>in</strong>s [14,61].<br />
An attempt was recently made to simplify both <strong>the</strong> <strong>CRISPR</strong><br />
classification and prote<strong>in</strong> nomenclature and <strong>the</strong> results perta<strong>in</strong><strong>in</strong>g<br />
especially to archaeal <strong>system</strong>s (summarized below) are presented<br />
toge<strong>the</strong>r with a suggested term<strong>in</strong>ology that we use for label<strong>in</strong>g <strong>the</strong><br />
diverse functional modules present <strong>in</strong> archaea [16].<br />
<strong>CRISPR</strong> <strong>system</strong>s: <strong>the</strong>se are now grouped <strong>in</strong>to three major classes –<br />
types I to III (with a few subtypes) – based primarily on sequences of<br />
<strong>the</strong> Cas1 and Cas2 prote<strong>in</strong>s implicated <strong>in</strong> adaptation, but also tak<strong>in</strong>g<br />
<strong>in</strong>to account gene cassette contents [15]. Type I <strong>system</strong>s have been<br />
implicated <strong>in</strong> DNA target<strong>in</strong>g (exemplified <strong>in</strong> Figure 2a) and are<br />
generally characterized by a Cas3 endonuclease considered to cleave<br />
<strong>in</strong>vad<strong>in</strong>g foreign DNA [61]. Type II are bacteria-specific and require a<br />
<strong>CRISPR</strong>-associated trans-encoded small RNA (tracrRNA) and hostencoded<br />
RNase III for process<strong>in</strong>g. The large multifunctional Cas9<br />
prote<strong>in</strong> alone appears to facilitate <strong>the</strong> f<strong>in</strong>al process<strong>in</strong>g and <strong>in</strong>terference<br />
steps [36]. Type III <strong>system</strong>s are over-represented <strong>in</strong> archaea<br />
and <strong>in</strong>clude all <strong>CRISPR</strong> <strong>system</strong>s carry<strong>in</strong>g Cmr or Csm prote<strong>in</strong>s,<br />
illustrated for archaea <strong>in</strong> Figure 2b–f. Some of <strong>the</strong>se prote<strong>in</strong>s (Cmr2/<br />
Csm1 and Cmr4/Csm3) are homologs, whereas o<strong>the</strong>rs show m<strong>in</strong>imal<br />
sequence conservation but carry RNA recognition and glyc<strong>in</strong>e-rich<br />
motifs (RAMP prote<strong>in</strong>s) [12,14]. The Csm and Cmr prote<strong>in</strong> complexes<br />
are implicated in targeting and cleavage of DNA and RNA, respectively<br />
[23,25]. In archaea the type I and type III systems are often<br />
functionally interdependent [17,44].<br />
Protein nomenclature: the names of proteins Cas1 to Cas6 are<br />
retained but they are extended to include many disparate homologs<br />
in different organisms. Cas7 to Cas10 represent new categories, each<br />
of which brings together a group of differently named homologs. The<br />
changes especially relevant to archaeal CRISPR systems are: Cas7 for<br />
Csa2, Cas8 for Csa4, and Cas10 is proposed for homologs Cmr2 and<br />
Csm1 of type III systems (Figure 3). Cas9 is exclusive to the bacteria-specific<br />
type II system.<br />
Functional module nomenclature: the following terms are introduced<br />
for the central mechanistic steps: adaptation, aCas; processing, pCas;<br />
and interference, iCas, iCmr, and iCsm, which are generally genetically<br />
discrete units but are also functionally interdependent (Figure 2). These<br />
terms are considered to provide a useful label for all components of the<br />
genetically diverse archaeal functional modules. Gene cassettes of all<br />
the functional modules often carry additional proteins that are<br />
conserved for different CRISPR subtypes, and gene cassettes for the<br />
three types of interference module are particularly diverse (Figures 2<br />
and 4). The terms are applied to components that are specifically<br />
involved in the main functional steps of different CRISPR systems, but<br />
exclude transcriptional regulators (Figures 2 and 4).<br />
contribute to their structural changes and, finally, evidence<br />
for intergenomic exchange of CRISPR systems is<br />
discussed. Detailed experimental data pertaining to the<br />
mechanisms involved in the core functional steps in archaea<br />
and bacteria have recently been reviewed [11] and<br />
will not be covered in depth here.<br />
Functional modules<br />
CRISPR systems all exhibit three basic functional steps<br />
illustrated in Figure 1. (i) Adaptation involves recognition<br />
and degradation of foreign DNA by Cas proteins and<br />
incorporation of a DNA fragment into the CRISPR locus<br />
as a new spacer presumed to occur at the repeat adjacent to<br />
the leader [16,19]. (ii) In the second step, the complete<br />
CRISPR locus is transcribed from within the leader and<br />
processed into multiple CRISPR RNAs (crRNAs) each<br />
carrying a single spacer sequence and one or more adjoining<br />
repeat regions. Proteins implicated in the archaeal<br />
RNA processing are the core protein Cas6 and at least<br />
one other unidentified protein [20–22]. (iii) Interference<br />
(or invader silencing) of DNA or RNA occurs when a<br />
protein–crRNA complex targets and cleaves a highly similar<br />
sequence of the genetic element [23–25]. At present<br />
three interference systems have been identified based on<br />
Cas and Csm protein complexes each targeting DNA in<br />
vivo and Cmr proteins targeting RNA in vitro. Here we<br />
introduce terms for the main molecular components involved<br />
in each functional step to simplify the discussion of<br />
functional module exchange as follows: aCas for adaptation;<br />
pCas for processing, and iCas, iCsm and iCmr for<br />
nucleic acid interference (Box 2).<br />
Currently about 165 CRISPR systems from 110 archaeal<br />
genomes are available in public sequence databases and<br />
have provided a basis for analyzing gene organization<br />
patterns of different functional modules [12,15,26]. They<br />
reveal six major combinations of gene cassettes illustrated<br />
with color-coded functional modules in Figure 2. Whereas<br />
the aCas cassette is relatively conserved in the first four<br />
combinations, the interference modules are diverse in<br />
both <strong>the</strong>ir gene contents and <strong>in</strong> <strong>the</strong>ir comb<strong>in</strong>ations<br />
(Figure 2a–d). About half of <strong>the</strong> archaeal iCmr and iCsm<br />
gene cassettes are physically separated on genomes from<br />
<strong>CRISPR</strong> loci and aCas genes (Figure 2e,f).<br />
Adaptation<br />
New spacer uptake <strong>in</strong>volves excision of a protospacer from<br />
an <strong>in</strong>vad<strong>in</strong>g DNA genetic element and its <strong>in</strong>tegration as a<br />
new spacer at <strong>the</strong> repeat sequence adjacent to <strong>the</strong> leader,<br />
result<strong>in</strong>g <strong>in</strong> duplication of <strong>the</strong> repeat. It has only been<br />
observed under laboratory conditions for Streptococcus<br />
<strong>the</strong>rmophilus [27]. For archaea, evidence is limited to<br />
comparative genomic studies of closely related Sulfolobus<br />
stra<strong>in</strong>s where more recently <strong>in</strong>corporated spacers are clustered<br />
adjacent to <strong>the</strong> leader [16,28–30]. The short PAM<br />
motif adjacent to <strong>the</strong> protospacer (Box 1) has been implicated<br />
<strong>in</strong> determ<strong>in</strong><strong>in</strong>g <strong>the</strong> orientation of <strong>in</strong>serted spacers<br />
[16,26,31]. Most aCas modules are relatively conserved <strong>in</strong><br />
content, generally carry<strong>in</strong>g prote<strong>in</strong>s Cas1, Cas2 and Cas4<br />
(Figure 2), of which <strong>the</strong> first two appear to be essential.<br />
Moreover, spacer integration at the first repeat, combined<br />
with phylogenetic evidence for coevolution of cas1, leader<br />
and repeat sequences, suggests that the leader is cofunctional<br />
[16,17].<br />
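The leader-proximal integration described above can be pictured with a small data-structure sketch.<br />
It is a toy model only: the class name CrisprLocus, its methods and the example sequences are<br />
hypothetical and are not taken from the cited studies. It assumes that every new spacer enters<br />
directly next to the leader and that n spacers are always flanked by n + 1 identical repeats.<br />

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrisprLocus:
    """Toy model: leader, then alternating repeat/spacer units, ending in a repeat."""
    leader: str
    repeat: str
    spacers: List[str] = field(default_factory=list)  # index 0 is leader-proximal

    def integrate(self, protospacer: str) -> None:
        # Assumption: the new spacer is placed next to the leader,
        # so older spacers move one position away from it.
        self.spacers.insert(0, protospacer)

    def repeat_count(self) -> int:
        # n spacers are flanked by n + 1 repeats, so each integration
        # adds one repeat copy (the "duplication" mentioned in the text).
        return len(self.spacers) + 1

    def sequence(self) -> str:
        units = "".join(self.repeat + spacer for spacer in self.spacers)
        return self.leader + units + self.repeat


# Hypothetical example sequences, chosen only for readability.
locus = CrisprLocus(leader="ttgaca", repeat="GTTGCAAT", spacers=["AAAA"])
locus.integrate("CCCC")
print(locus.spacers, locus.repeat_count())
```

In this model the order of the spacers list records the order of acquisition, which is the same<br />
reasoning used in the comparative genomic studies of Sulfolobus strains mentioned above.<br />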
RNA process<strong>in</strong>g<br />
Transcripts <strong>in</strong>itiate with<strong>in</strong> leaders and term<strong>in</strong>ate downstream<br />
from <strong>CRISPR</strong> loci [16]; early work on Archaeoglobus<br />
fulgidus <strong>in</strong>dicated that process<strong>in</strong>g occurs with<strong>in</strong><br />
adjacent repeats [32]. The primary process<strong>in</strong>g enzyme is<br />
<strong>the</strong> ubiquitous and diverse Cas6 prote<strong>in</strong> and, at least <strong>in</strong><br />
Pyrococcus furiosus, <strong>the</strong> <strong>CRISPR</strong> transcript wraps around<br />
<strong>the</strong> Cas6 endonuclease and is cut once <strong>in</strong> each adjacent<br />
repeat [23,33]. The order and direction of processing<br />
remain unclear. Early work on Sulfolobus solfataricus<br />
suggested that, in contrast to A. fulgidus, initially every<br />
third repeat is cut and that processing occurs primarily<br />
from the 3′ end of the CRISPR transcript, but this remains<br />
to be confirmed [16,34]. Moreover, process<strong>in</strong>g levels were<br />
Review Trends <strong>in</strong> Microbiology November 2011, Vol. 19, No. 11<br />
higher <strong>in</strong> Sulfolobus dur<strong>in</strong>g stationary phase when <strong>the</strong><br />
cells are more vulnerable to viral attack [16,28]. Patterns of<br />
archaeal crRNAs are often complex, extend<strong>in</strong>g over <strong>the</strong><br />
approximate size range 35–60 nt, and this probably reflects<br />
<strong>the</strong> diversity of <strong>CRISPR</strong> <strong>system</strong>s present [28,35]. To date,<br />
all <strong>the</strong> characterized crRNAs carry an 8 nt repeat sequence<br />
at the 5′ end. Larger crRNAs implicated in DNA targeting<br />
in vivo are 60–65 nt in length and carry partial repeat<br />
sequences at each end, whereas smaller crRNAs which can<br />
target RNA in vitro are 37–45 nt in length and lack repeat<br />
and partial spacer sequences at the 3′ end [20–22]. Processing<br />
at the 3′ end of these RNAs is performed by an<br />
unknown enzyme [21,22]. Processing within repeats in<br />
Streptococcus pyogenes is effected by a trans-encoded<br />
RNA and host-encoded RNase III [36]. This type II<br />
<strong>CRISPR</strong>/Cas <strong>system</strong> does not occur <strong>in</strong> archaea [15] where<br />
<strong>the</strong> cellular functions of RNase III appear to be performed<br />
by a general <strong>in</strong>tron-splic<strong>in</strong>g enzyme with a different substrate<br />
specificity [37].<br />
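The primary processing step described above, one cut within each repeat leaving an 8 nt repeat<br />
tag at the 5′ end of every crRNA, can be sketched as follows. This is an illustration under stated<br />
assumptions rather than a model of Cas6 itself: the function name process_transcript and the<br />
example sequences are hypothetical, the cut is assumed to lie exactly 8 nt upstream of the 3′ end<br />
of each repeat, and the subsequent 3′ trimming by the unknown enzyme is not modelled.<br />

```python
from typing import List

TAG_LEN = 8  # length of the repeat-derived 5' tag reported in the text


def process_transcript(repeat: str, spacers: List[str], tag_len: int = TAG_LEN) -> List[str]:
    """Cut a repeat-spacer transcript once inside every repeat.

    Each product carries the last `tag_len` nt of the upstream repeat,
    one spacer, and the remainder of the downstream repeat.
    """
    if len(repeat) <= tag_len:
        raise ValueError("repeat must be longer than the 5' tag")
    transcript = "".join(repeat + spacer for spacer in spacers) + repeat

    # One cut per repeat, placed tag_len nt before the repeat's 3' end.
    cuts = []
    pos = 0
    for spacer in spacers:
        cuts.append(pos + len(repeat) - tag_len)
        pos += len(repeat) + len(spacer)
    cuts.append(pos + len(repeat) - tag_len)  # cut in the final repeat

    # Fragments between consecutive cuts are the unit-length intermediates.
    return [transcript[start:end] for start, end in zip(cuts, cuts[1:])]


# Hypothetical sequences: repeat in upper case, spacers in lower case.
for crrna in process_transcript("AAAACCCCGGGGTTTT", ["acgtacgt", "ttgacatg"]):
    print(crrna)
```

Products of this kind correspond to the larger crRNAs that carry partial repeat sequences at<br />
each end; the smaller crRNAs would require an additional 3′ trimming step.<br />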
In most studies archaeal <strong>CRISPR</strong> loci are constitutively<br />
expressed and processed <strong>in</strong>to mature crRNAs <strong>in</strong> <strong>the</strong> absence<br />
of <strong>in</strong>vad<strong>in</strong>g DNA elements, but it rema<strong>in</strong>s unclear<br />
whe<strong>the</strong>r <strong>the</strong> <strong>CRISPR</strong> <strong>system</strong>s require fur<strong>the</strong>r activation<br />
[16,21]. Bacterial studies have revealed diverse <strong>CRISPR</strong><br />
regulatory mechanisms which can be activated on viral<br />
<strong>in</strong>fection produc<strong>in</strong>g elevated expression [38,39].<br />
DNA <strong>in</strong>terference<br />
Independent lines of evidence indicate that DNA is the<br />
primary target for most <strong>CRISPR</strong> <strong>system</strong>s. Putative protospacer<br />
sequences are essentially distributed randomly on<br />
archaeal virus and plasmid DNA with no significant bias of<br />
match<strong>in</strong>g crRNAs to ei<strong>the</strong>r genes relative to <strong>in</strong>tergenic<br />
regions or to cod<strong>in</strong>g versus non-cod<strong>in</strong>g strands [26,28].<br />
Moreover, genetic studies on different Sulfolobus species<br />
have provided strong evidence for DNA target<strong>in</strong>g <strong>in</strong> vivo,<br />
presumably <strong>in</strong>volv<strong>in</strong>g iCas ra<strong>the</strong>r than iCmr modules<br />
[35,40]. For bacteria, experimental evidence for DNA target<strong>in</strong>g<br />
<strong>in</strong> vivo was provided for <strong>the</strong> <strong>CRISPR</strong>/Csm <strong>system</strong> of<br />
Staphylococcus epidermidis [41] (equivalent to Figure 2d),<br />
<strong>the</strong> <strong>CRISPR</strong>/Csn (bacterial type II) <strong>system</strong> of S. <strong>the</strong>rmophilus<br />
[25] and for <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong> of Escherichia<br />
coli [20], although none of <strong>the</strong>se studies precluded additional<br />
RNA target<strong>in</strong>g.<br />
A large prote<strong>in</strong> complex, conta<strong>in</strong><strong>in</strong>g multiple prote<strong>in</strong><br />
components, was first characterized for an E. coli <strong>CRISPR</strong>/<br />
Cas <strong>system</strong> that participates <strong>in</strong> crRNA maturation and<br />
that facilitates anneal<strong>in</strong>g of <strong>the</strong> crRNA to <strong>the</strong> DNA target,<br />
but not cleavage [41]. It generates a seahorse form and is<br />
def<strong>in</strong>ed as a CASCADE complex (Box 1). A related structure<br />
is produced for a S. solfataricus <strong>CRISPR</strong>/Cas <strong>system</strong><br />
made up of only Cas5e and multiple copies of Cas7, and<br />
which appears to be <strong>in</strong>volved primarily <strong>in</strong> DNA target<strong>in</strong>g<br />
[42]. This is <strong>the</strong>refore likely to be a universal structure, at<br />
least for iCas target<strong>in</strong>g <strong>system</strong>s.<br />
Studies on S. <strong>the</strong>rmophilus demonstrated that effective<br />
<strong>in</strong>terference requires perfect matches between crRNA and<br />
protospacers [19]. However, recent work on Sulfolobus species<br />
has demonstrated that three or more mismatches located<br />
near <strong>the</strong> centre of <strong>the</strong> protospacer or at <strong>the</strong> distal end from<br />
<strong>the</strong> PAM motif do not prevent <strong>in</strong>terference [35,40]. Moreover,<br />
a <strong>system</strong>atic study of <strong>the</strong> E. coli <strong>CRISPR</strong>/Cas <strong>system</strong><br />
has shown that only six of the seven nucleotides of the<br />
targeted protospacer strand proximal to the PAM motif<br />
must match the crRNA perfectly, and this was proposed<br />
to act as a recognition site, or seed, for the interference<br />
reaction [43]. Whether this is a general property of the<br />
CRISPR DNA targeting systems remains to be determined.<br />
Figure 2. Representative gene maps of six main classes of archaeal CRISPR systems. (a) CRISPR/aCas-pCas-iCas, common in archaea; in this example those of S. islandicus<br />
are shown. (b) CRISPR/aCas-pCas-iCas-iCmr; studied experimentally in P. furiosus. (c) CRISPR/aCas-pCas-iCas-iCsm; from Caldiarchaeum subterranum. (d) CRISPR/aCas-iCsm;<br />
shown for Thermoplasma volcanium. (e) iCmr from Hyperthermus butylicus. (f) iCsm from Methanocaldococcus vulcanius. Genes encoding the functional domains<br />
are color-coded: aCas module, light blue; pCas gene, orange; iCas module, yellow; iCsm and iCmr modules, red. t.r. genes in green encode putative transcriptional regulators<br />
that are not considered to be part of the functional modules. R indicates proteins carrying RNA-recognition motifs (RAMPs). (a) belongs to the type I CRISPR system;<br />
(b) and (c) are mixtures of type I and type III, whereas (d–f) are classified as type III [15].<br />
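The seed requirement reported for the E. coli system can be expressed as a simple check.<br />
The sketch below is a deliberate simplification and not the published model: the function name<br />
assess_target is hypothetical, both sequences are assumed to be written in the same sense with<br />
position 0 next to the PAM motif, and "six of the seven" is read as at most one mismatch anywhere<br />
in the seven PAM-proximal positions. Mismatches outside the seed are counted but not penalised.<br />

```python
from typing import Dict

SEED_LEN = 7             # PAM-proximal positions examined (from the text)
MAX_SEED_MISMATCHES = 1  # assumption: "six of the seven" must match


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assess_target(spacer: str, protospacer: str) -> Dict[str, object]:
    """Report seed and distal mismatches and whether the seed rule is met."""
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must have equal length")
    seed_mm = count_mismatches(spacer[:SEED_LEN], protospacer[:SEED_LEN])
    distal_mm = count_mismatches(spacer[SEED_LEN:], protospacer[SEED_LEN:])
    return {
        "seed_mismatches": seed_mm,
        "distal_mismatches": distal_mm,
        "seed_ok": seed_mm <= MAX_SEED_MISMATCHES,
    }


# Hypothetical sequences: three PAM-distal mismatches, intact seed.
print(assess_target("ACGTACGTACGTACGT", "ACGTACGTACGTAAAA"))
```

Treating distal mismatches as tolerated mirrors the Sulfolobus observations cited above, but the<br />
number of mismatches that can actually be tolerated is not specified in the text.<br />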
RNA <strong>in</strong>terference<br />
In <strong>the</strong> <strong>CRISPR</strong>/Cmr <strong>system</strong> of P. furiosus (Figure 2b), a<br />
complex of Cmr prote<strong>in</strong>s encompass<strong>in</strong>g a small crRNA,<br />
lacking the 3′ end of the spacer sequence, targets and<br />
cleaves complementary s<strong>in</strong>gle-stranded RNA (ssRNA) <strong>in</strong><br />
vitro [24]. To date <strong>the</strong>re is no evidence for or aga<strong>in</strong>st <strong>in</strong> vivo<br />
RNA target<strong>in</strong>g, and it is too early to establish whe<strong>the</strong>r<br />
mRNAs, non-cod<strong>in</strong>g RNAs (ncRNAs), and/or RNA viruses<br />
can be targets. Never<strong>the</strong>less, iCmr modules are common <strong>in</strong><br />
archaea and are encoded ei<strong>the</strong>r toge<strong>the</strong>r with aCas modules<br />
and <strong>CRISPR</strong> loci or as separate genetic entities<br />
(Figure 2c,e). Paradoxically, some Cmr prote<strong>in</strong>s show significant<br />
sequence similarity to Csm prote<strong>in</strong>s implicated <strong>in</strong><br />
DNA target<strong>in</strong>g <strong>in</strong> S. epidermidis [41], and both are common<br />
<strong>in</strong> archaea. A phylogenetic tree of archaeal Cmr2 (Cas10)<br />
homologs shows five ma<strong>in</strong> subfamilies, four of which represent<br />
iCmr and iCsm modules (Figure 3) [12,44]. The fifth<br />
subfamily, A (represented by Csx11), is present <strong>in</strong> a few<br />
bacteria and methanoarchaea but has not been studied<br />
experimentally, and is <strong>the</strong>refore not considered fur<strong>the</strong>r.<br />
O<strong>the</strong>r components of iCmr and iCsm modules <strong>in</strong>clude <strong>the</strong><br />
small conserved Cmr5/Csm2 prote<strong>in</strong> and three to seven<br />
copies of highly diverse RNA b<strong>in</strong>d<strong>in</strong>g motif-conta<strong>in</strong><strong>in</strong>g<br />
prote<strong>in</strong>s (RAMP prote<strong>in</strong>s denoted R <strong>in</strong> Figure 2)<br />
[12,14,44]. In summary, <strong>the</strong> degree of cofunctionality of<br />
the partly homologous iCmr and iCsm modules remains<br />
unclear.<br />
Figure 3. Phylogenetic tree of the archaeal Cas10 subtypes Cmr2, Csm1 and Csx11.<br />
These are the largest and most conserved subcomponents of the interference<br />
modules of type III CRISPR systems, where the iCmr module has been implicated<br />
in RNA targeting [23] and the iCsm system in DNA targeting [41]. The deep<br />
branching reflects the very divergent sequences. Analysis of the five subfamilies<br />
A–E indicates strong biases in their distributions among crenarchaea and<br />
euryarchaea, and family D is archaea-specific and is present in crenarchaea,<br />
euryarchaea and unclassified archaea. The figure is reproduced with permission<br />
from [44]. 10% indicates the amount of amino acid sequence change for the given<br />
length on the tree branches.<br />
Module exchange<br />
Attempts to classify archaeal <strong>CRISPR</strong>/Cas <strong>system</strong>s of <strong>the</strong><br />
Sulfolobales on <strong>the</strong> basis of <strong>the</strong> cas1, leader and repeat<br />
sequences provided evidence for four families that were<br />
conserved <strong>in</strong> gene content and synteny and <strong>the</strong>y appeared<br />
to constitute <strong>in</strong>tegral genetic units [16,26]. However, more<br />
detailed phylogenetic analysis of <strong>the</strong> aCas and iCas genes<br />
of family I <strong>CRISPR</strong>/Cas <strong>system</strong>s of different Sulfolobus<br />
islandicus stra<strong>in</strong>s (Figure 2a) revealed that <strong>the</strong> aCas tree<br />
diverges from <strong>the</strong> iCas tree as well as from trees generated<br />
from all <strong>the</strong> concatenated genes of each host genome,<br />
consistent with exchange of aCas modules hav<strong>in</strong>g occurred<br />
[17]. The results of this analysis are illustrated <strong>in</strong><br />
Figure 4a for two divergent pairs of <strong>CRISPR</strong>/Cas <strong>system</strong>s<br />
from four selected S. islandicus stra<strong>in</strong>s [17]. For each<br />
similar pair <strong>the</strong> concatenated homologous Cas prote<strong>in</strong>s<br />
showed about 99% am<strong>in</strong>o acid sequence identity. However,<br />
when the protein sequences of the two pairs were compared,<br />
the iCas modules maintained their high sequence<br />
identity (99%), whereas the aCas identity was reduced to 74%<br />
(Figure 4a), consistent with <strong>the</strong> aCas module hav<strong>in</strong>g been<br />
exchanged [17]. Us<strong>in</strong>g <strong>the</strong> same approach, similar<br />
<strong>CRISPR</strong>/Cas <strong>system</strong>s of two divergent pairs of stra<strong>in</strong>s of<br />
<strong>the</strong> <strong>the</strong>rmoneutrophile Pyrobaculum were compared. All<br />
<strong>the</strong> concatenated homologous Cas prote<strong>in</strong> components of<br />
each similar pair showed 70% am<strong>in</strong>o acid sequence identity.<br />
However, when <strong>the</strong> two pairs were compared aCas<br />
protein sequence identity remained at 70%, but a much<br />
lower value of 28% was observed for the iCas proteins,<br />
indicative of exchange of the latter (Figure 4b).<br />
Figure 4. Examples of genetic exchange of functional modules where amino acid<br />
sequences from shared genes in each functional module are compared [17]. (a)<br />
Comparison of the aCas and iCas modules for type I CRISPR/Cas systems of four<br />
closely related S. islandicus strains. Pairwise they show a high sequence identity of<br />
99% for both modules, but when the two pairs are compared the combined iCas<br />
proteins remain almost identical in sequence, whereas the aCas modules show<br />
only 74% sequence similarity between the pairs, consistent with the aCas module<br />
having been exchanged for one of the groups of strains [17]. (b) A similar study was<br />
performed for shared genes of four thermoneutrophilic Pyrobaculum strains,<br />
where two pairs each show similar levels of amino acid sequence similarity for<br />
their aCas and iCas modules (about 70%), but when the two pairs are compared the<br />
aCas sequences remain constant at about 70% whereas the iCas module yields<br />
only 28% similarity, indicative of the iCas modules having been exchanged. Gene<br />
contents of the two pairs of iCas modules also indicate that they belong to different<br />
subtypes. Gene modules are color-coded as in Figure 2. Abbreviations: C, CRISPR<br />
locus; t.r., transcriptional regulator (in green); n.d., gene identity not determined.<br />
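The reasoning behind these comparisons can be made explicit with a short sketch: a module is a<br />
candidate for exchange when its sequence identity between the two pairs of strains falls well<br />
below its identity within each pair. The function names and the 20-point threshold are<br />
assumptions introduced for illustration; only the percentages are taken from the text, and<br />
percent_identity expects sequences that have already been aligned to equal length.<br />

```python
from typing import Dict, List


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def exchanged_modules(within: Dict[str, float],
                      between: Dict[str, float],
                      drop: float = 20.0) -> List[str]:
    """Modules whose between-pair identity is at least `drop` points lower
    than the within-pair identity (threshold is an arbitrary assumption)."""
    return sorted(m for m in within if within[m] - between[m] >= drop)


# Identity values (%) as reported in the text for the two examples.
s_islandicus = exchanged_modules({"aCas": 99, "iCas": 99}, {"aCas": 74, "iCas": 99})
pyrobaculum = exchanged_modules({"aCas": 70, "iCas": 70}, {"aCas": 70, "iCas": 28})
print(s_islandicus, pyrobaculum)
```

A real analysis would rest on phylogenetic trees of the concatenated proteins, as in [17]; the<br />
threshold comparison only summarizes the pattern shown in Figure 4.<br />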
Constra<strong>in</strong>ts on modular exchange<br />
Specific <strong>in</strong>teractions with <strong>the</strong> repeat sequence, ei<strong>the</strong>r at<br />
<strong>the</strong> DNA or RNA level, are crucial for <strong>the</strong> function of <strong>the</strong><br />
aCas, pCas and <strong>in</strong>terference modules, and <strong>the</strong> capacity of<br />
some prote<strong>in</strong> components to <strong>in</strong>teract specifically with <strong>the</strong><br />
repeat sequence might be a major constra<strong>in</strong>t on modular<br />
exchange. Integration of new spacers thus probably<br />
depends on Cas prote<strong>in</strong> recognition of <strong>the</strong> first repeat<br />
and adjo<strong>in</strong><strong>in</strong>g leader region [16,19]. Cas6 associates specifically<br />
with, and cleaves, <strong>the</strong> repeat dur<strong>in</strong>g process<strong>in</strong>g<br />
[20,22] and is sometimes cofunctional with different <strong>in</strong>terference<br />
modules. The iCas complex recognizes repeat sequence<br />
elements at <strong>the</strong> ends of crRNA for DNA target<strong>in</strong>g,<br />
and the iCmr complex binds to the repeat sequence at the 5′<br />
end of crRNAs targeting RNA [23,42,45]. The small PAM<br />
motif also differs <strong>in</strong> sequence for different <strong>CRISPR</strong>/Cas<br />
<strong>system</strong>s, and <strong>the</strong> motif is likely to be important for protospacer<br />
selection, for determ<strong>in</strong><strong>in</strong>g its orientation on <strong>in</strong>sertion<br />
<strong>in</strong> <strong>CRISPR</strong> loci [16,19,31] and, at some level, to be<br />
important for DNA target<strong>in</strong>g [19,35,46]. Moreover, <strong>the</strong><br />
length of <strong>the</strong> crRNA spacer sequence may <strong>in</strong>fluence <strong>the</strong><br />
target<strong>in</strong>g and cleavage by <strong>the</strong> iCas module [42]. Taken<br />
toge<strong>the</strong>r, <strong>the</strong>re appear to be multiple sequence and structural<br />
constra<strong>in</strong>ts on modular exchange that are likely to be<br />
offset partly by <strong>the</strong> relatively conserved sequence at <strong>the</strong><br />
leader-distal end of repeats. In support, putative examples<br />
of modular exchange, <strong>in</strong>clud<strong>in</strong>g those shown <strong>in</strong> Figure 4,<br />
exhibit fairly conserved repeat sequences, spacer sizes and<br />
predicted PAM motifs. These examples also show that on<br />
modular exchange <strong>the</strong> repeat <strong>in</strong>variably follows <strong>the</strong> aCas<br />
and not <strong>the</strong> iCas modules [17].<br />
Natural dynamics of <strong>CRISPR</strong> loci<br />
Changes can occur <strong>in</strong> <strong>CRISPR</strong> loci by a variety of mechanisms<br />
without compromis<strong>in</strong>g <strong>the</strong>ir overall viability. New<br />
spacer-repeat units are added, <strong>in</strong>termittently, at or near<br />
<strong>the</strong> repeat adjacent to <strong>the</strong> leader [16,19,28,29]. Moreover,<br />
comparative analyses of closely related archaeal species<br />
support: (i) <strong>the</strong> occurrence of large <strong>in</strong>dels, generally deletions;<br />
(ii) duplication of sets of spacer-repeat units, and (iii)<br />
<strong>in</strong>tracellular exchange of spacer-repeat units between<br />
<strong>CRISPR</strong> loci [12,16,28]. Changes can also be <strong>in</strong>duced <strong>in</strong><br />
<strong>CRISPR</strong> loci by <strong>in</strong>vad<strong>in</strong>g genetic elements carry<strong>in</strong>g, for<br />
example, essential metabolic genes or, possibly, tox<strong>in</strong>–<br />
antitox<strong>in</strong> ma<strong>in</strong>tenance <strong>system</strong>s [35,47]. Such changes were<br />
demonstrated by challeng<strong>in</strong>g <strong>CRISPR</strong> loci of different<br />
Sulfolobus species with plasmids carry<strong>in</strong>g match<strong>in</strong>g protospacers<br />
and appropriate PAM motifs ma<strong>in</strong>ta<strong>in</strong>ed under<br />
selection [35]. This resulted <strong>in</strong> loss of ei<strong>the</strong>r <strong>CRISPR</strong><br />
regions conta<strong>in</strong><strong>in</strong>g match<strong>in</strong>g spacers or complete<br />
<strong>CRISPR</strong>/Cas <strong>system</strong>s. In S. islandicus, 50% of viable<br />
transformants had specifically lost the matching spacer-repeat<br />
unit, suggest<strong>in</strong>g that feedback and <strong>in</strong>terference of<br />
match<strong>in</strong>g spacers might occur rarely, followed by recomb<strong>in</strong>ational<br />
repair via adjacent repeats or by slippage occurr<strong>in</strong>g<br />
dur<strong>in</strong>g DNA replication [35]. Fur<strong>the</strong>rmore, some<br />
challenged spacers of S. solfataricus were <strong>in</strong>activated by<br />
<strong>the</strong> direct <strong>in</strong>sertion of <strong>in</strong>sertion sequence (IS) elements<br />
[35]. Bio<strong>in</strong>formatic analyses have also provided support for<br />
spacers be<strong>in</strong>g <strong>in</strong>activated by mutation of <strong>the</strong> border<strong>in</strong>g<br />
repeats, and this could generate defective crRNAs [48].<br />
Thus <strong>the</strong> <strong>in</strong>tegrity of <strong>CRISPR</strong> loci can be compromised by<br />
many different mechanisms.<br />
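The challenge experiments described above depend on a plasmid carrying a protospacer that matches a chromosomal spacer and is flanked by an appropriate PAM. A minimal sketch of such a scan is given below. The function names, the toy sequences and the PAM (taken here as CCN immediately 5' of the protospacer) are illustrative assumptions and are not taken from the cited studies.<br />

```python
# Illustrative sketch only: sequences, names and the PAM are hypothetical.

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def pam_matches(candidate, pam):
    """True if candidate equals the PAM, treating N in the PAM as a wildcard."""
    return len(candidate) == len(pam) and all(
        p == "N" or p == c for p, c in zip(pam, candidate)
    )


def find_protospacers(target, spacer, pam="CCN"):
    """List (position, strand) for exact spacer matches preceded by the PAM.

    Assumption: the PAM lies immediately 5' of the protospacer on the
    strand that reads the same as the spacer. Positions are 0-based and
    refer to the strand that was scanned.
    """
    spacer = spacer.upper()
    hits = []
    for strand, seq in (("+", target.upper()), ("-", reverse_complement(target))):
        start = seq.find(spacer)
        while start != -1:
            upstream = seq[max(0, start - len(pam)):start]
            if pam_matches(upstream, pam):
                hits.append((start, strand))
            start = seq.find(spacer, start + 1)
    return hits


toy_spacer = "ACGTTGCAAGGCTA"                     # hypothetical spacer
with_pam = "GGG" + "CCA" + toy_spacer + "TTT"     # protospacer with CCN PAM
without_pam = "GGG" + "GTA" + toy_spacer + "TTT"  # same protospacer, no PAM

print(find_protospacers(with_pam, toy_spacer))     # [(6, '+')]
print(find_protospacers(without_pam, toy_spacer))  # []
```

Only the first toy plasmid would be expected to be targeted in this simplified picture; mismatch tolerance and other aspects of target recognition are deliberately ignored.<br />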
Anti-<strong>CRISPR</strong> mechanisms and defective <strong>CRISPR</strong>/Cas<br />
and Cmr modules<br />
Specific ways <strong>in</strong> which archaeal viruses and plasmids<br />
might circumvent <strong>CRISPR</strong> <strong>system</strong>s rema<strong>in</strong> speculative,<br />
<strong>in</strong>clud<strong>in</strong>g <strong>the</strong> observation that genomes of crenarchaeal<br />
rudiviruses and lipothrixviruses accrue 12 bp <strong>in</strong>dels, probably<br />
deletions, when passed through different hosts [49].<br />
However, given <strong>the</strong> complexity of many archaeal <strong>CRISPR</strong><br />
<strong>system</strong>s, <strong>the</strong>y are also vulnerable to mutation, rearrangements<br />
or transposition events [30,50]. The multiple transcriptional<br />
regulators present <strong>in</strong> many archaeal <strong>CRISPR</strong><br />
<strong>system</strong>s (Figures 2 and 4) are obvious targets. For example,<br />
<strong>in</strong> an S. islandicus stra<strong>in</strong> <strong>the</strong> putative provirus M164 is<br />
<strong>in</strong>tegrated <strong>in</strong>to <strong>the</strong> gene for Csa3, <strong>the</strong> putative transcriptional<br />
regulator of <strong>the</strong> aCas gene cassette, but apparently<br />
leaves <strong>the</strong> pCas and iCas modules unaffected [12,17].<br />
Moreover, bacteriophage EPV1 characterized <strong>in</strong> a metagenomics<br />
study encodes <strong>the</strong> proteobacterial transcriptional<br />
repressor H-NS [51] that can <strong>in</strong>activate <strong>the</strong> entire E. coli<br />
<strong>CRISPR</strong> <strong>system</strong> [38]. Many archaeal <strong>system</strong>s lack core<br />
genes, and <strong>CRISPR</strong> loci sometimes lack leaders<br />
[16,30,50]. It rema<strong>in</strong>s unclear whe<strong>the</strong>r <strong>the</strong>se defective<br />
modules can be complemented by Cas prote<strong>in</strong>s of ano<strong>the</strong>r<br />
module of a similar type with<strong>in</strong> a given organism. S.<br />
solfataricus stra<strong>in</strong>s P1 and P2 carry a <strong>CRISPR</strong> locus E<br />
lack<strong>in</strong>g an aCas module that is not complemented by aCas<br />
prote<strong>in</strong>s associated with <strong>the</strong> phylogenetically similar<br />
<strong>CRISPR</strong> loci C and D, but this could reflect sequence<br />
differences <strong>in</strong> <strong>the</strong> leaders [16]. There might also be advantages,<br />
at least temporarily, <strong>in</strong> ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g cofunctional<br />
process<strong>in</strong>g and <strong>in</strong>terference modules despite defective adaptation<br />
[26,28].<br />
Genomic mobility<br />
Comparative studies of <strong>the</strong> Sulfolobales <strong>in</strong>dicated that<br />
<strong>CRISPR</strong> <strong>system</strong>s are <strong>in</strong>variably located <strong>in</strong> genomic regions<br />
variable <strong>in</strong> gene content and often rich <strong>in</strong> transposable<br />
elements [44,52]. Fur<strong>the</strong>rmore, n<strong>in</strong>e genomes of closely<br />
related S. islandicus stra<strong>in</strong>s from different geographical<br />
locations carried two to four apparently viable comb<strong>in</strong>ations<br />
of different subfamilies of both <strong>CRISPR</strong>/Cas, and<br />
<strong>in</strong>dependent iCmr and iCsm modules, <strong>in</strong>dicative of <strong>the</strong>ir<br />
hav<strong>in</strong>g been transferred between stra<strong>in</strong>s [44]. Strong evidence<br />
for specific <strong>in</strong>tergenomic transfer of <strong>CRISPR</strong> loci<br />
carried on larger chromosomal fragments is available for<br />
Pyrococcus and Sulfolobus stra<strong>in</strong>s [12,53] and for lactic<br />
acid bacteria [50]. Whe<strong>the</strong>r such exchange is common for<br />
all archaea rema<strong>in</strong>s unclear because for stra<strong>in</strong>s of S.<br />
solfataricus, more distantly related than those of S. islandicus,<br />
<strong>CRISPR</strong>/Cas <strong>system</strong>s have been largely reta<strong>in</strong>ed<br />
and share many identical spacer sequences [12,16].<br />
Given <strong>the</strong> potential for mobility of <strong>CRISPR</strong> <strong>system</strong>s, it<br />
was speculated that tox<strong>in</strong>–antitox<strong>in</strong> <strong>system</strong>s, encoded near<br />
<strong>CRISPR</strong> loci, could help to stabilize <strong>the</strong> <strong>CRISPR</strong> genetic
Review Trends <strong>in</strong> Microbiology November 2011, Vol. 19, No. 11<br />
<strong>system</strong>s with<strong>in</strong> chromosomes [52]. An extreme example of<br />
this occurs <strong>in</strong> Acidianus hospitalis, a slowly grow<strong>in</strong>g organism<br />
carry<strong>in</strong>g 26 vapBC antitox<strong>in</strong>–tox<strong>in</strong> gene pairs, four<br />
of which are <strong>in</strong>terwoven with <strong>the</strong> <strong>CRISPR</strong>/Cas/Csm <strong>system</strong><br />
(Figure 5) and a fifth is associated with a separate<br />
<strong>CRISPR</strong>/Cas <strong>system</strong> [52]. The absence of any encoded<br />
VapB or VapC prote<strong>in</strong>s with similar sequences <strong>in</strong> this<br />
organism is essential for <strong>the</strong> proposed capacity to ma<strong>in</strong>ta<strong>in</strong><br />
a <strong>CRISPR</strong>/Cas <strong>system</strong> when loss of <strong>the</strong> DNA region could<br />
lead to VapC-<strong>in</strong>duced cell death [52].<br />
Interdoma<strong>in</strong> mobility<br />
Genetic exchange between archaea and bacteria is restricted<br />
by many factors, <strong>in</strong>clud<strong>in</strong>g basic <strong>in</strong>compatibility of <strong>the</strong>ir<br />
virus–host <strong>in</strong>teractions and radically different conjugative<br />
mechanisms [4,6,54]. Moreover, even after successful DNA<br />
exchange, basic differences <strong>in</strong> <strong>the</strong> mechanisms of transcriptional<br />
<strong>in</strong>itiation and term<strong>in</strong>ation, and of translational<br />
<strong>in</strong>itiation, would present formidable barriers to viable gene<br />
expression [55,56]. Fur<strong>the</strong>rmore, as argued above, many<br />
archaea have adapted to extreme low-energy environments<br />
where levels of bacterial cells are low or nonexistent.<br />
In an attempt to <strong>in</strong>terpret <strong>the</strong> extent to which <strong>in</strong>terdoma<strong>in</strong><br />
exchange has <strong>in</strong>fluenced <strong>the</strong> evolution of archaeal <strong>CRISPR</strong><br />
<strong>system</strong>s, Markov cluster<strong>in</strong>g algorithm (MCL) techniques<br />
based on Cas1 sequences were used to compare phylogenetically<br />
<strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>s of archaea and bacteria.<br />
The results support <strong>the</strong> absence of type II <strong>CRISPR</strong>/Cas<br />
<strong>system</strong>s <strong>in</strong> archaea and revealed clusters specific to, or<br />
strongly biased to, archaea or bacteria, with one cluster<br />
carry<strong>in</strong>g multiple archaeal, predom<strong>in</strong>antly methanoarchaeal,<br />
and bacterial members [17,18,57]. Qualitatively, <strong>the</strong><br />
analysis suggests that <strong>in</strong>terdoma<strong>in</strong> exchange of aCas modules<br />
occurs rarely and <strong>the</strong>n predom<strong>in</strong>antly <strong>in</strong> environments<br />
where archaea and bacteria are both abundant.<br />
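The principle behind Markov clustering can be illustrated with a short sketch. The code below applies the two alternating MCL steps, expansion (matrix squaring) and inflation (element-wise powering followed by column normalisation), to a toy similarity matrix. The matrix values, the parameter settings and the function names are hypothetical, and the sketch makes no claim about how the Cas1 similarities were computed in the analysis described above.<br />

```python
# Illustrative sketch of Markov clustering (MCL) on a toy similarity matrix.
# All values and names are hypothetical; real analyses use dedicated software.

def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def markov_cluster(similarity, inflation=2.0, iterations=50):
    """Cluster items by alternating expansion and inflation."""
    n = len(similarity)
    # Add self-loops, then convert similarities to transition probabilities.
    m = [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = mat_mul(m, m)                                  # expansion
        m = [[x ** inflation for x in row] for row in m]   # inflation
        m = normalize_columns(m)
    # Each row that retains probability mass lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Six hypothetical Cas1 sequences: items 0-2 and items 3-5 are similar within
# each group (0.9) and only weakly similar between groups (0.05).
toy_similarity = [
    [0.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.05)
     for j in range(6)]
    for i in range(6)
]

print(markov_cluster(toy_similarity))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy case the weak between-group similarities are progressively suppressed by inflation, so that two clusters emerge. A cluster containing both archaeal and bacterial members would correspond to the mixed cluster mentioned above.<br />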
There is limited evidence for homologous <strong>CRISPR</strong>-like<br />
mechanisms operat<strong>in</strong>g <strong>in</strong> eukaryotes. The RNA-target<strong>in</strong>g<br />
<strong>CRISPR</strong>/Cmr <strong>system</strong> shows some mechanistic similarity to<br />
RNAi, <strong>the</strong> viral RNA <strong>in</strong>terference <strong>system</strong> of eukarya<br />
[14,23]. Moreover, DNA-target<strong>in</strong>g <strong>CRISPR</strong>/Cas <strong>system</strong>s,<br />
<strong>in</strong> general, share features of <strong>the</strong> Piwi/Argonaute-<strong>in</strong>teract<strong>in</strong>g<br />
(piRNA) <strong>system</strong> where RNA-encod<strong>in</strong>g DNA accumulates<br />
passively <strong>in</strong> a small number of chromosomal loci.<br />
Transcripts from <strong>the</strong>se loci are processed <strong>in</strong>to small<br />
ssRNAs that complex with Piwi/Argonaute prote<strong>in</strong>s and<br />
can <strong>in</strong>hibit DNA transposition activity [10,16,31]. Although<br />
early <strong>in</strong> evolution <strong>the</strong>re may have been limited<br />
coevolution of <strong>the</strong>se <strong>in</strong>terference <strong>system</strong>s for all three<br />
doma<strong>in</strong>s, <strong>the</strong> archaeal and bacterial <strong>system</strong>s have clearly<br />
coevolved and <strong>in</strong>terchanged to a significant degree, with<br />
<strong>the</strong> exception of <strong>the</strong> type II <strong>CRISPR</strong>/Cas <strong>system</strong> dependent<br />
on <strong>the</strong> bacteria-specific RNase III enzyme for process<strong>in</strong>g<br />
[36].<br />
[Figure 5 gene labels: cas4, csx1, vapBC, iCsm, csa1, vapBC, cas2, cas1, cas6]<br />
Figure 5. A type III <strong>CRISPR</strong> <strong>system</strong> of <strong>the</strong> acido<strong>the</strong>rmophile A. hospitalis carry<strong>in</strong>g four <strong>in</strong>terwoven antitox<strong>in</strong>–tox<strong>in</strong> vapBC gene pairs that are highly divergent <strong>in</strong> sequence<br />
[52]. Functional module genes are color-coded as <strong>in</strong> Figure 2, and <strong>in</strong>clude genes of unknown function (grey). Numbers of repeats are <strong>in</strong>dicated for each <strong>CRISPR</strong> locus.<br />
Conclud<strong>in</strong>g remarks<br />
One of <strong>the</strong> puzzles concern<strong>in</strong>g archaeal <strong>CRISPR</strong> <strong>system</strong>s is<br />
why <strong>the</strong>y are so diverse and complex. There are often<br />
multiple <strong>CRISPR</strong> loci with<strong>in</strong> a given archaeon carry<strong>in</strong>g<br />
hundreds of unique spacer sequences with multiple significant<br />
spacer matches to a given type of virus or conjugative<br />
plasmid [26,28,30,44]. Possibly <strong>the</strong> diversity and complexity<br />
reflects <strong>the</strong> large variety of different virus families<br />
characterized for extreme <strong>the</strong>rmophiles, and to a lesser<br />
extent haloarchaea [4–6]. Ano<strong>the</strong>r possibility is that, given<br />
<strong>the</strong>ir modular structures, and <strong>the</strong> diversity of <strong>the</strong>ir putative<br />
transcriptional regulators, <strong>the</strong> <strong>CRISPR</strong> <strong>system</strong>s may<br />
not necessarily elim<strong>in</strong>ate genetic elements. For example,<br />
<strong>the</strong> <strong>immune</strong> <strong>system</strong>s might only be activated when replication<br />
or transcription of genetic elements reaches a certa<strong>in</strong><br />
level, consistent with many viruses be<strong>in</strong>g stably<br />
ma<strong>in</strong>ta<strong>in</strong>ed at low copy-numbers with<strong>in</strong> cells [4–6].<br />
The detailed mechanistic roles of most of <strong>the</strong> core prote<strong>in</strong>s rema<strong>in</strong> to be<br />
determ<strong>in</strong>ed. In addition, many <strong>CRISPR</strong>-related prote<strong>in</strong>s rema<strong>in</strong><br />
uncharacterized, some of which are archaea-specific and commonly<br />
associated with <strong>in</strong>terference modules (Figure 4a), and <strong>the</strong>se might<br />
generate diversity <strong>in</strong> target<strong>in</strong>g or cleavage mechanisms. Some unclassified<br />
<strong>CRISPR</strong>-related prote<strong>in</strong>s are likely to have secondary roles,<br />
as suggested for <strong>the</strong> antitox<strong>in</strong>–tox<strong>in</strong> <strong>system</strong> of A. hospitalis<br />
help<strong>in</strong>g to stabilize <strong>CRISPR</strong>/Cas <strong>system</strong>s on chromosomes<br />
[52]. Function(s) of <strong>the</strong> iCmr and iCsm modules need to be<br />
exam<strong>in</strong>ed more extensively <strong>in</strong> vivo to establish whe<strong>the</strong>r<br />
RNA viruses and/or transcripts of DNA viruses are targeted.<br />
Target<strong>in</strong>g of transcripts could be a means of regulat<strong>in</strong>g and<br />
stabiliz<strong>in</strong>g DNA viruses <strong>in</strong> vivo. At least for Sulfolobus<br />
species, robust genetic <strong>system</strong>s are now available to resolve<br />
<strong>the</strong>se questions [35,40]. Questions rema<strong>in</strong> as to whe<strong>the</strong>r<br />
crRNAs are selected for DNA or RNA target<strong>in</strong>g or whe<strong>the</strong>r<br />
any spacer RNA potentially can be used for ei<strong>the</strong>r <strong>system</strong>,<br />
and to what extent Cas6 prote<strong>in</strong>s are <strong>in</strong>terchangeable between<br />
<strong>the</strong> different <strong>in</strong>terference <strong>system</strong>s with<strong>in</strong> a given<br />
organism [42]. Ano<strong>the</strong>r press<strong>in</strong>g question is <strong>the</strong> extent to<br />
which defective functional modules are complemented by<br />
components of o<strong>the</strong>r <strong>CRISPR</strong> <strong>system</strong>s; a high priority will be<br />
to experimentally test a broad range of phylogenetically<br />
diverse <strong>CRISPR</strong> <strong>system</strong>s to establish <strong>the</strong> extent of <strong>the</strong>ir<br />
structural and functional diversity.<br />
Acknowledgments<br />
We thank Luciano Marraff<strong>in</strong>i, Mark Young, Qunx<strong>in</strong> She and Malcolm<br />
White for helpful discussions and <strong>the</strong> referees for <strong>the</strong>ir constructive <strong>in</strong>put.<br />
Research was supported by <strong>the</strong> Danish Natural Science Research<br />
Council.<br />
References<br />
1 Gribaldo, S. et al. (2010) The orig<strong>in</strong> of eukaryotes and <strong>the</strong>ir relationship<br />
with <strong>the</strong> <strong>Archaea</strong>: are we at a phylogenomic impasse? Nat. Rev.<br />
Microbiol. 8, 743–752<br />
2 Kurland, C.G. et al. (2006) Genomics and <strong>the</strong> irreducible nature of<br />
eukaryotic cells. Science 312, 1011–1014<br />
3 Valent<strong>in</strong>e, D.L. (2007) Adaptations to energy stress dictate <strong>the</strong> ecology<br />
and evolution of archaea. Nat. Rev. Microbiol. 5, 316–323<br />
4 Prangishvili, D. et al. (2006) Viruses of <strong>the</strong> <strong>Archaea</strong>: a unify<strong>in</strong>g view.<br />
Nat. Rev. Microbiol. 4, 837–848<br />
5 Porter, K. et al. (2007) Virus–host <strong>in</strong>teractions <strong>in</strong> salt lakes. Curr. Op<strong>in</strong>.<br />
Microbiol. 10, 418–424<br />
6 Lawrence, C.M. et al. (2009) Structural and functional studies of<br />
archaeal viruses. J. Biol. Chem. 284, 12599–12603<br />
7 Snyder, J.C. et al. (2010) Use of cellular <strong>CRISPR</strong> (clusters of regularly<br />
<strong>in</strong>terspaced short pal<strong>in</strong>dromic repeats) spacer-based microarrays for<br />
detection of viruses <strong>in</strong> environmental samples. Appl. Environ.<br />
Microbiol. 76, 7251–7258<br />
8 Brumfield, S.K. et al. (2009) Particle assembly and ultrastructural<br />
features associated with replication of <strong>the</strong> lytic archaeal virus<br />
Sulfolobus turreted icosahedral virus. J. Virol. 83, 5964–5970<br />
9 Bize, A. et al. (2009) A unique virus release mechanism <strong>in</strong> <strong>the</strong> archaea.<br />
Proc. Natl. Acad. Sci. U.S.A. 106, 11306–11311<br />
10 Karg<strong>in</strong>ov, F.V. and Hannon, G.J. (2010) The <strong>CRISPR</strong> <strong>system</strong>: small<br />
RNA-guided defense <strong>in</strong> bacteria and archaea. Mol. Cell 37, 7–19<br />
11 Terns, M.P. and Terns, R.M. (2011) <strong>CRISPR</strong>-based adaptive <strong>immune</strong><br />
<strong>system</strong>s. Curr. Op<strong>in</strong>. Microbiol. 14, 1–7<br />
12 Garrett, R.A. et al. (2011) <strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s of <strong>the</strong><br />
Sulfolobales: complexity and diversity. Biochem. Soc. Trans. 39, 51–57<br />
13 Prangishvili, D. et al. (1998) Conjugation <strong>in</strong> archaea: frequent occurrence<br />
of conjugative plasmids <strong>in</strong> Sulfolobus. Plasmid 40, 190–202<br />
14 Makarova, K.S. et al. (2006) A putative RNA-<strong>in</strong>terference-based <strong>immune</strong><br />
<strong>system</strong> <strong>in</strong> prokaryotes: computational analysis of <strong>the</strong> predicted<br />
enzymatic mach<strong>in</strong>ery, functional analogies with eukaryotic RNAi, and<br />
hypo<strong>the</strong>tical mechanisms of action. Biol. Direct 1, 7<br />
15 Makarova, K.S. et al. (2011) Evolution and classification of <strong>the</strong><br />
<strong>CRISPR</strong>-Cas <strong>system</strong>s. Nat. Rev. Microbiol. 9, 467–477<br />
16 Lillestøl, R.K. et al. (2009) <strong>CRISPR</strong> families of <strong>the</strong> crenarchaeal genus<br />
Sulfolobus: bidirectional transcription and dynamic properties. Mol.<br />
Microbiol. 72, 259–272<br />
17 Shah, S.A. and Garrett, R.A. (2011) <strong>CRISPR</strong>/Cas and Cmr modules,<br />
mobility and evolution of adaptive <strong>immune</strong> <strong>system</strong>s. Res. Microbiol.<br />
162, 27–38<br />
18 Shah, S.A. et al. (2011) <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>immune</strong><br />
<strong>system</strong>s of archaea. In Regulatory RNAs <strong>in</strong> Prokaryotes<br />
(Marchfelder, A. and Hess, W., eds), pp. 163–181, Spr<strong>in</strong>ger Press<br />
19 Deveau, H. et al. (2008) Phage response to <strong>CRISPR</strong>-encoded resistance<br />
<strong>in</strong> Streptococcus <strong>the</strong>rmophilus. J. Bacteriol. 190, 1390–1400<br />
20 Brouns, S.J. et al. (2008) Small <strong>CRISPR</strong> RNAs guide antiviral defense<br />
<strong>in</strong> prokaryotes. Science 321, 960–964<br />
21 Hale, C. et al. (2008) Prokaryotic silenc<strong>in</strong>g (psi)RNAs <strong>in</strong> Pyrococcus<br />
furiosus. RNA 14, 1–8<br />
22 Carte, J. et al. (2010) B<strong>in</strong>d<strong>in</strong>g and cleavage of <strong>CRISPR</strong> RNA by Cas6.<br />
RNA 16, 2181–2188<br />
23 Hale, C.R. et al. (2009) RNA-guided RNA cleavage by a <strong>CRISPR</strong> RNA–<br />
Cas prote<strong>in</strong> complex. Cell 139, 945–956<br />
24 Wang, R. et al. (2011) Interaction of Cas6 riboendonuclease with<br />
<strong>CRISPR</strong> RNAs: recognition and cleavage. Structure 19, 257–264<br />
25 Garneau, J.E. et al. (2010) The <strong>CRISPR</strong>/Cas bacterial <strong>immune</strong> <strong>system</strong><br />
cleaves bacteriophage and plasmid DNA. Nature 468, 67–71<br />
26 Shah, S.A. et al. (2009) Distributions of <strong>CRISPR</strong> spacer matches <strong>in</strong> viruses<br />
and plasmids of crenarchaeal acido<strong>the</strong>rmophiles and implications for<br />
<strong>the</strong>ir <strong>in</strong>hibitory mechanism. Biochem. Soc. Trans. 37, 23–28<br />
27 Barrangou, R. et al. (2007) <strong>CRISPR</strong> provides acquired resistance<br />
aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes. Science 315, 1709–1712<br />
28 Lillestøl, R.K. et al. (2006) A putative viral defence mechanism <strong>in</strong><br />
archaeal cells. <strong>Archaea</strong> 2, 59–72<br />
29 Held, N.L. et al. (2010) <strong>CRISPR</strong> associated diversity with<strong>in</strong> a<br />
population of Sulfolobus islandicus. PLoS ONE 5, e12988<br />
30 Andersson, A.F. and Banfield, J.F. (2008) Virus population dynamics<br />
and acquired resistance <strong>in</strong> natural microbial communities. Science 320,<br />
1047–1049<br />
31 Mojica, F.J. et al. (2009) Short motif sequences determ<strong>in</strong>e <strong>the</strong> targets of<br />
<strong>the</strong> prokaryotic <strong>CRISPR</strong> <strong>system</strong>. Microbiology 155, 733–740<br />
32 Tang, T-H. et al. (2002) Identification of 86 candidates for small non-messenger<br />
RNAs from <strong>the</strong> archaeon Archaeoglobus fulgidus. Proc.<br />
Natl. Acad. Sci. U.S.A. 99, 7536–7541<br />
33 Haurwitz, R.E. et al. (2010) Sequence and structure-specific RNA<br />
process<strong>in</strong>g by a <strong>CRISPR</strong> endonuclease. Science 329, 1355–1358<br />
34 Tang, T-H. et al. (2005) Identification of novel non-cod<strong>in</strong>g RNAs as<br />
potential antisense regulators <strong>in</strong> <strong>the</strong> archaeon Sulfolobus solfataricus.<br />
Mol. Microbiol. 55, 469–481<br />
35 Gudbergsdottir, S. et al. (2011) Dynamic properties of <strong>the</strong> Sulfolobus<br />
<strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s when challenged with vector-borne<br />
viral and plasmid genes and protospacers. Mol. Microbiol. 79, 35–49<br />
36 Deltcheva, E. et al. (2011) <strong>CRISPR</strong> RNA maturation by trans-encoded<br />
small RNA and host factor RNase III. Nature 471, 602–607<br />
37 Lykke-Andersen, J. et al. (1997) <strong>Archaea</strong>l <strong>in</strong>trons: splic<strong>in</strong>g,<br />
<strong>in</strong>tercellular mobility and evolution. Trends Biochem. Sci. 22, 326–331<br />
38 Pul, U. et al. (2010) Identification and characterisation of E. coli<br />
<strong>CRISPR</strong>-cas promoters and <strong>the</strong>ir silenc<strong>in</strong>g by H-NS. Mol. Microbiol.<br />
75, 1495–1512<br />
39 Agari, Y. et al. (2011) Transcription profile of Thermus <strong>the</strong>rmophilus<br />
<strong>CRISPR</strong> <strong>system</strong>s after phage <strong>in</strong>fection. J. Mol. Biol. 395, 270–281<br />
40 Manica, A. et al. (2011) In vitro activity of <strong>CRISPR</strong>-mediated virus<br />
defence <strong>in</strong> a hyper<strong>the</strong>rmophilic archaeon. Mol. Microbiol. 80, 481–491<br />
41 Marraff<strong>in</strong>i, L.A. and Son<strong>the</strong>imer, E.J. (2008) <strong>CRISPR</strong> <strong>in</strong>terference<br />
limits horizontal gene transfer <strong>in</strong> Staphylococci by target<strong>in</strong>g DNA.<br />
Science 322, 1843–1845<br />
42 L<strong>in</strong>dtner, N.G. et al. (2011) Structural and functional characterisation<br />
of an archaeal CASCADE complex for <strong>CRISPR</strong>-mediated viral defense.<br />
J. Biol. Chem. 85, 6287–6292<br />
43 Semenova, E. et al. (2011) Interference by clustered regularly<br />
<strong>in</strong>terspaced short pal<strong>in</strong>dromic repeat (<strong>CRISPR</strong>) RNA is governed by<br />
a seed sequence. Proc. Natl. Acad. Sci. U.S.A. 108, 10098–10103<br />
44 Guo, L. et al. (2011) Genome analyses of Icelandic stra<strong>in</strong>s of Sulfolobus<br />
islandicus, model organisms for genetic and virus–host <strong>in</strong>teraction<br />
studies. J. Bacteriol. 193, 1672–1680<br />
45 Jore, M.M. et al. (2011) Structural basis of <strong>CRISPR</strong>-guided RNA<br />
recognition. Nat. Struct. Mol. Biol. 18, 529–537<br />
46 Marraff<strong>in</strong>i, L.A. and Son<strong>the</strong>imer, E.J. (2010) Self versus non-self<br />
discrim<strong>in</strong>ation dur<strong>in</strong>g <strong>CRISPR</strong> RNA-directed immunity. Nature 463,<br />
568–571<br />
47 Dyall-Smith, M. (2011) Dangerous weapons: a cautionary tale of<br />
<strong>CRISPR</strong> defence. Mol. Microbiol. 79, 3–6<br />
48 Stern, A. et al. (2010) Self-target<strong>in</strong>g by <strong>CRISPR</strong>: gene regulation or<br />
autoimmunity? Trends Genet. 26, 335–340<br />
49 Vestergaard, G. et al. (2008) SRV, a new rudiviral isolate from<br />
Stygiolobus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal rudiviruses with <strong>the</strong><br />
host viral-defence <strong>CRISPR</strong> <strong>system</strong>. J. Bacteriol. 190, 6837–6845<br />
50 Horvath, P. et al. (2009) Comparative analysis of <strong>CRISPR</strong> loci <strong>in</strong> lactic acid<br />
bacteria genomes. Int. J. Food Microbiol. 131, 62–70<br />
51 Skennerton, C.T. et al. (2011) Phage encoded H-NS: a potential achilles<br />
heel <strong>in</strong> <strong>the</strong> bacterial defence <strong>system</strong>. PLoS ONE 6, e20095<br />
52 You, X-Y. et al. (2011) Genomic studies of Acidianus hospitalis W1 a<br />
crenarchaeal host for study<strong>in</strong>g virus and plasmid life cycles.<br />
Extremophiles 15, 487–497<br />
53 Portillo, M.C. and Gonzalez, J.M. (2009) <strong>CRISPR</strong> elements <strong>in</strong> <strong>the</strong><br />
Thermococcales: evidence for associated horizontal gene transfer <strong>in</strong><br />
Pyrococcus furiosus. J. Appl. Genet. 50, 421–430<br />
54 Greve, B. et al. (2004) Genomic comparison of archaeal conjugative<br />
plasmids from Sulfolobus. <strong>Archaea</strong> 1, 231–239<br />
55 Torar<strong>in</strong>sson, E. et al. (2005) Divergent transcriptional and<br />
translational signals <strong>in</strong> <strong>Archaea</strong>. Environ. Microbiol. 7, 47–54<br />
56 Santangelo, T.J. et al. (2009) <strong>Archaea</strong>l <strong>in</strong>tr<strong>in</strong>sic transcription<br />
term<strong>in</strong>ation <strong>in</strong> vivo. J. Bacteriol. 191, 7102–7108<br />
57 Haft, D.H. et al. (2005) A guild of 45 <strong>CRISPR</strong>-associated (Cas) prote<strong>in</strong><br />
families and multiple <strong>CRISPR</strong>/Cas subtypes exist <strong>in</strong> prokaryotic<br />
genomes. PLoS Comput. Biol. 1, 474–483<br />
58 Kun<strong>in</strong>, V. et al. (2007) Evolutionary conservation of sequence and<br />
secondary structures <strong>in</strong> <strong>CRISPR</strong> repeats. Genome Biol. 8, R611–R617<br />
59 Wiedenheft, B. et al. (2009) Structural base for DNase activity of a<br />
conserved prote<strong>in</strong> implicated <strong>in</strong> <strong>CRISPR</strong>-mediated genome defense.<br />
Structure 17, 904–912<br />
60 Beloglazova, N. et al. (2008) A novel family of sequence-specific<br />
endoribonucleases associated with clustered regularly <strong>in</strong>terspaced<br />
short pal<strong>in</strong>dromic repeats. J. Biol. Chem. 283, 20361–20371<br />
61 S<strong>in</strong>kunas, T. et al. (2011) Cas3 is a s<strong>in</strong>gle-stranded DNA nuclease and<br />
ATP-dependent helicase <strong>in</strong> <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>immune</strong> <strong>system</strong>. EMBO J.<br />
30, 1335–1342
Chapter 10<br />
<strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr Immune<br />
Systems of <strong>Archaea</strong><br />
Shiraz A. Shah, Gisle Vestergaard and Roger A. Garrett*<br />
1 Introduction<br />
The <strong>CRISPR</strong>/Cas (Clustered Regularly Interspaced Short Pal<strong>in</strong>dromic Repeats/<br />
<strong>CRISPR</strong>-Associated Genes) and <strong>CRISPR</strong>/Cmr <strong>system</strong>s (Cmr: Cas module-RAMP<br />
(Repeat-Associated Mysterious Prote<strong>in</strong>s)) provide <strong>the</strong> basis for adaptive and hereditable<br />
<strong>immune</strong> responses directed aga<strong>in</strong>st <strong>the</strong> DNA and RNA, respectively, of <strong>in</strong>vad<strong>in</strong>g<br />
elements. The former consists of <strong>CRISPR</strong> loci physically l<strong>in</strong>ked to a cassette of<br />
cas genes which toge<strong>the</strong>r appear to constitute <strong>in</strong>tegral genetic modules. cmr genes,<br />
clustered <strong>in</strong> Cmr modules, are sometimes physically l<strong>in</strong>ked to <strong>CRISPR</strong>/Cas modules.<br />
The <strong>CRISPR</strong>/Cas <strong>immune</strong> <strong>system</strong> occurs <strong>in</strong> almost all archaea and about 40 %<br />
of bacteria. Cmr modules are less common, occurr<strong>in</strong>g <strong>in</strong> only about one third of<br />
genomes carry<strong>in</strong>g <strong>CRISPR</strong>/Cas modules. An outl<strong>in</strong>e of how <strong>the</strong> <strong>CRISPR</strong>/Cas and<br />
<strong>CRISPR</strong>/Cmr <strong>system</strong>s function is <strong>in</strong>dicated <strong>in</strong> Figure 1 where <strong>the</strong> former targets<br />
DNA and <strong>the</strong> latter RNA (mRNA and/or viral RNA) of <strong>the</strong> genetic elements.<br />
<strong>Archaea</strong>l <strong>CRISPR</strong> loci consist of clusters of spacer-repeat units vary<strong>in</strong>g <strong>in</strong> size<br />
from one to more than one hundred spacer-repeat units where each unit is about<br />
60 – 90 bp with repeats and spacers of, on average, 30 bp and 40 bp, respectively<br />
(Lillestøl et al., 2006; Grissa et al., 2008). <strong>CRISPR</strong> loci are preceded by a<br />
non-prote<strong>in</strong>-cod<strong>in</strong>g leader region which varies <strong>in</strong> size from about 150 to 550 bp and is<br />
<strong>in</strong>variably physically l<strong>in</strong>ked to a cas gene cassette (Jansen et al., 2002; Haft et al.,<br />
2005; Makarova et al., 2006; Lillestøl et al., 2006; Lillestøl et al., 2009). Cas and<br />
Cmr prote<strong>in</strong>s, <strong>in</strong>volved <strong>in</strong> <strong>the</strong> two different target<strong>in</strong>g pathways, are functionally and<br />
phylogenetically diverse. The <strong>CRISPR</strong>/Cas <strong>system</strong> specifically targets DNA elements<br />
(Marraff<strong>in</strong>i and Son<strong>the</strong>imer, 2008; Shah et al., 2009) while <strong>the</strong> <strong>CRISPR</strong>/Cmr<br />
<strong>system</strong> targets RNA, although whe<strong>the</strong>r mRNA and/or viral RNA rema<strong>in</strong>s unclear<br />
(Hale et al., 2009). <strong>CRISPR</strong>/Cas modules have been classified <strong>in</strong>to families on <strong>the</strong><br />
basis of sequences of <strong>the</strong>ir cas genes, leaders and repeats. Although <strong>the</strong>se modules<br />
show a capacity for transfer between phyla of <strong>the</strong> archaeal and bacterial Doma<strong>in</strong>s,<br />
and supposedly rarely across Doma<strong>in</strong> boundaries, archaea-specific features are never<strong>the</strong>less<br />
apparent.<br />
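The organisation of a locus into a leader followed by spacer-repeat units can be made concrete with a small sketch. The code below splits a toy locus into its leader and spacers, given the repeat sequence. All sequences are hypothetical and are much shorter than the roughly 30 bp repeats and 40 bp spacers mentioned above, and the function names are illustrative. The sketch assumes perfectly conserved repeats, which is a simplification.<br />

```python
# Illustrative sketch: split a toy CRISPR locus into leader and spacers.
# Sequences are hypothetical and far shorter than real repeats and spacers.
from typing import List, Tuple


def parse_locus(locus: str, repeat: str) -> Tuple[str, List[str]]:
    """Return (leader, spacers) for a locus with exact copies of the repeat.

    The leader is everything before the first repeat. Spacers are the
    segments lying between consecutive repeats; any sequence after the
    last repeat is ignored.
    """
    parts = locus.split(repeat)
    leader = parts[0]
    spacers = parts[1:-1]
    return leader, spacers


def unit_lengths(spacers: List[str], repeat: str) -> List[int]:
    """Length of each spacer-repeat unit (one spacer plus one repeat)."""
    return [len(spacer) + len(repeat) for spacer in spacers]


toy_leader = "ATATATATAT"
toy_repeat = "GGCCGGCC"
toy_spacers = ["ACTACTAC", "TTCATTCA"]

# Assemble leader, repeat, spacer, repeat, spacer, repeat.
toy_locus = toy_leader + toy_repeat + toy_repeat.join(toy_spacers) + toy_repeat

leader, spacers = parse_locus(toy_locus, toy_repeat)
print(leader)                             # ATATATATAT
print(spacers)                            # ['ACTACTAC', 'TTCATTCA']
print(unit_lengths(spacers, toy_repeat))  # [16, 16]
```

Real loci often carry repeats with point mutations, so an exact string split of this kind would need to be replaced by approximate matching in practice.<br />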
Crucial for <strong>the</strong> function<strong>in</strong>g of <strong>the</strong> <strong>immune</strong> <strong>system</strong>s are <strong>the</strong> spacer sequences<br />
which derive from foreign <strong>in</strong>vad<strong>in</strong>g elements (Mojica et al., 2005; Pourcel et al.,<br />
2005; Bolot<strong>in</strong> et al., 2005; Lillestøl et al., 2006; Barrangou et al., 2007). The<br />
<strong>CRISPR</strong> loci generate whole transcripts which <strong>in</strong>itiate with<strong>in</strong> <strong>the</strong> leader sequence<br />
adjacent to <strong>the</strong> first repeat (Lillestøl et al., 2009). These are subsequently processed<br />
<strong>in</strong> <strong>the</strong>ir repeat regions yield<strong>in</strong>g end-products that constitute s<strong>in</strong>gle spacer-conta<strong>in</strong><strong>in</strong>g<br />
crRNAs (Tang et al., 2002; Tang et al., 2005; Lillestøl et al., 2006; Lillestøl et al.,<br />
2009). Process<strong>in</strong>g is effected by specific Cas or Cmr prote<strong>in</strong>s and, at least for <strong>the</strong><br />
Fig. 1. Diagram illustrat<strong>in</strong>g how <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s target genetic elements<br />
<strong>in</strong>vad<strong>in</strong>g a host cell. crRNAs are processed from whole transcripts of <strong>CRISPR</strong> loci.<br />
For <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong> Cas prote<strong>in</strong>s complex with <strong>the</strong> crRNA and guide it to <strong>the</strong> complementary<br />
protospacer sequence <strong>in</strong> <strong>the</strong> <strong>in</strong>vad<strong>in</strong>g DNA element where <strong>the</strong>y anneal prior to<br />
DNA degradation. Cmr prote<strong>in</strong>s also complex with crRNA and guide <strong>the</strong>m to ei<strong>the</strong>r mRNA<br />
or viral RNA, target<strong>in</strong>g <strong>the</strong>m for degradation
Shiraz A. Shah, Gisle Vestergaard and Roger A. Garrett 55<br />
latter, two discrete archaeal crRNAs are produced each carry<strong>in</strong>g 8 nt of <strong>the</strong> repeat<br />
at <strong>the</strong> 5’-end and lack<strong>in</strong>g 5 nt or 11 nt from <strong>the</strong> 3’-end of each spacer (Carte et al.,<br />
2008; Hale et al., 2009). Complexes of Cas or Cmr prote<strong>in</strong>s transport <strong>the</strong> processed<br />
crRNAs to target, and <strong>in</strong>activate, DNA or RNA, respectively, of <strong>in</strong>vad<strong>in</strong>g genetic<br />
elements (Brouns et al., 2008; Hale et al., 2008; Carte et al., 2008; Hale et al.,<br />
2009). Base pair<strong>in</strong>g mismatches occurr<strong>in</strong>g between <strong>the</strong> 5’ 8 nt repeat sequence of<br />
<strong>the</strong> crRNA and <strong>the</strong> Protospacer-Associated Motif (PAM) sequence adjacent to <strong>the</strong><br />
targeted protospacer of <strong>the</strong> <strong>in</strong>vad<strong>in</strong>g DNA are essential for subsequent degradation<br />
of <strong>the</strong> latter and for ensur<strong>in</strong>g that <strong>the</strong> chromosomal <strong>CRISPR</strong> locus, itself, is not<br />
targeted (Horvath and Barrangou, 2010; Marraff<strong>in</strong>i and Son<strong>the</strong>imer, 2010; Lillestøl<br />
et al., 2009; Gudbergsdottir et al., 2010).<br />
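The process<strong>in</strong>g and self/non-self rules described above can be sketched <strong>in</strong> a few l<strong>in</strong>es of Python.<br />
This is a simplified illustration, not a published implementation: sequences are written <strong>in</strong> <strong>the</strong> DNA<br />
alphabet <strong>in</strong> <strong>the</strong> same sense as <strong>the</strong> crRNA, <strong>the</strong> function names are hypo<strong>the</strong>tical, and treat<strong>in</strong>g any s<strong>in</strong>gle<br />
mismatch <strong>in</strong> <strong>the</strong> 8 nt flank as sufficient for target<strong>in</strong>g is an assumption made for <strong>the</strong> example.<br />

```python
REPEAT_HANDLE_LEN = 8       # nt of repeat retained at the crRNA 5'-end (from the text)
SPACER_3P_TRIMS = (5, 11)   # nt missing from the spacer 3'-end (from the text)


def process_transcript(repeat, spacers):
    """Return, for each spacer, the two crRNA forms described in the text.

    Each form is the last 8 nt of the upstream repeat followed by the
    spacer lacking 5 nt or 11 nt at its 3'-end.
    """
    handle = repeat[-REPEAT_HANDLE_LEN:]
    crrnas = []
    for spacer in spacers:
        forms = tuple(handle + spacer[:len(spacer) - trim]
                      for trim in SPACER_3P_TRIMS)
        crrnas.append(forms)
    return crrnas


def handle_mismatches(repeat, flank):
    """Count mismatches between the 8 nt repeat handle and the 8 nt that
    flank a protospacer on the side corresponding to the crRNA 5'-end."""
    if len(flank) < REPEAT_HANDLE_LEN:
        raise ValueError("flank must be at least 8 nt long")
    handle = repeat[-REPEAT_HANDLE_LEN:]
    return sum(a != b for a, b in zip(handle, flank[-REPEAT_HANDLE_LEN:]))


def is_targeted(repeat, flank):
    """Assumption for this sketch: any mismatch marks the target as foreign.

    A perfect match means the flank is the repeat itself, i.e. the
    chromosomal CRISPR locus, which must not be attacked.
    """
    return handle_mismatches(repeat, flank) > 0


if __name__ == "__main__":
    repeat = "GATAATCTCTTATAGAATTGAAAG"   # hypothetical repeat sequence
    spacer = "ACGT" * 10                  # hypothetical 40 nt spacer
    long_form, short_form = process_transcript(repeat, [spacer])[0]
    print(len(long_form), len(short_form))
    print(is_targeted(repeat, repeat[-8:]), is_targeted(repeat, "CCCCCCCC"))
```

The self-check compares <strong>the</strong> flank with <strong>the</strong> repeat directly, which is equivalent to ask<strong>in</strong>g whe<strong>the</strong>r <strong>the</strong><br />
crRNA handle could base pair fully with <strong>the</strong> complementary strand of <strong>the</strong> target.<br />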
2 <strong>Archaea</strong>l Viruses and Plasmids<br />
and Chromosomal Evolution<br />
Although few comprehensive studies have been performed on <strong>the</strong> relative abundance<br />
of different virus-like particle (VLP) morphotypes <strong>in</strong> archaea-rich environments,<br />
available results <strong>in</strong>dicate that sp<strong>in</strong>dles, filaments, rods and spheres predom<strong>in</strong>ate<br />
<strong>in</strong> terrestrial hot spr<strong>in</strong>gs and hydro<strong>the</strong>rmal vents, while sp<strong>in</strong>dle-shaped and<br />
spherical VLPs prevail <strong>in</strong> hypersal<strong>in</strong>e environments (Rachel<br />
et al., 2002; Porter et al., 2007; Bize et al., 2008). Bacteriophage-like head-tail<br />
VLPs are found <strong>in</strong>frequently, although <strong>the</strong>ir proviruses have been detected <strong>in</strong> a few<br />
halo- and methanoarchaeal genomes (Porter et al., 2007; Krupovic et al., 2010).<br />
Several viruses, ma<strong>in</strong>ly from terrestrial hot spr<strong>in</strong>gs, have been classified <strong>in</strong>to eight<br />
new archaeal viral families and examples of <strong>the</strong>ir diverse morphotypes are illustrated<br />
<strong>in</strong> Figure 2. O<strong>the</strong>r viruses <strong>in</strong>clud<strong>in</strong>g several haloarchaeal viruses rema<strong>in</strong> to be<br />
classified (Porter et al., 2007). The latter process is complicated by <strong>the</strong> absence of<br />
a consistent relationship between morphology and genomic properties for euryarchaeal<br />
and crenarchaeal viruses. Overall <strong>the</strong>se discoveries underl<strong>in</strong>e <strong>the</strong> major differences<br />
between <strong>the</strong> archaeal and bacterial virospheres (Prangishvili et al., 2006a;<br />
Lawrence et al., 2009).<br />
<strong>Archaea</strong>l viral genomes consist of dsDNA <strong>in</strong> <strong>the</strong> size range 15 to 75 kb and are circular<br />
or l<strong>in</strong>ear. Some l<strong>in</strong>ear genomes have free ends whereas o<strong>the</strong>rs, <strong>in</strong>clud<strong>in</strong>g those of<br />
rudiviruses and some lipothrixviruses, have modified ends or are covalently closed,<br />
and some genomes carry base-specific modifications (Zillig et al., 1998; Peng et al.,<br />
2001). Consistent with <strong>the</strong> unusual and sometimes unique viral morphologies (Figure<br />
2), <strong>the</strong> viral genomes yielded very few significant sequence matches with genes<br />
<strong>in</strong> public sequence databases (Prangishvili et al., 2006b). These results are summarised<br />
<strong>in</strong> histograms of <strong>the</strong> major hyper<strong>the</strong>rmophilic crenarchaeal viruses <strong>in</strong> Figure<br />
3 where a large percentage of <strong>the</strong> genes are classified as unique for each virus.<br />
The most extreme case was for genes of <strong>the</strong> <strong>the</strong>rmoneutrophilic virus PSV which<br />
yielded almost no significant sequence matches <strong>in</strong> <strong>the</strong> orig<strong>in</strong>al study (Bettstetter<br />
et al., 2003).
Fig. 2. Typical morphologies of representatives of different archaeal viral families.<br />
a, SNDV; b, STSV1; c, ATV; d, SIFV; e, AFV1; f, PSV; g, SSV4; h, ARV1. Bars are<br />
100 nm<br />
With <strong>the</strong> availability of an <strong>in</strong>creas<strong>in</strong>g number of archaeal genome sequences,<br />
it has become clear that archaeal viruses and plasmids have played a major role<br />
<strong>in</strong> <strong>the</strong> evolution of host genomes. This process has apparently been fuelled by <strong>the</strong><br />
entrapment of foreign DNA elements <strong>in</strong> host chromosomes via an archaea-specific<br />
<strong>in</strong>tegrative process. Many archaeal <strong>in</strong>tegrase genes partition on <strong>in</strong>tegration such<br />
that, if <strong>the</strong> free form of <strong>the</strong> element is lost, <strong>the</strong> <strong>in</strong>tegrase will not be expressed<br />
and cannot effect excision of <strong>the</strong> genetic element from <strong>the</strong> chromosome (She et al.,<br />
2001). Many of <strong>the</strong> captured elements are recognisable as <strong>in</strong>tact or degenerate<br />
genetic entities and Markov-model analyses of whole archaeal genomes suggest<br />
that such genes of viral or plasmid orig<strong>in</strong> contribute disproportionately to <strong>the</strong> genes<br />
of unknown function <strong>in</strong> archaeal chromosomes (Cortez et al., 2009).<br />
<strong>Archaea</strong>l viruses and plasmids have also evolved complex relationships as dependents<br />
or antagonists. Thus, <strong>in</strong> <strong>the</strong> presence of a fusellovirus, pRN family plasmids<br />
Fig. 3. Histogram show<strong>in</strong>g a summary of archaeal viral gene homologies to o<strong>the</strong>r viruses<br />
(virus only genes) and cellular chromosomes (cellular); unique <strong>in</strong>dicates no detectable homologs.<br />
Homologs <strong>in</strong> closely related viruses, <strong>in</strong>clud<strong>in</strong>g <strong>the</strong> rudiviruses ARV1 and SIRV1 and<br />
<strong>the</strong> spherical viruses PSV and TTSV1 are not <strong>in</strong>cluded (Prangishvili et al., 2006b)<br />
pSSVx and pSSVi are packaged <strong>in</strong>to fusellovirus-like particles and spread through<br />
Sulfolobus host cultures as satellite viruses (Arnold et al., 1999; Wang et al., 2007).<br />
In contrast, when a stra<strong>in</strong> of Acidianus hospitalis carry<strong>in</strong>g <strong>the</strong> conjugative plasmid<br />
pAH1 was <strong>in</strong>fected with <strong>the</strong> lipothrixvirus AFV1, plasmid replication appeared to<br />
be <strong>in</strong>hibited (Basta et al., 2009). Moreover, as mentioned below, Sulfolobus conjugative<br />
plasmids pNOB8 and pKEF9 carry <strong>CRISPR</strong> loci which may directly target<br />
and <strong>in</strong>activate archaeal viruses (She et al., 1998; Greve et al., 2004).<br />
3 Diversity of <strong>Archaea</strong>l <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr<br />
Immune Systems<br />
Bio<strong>in</strong>formatic analyses have demonstrated that homologs of a few core Cas prote<strong>in</strong>s<br />
occur widely throughout <strong>the</strong> archaeal and bacterial doma<strong>in</strong>s while o<strong>the</strong>rs occur less<br />
commonly and some are predom<strong>in</strong>antly archaeal or bacterial <strong>in</strong> character. Core<br />
gene sets typify <strong>the</strong> cas and cmr gene cassettes (Figure 4). For <strong>the</strong> former, <strong>the</strong> cas<br />
genes fall <strong>in</strong>to groups 1 and 2. This division is based on different factors <strong>in</strong>clud<strong>in</strong>g<br />
co-occurrence, co-regulation and synteny of <strong>the</strong> genes and, possibly, functional differences<br />
for <strong>the</strong> groups of prote<strong>in</strong>s (see below). The cas6 gene can occur <strong>in</strong> ei<strong>the</strong>r<br />
group and is likely to be cofunctional with both <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr
<strong>system</strong>s (Hale et al., 2009). For <strong>the</strong> cmr cassette, <strong>the</strong> two most conserved genes<br />
cmr2 and cmr5 are <strong>in</strong>terspersed with diverse RAMP-motif conta<strong>in</strong><strong>in</strong>g prote<strong>in</strong>s<br />
(Figure 4B).<br />
It has also been shown that <strong>the</strong>re is a consistent phylogenetic l<strong>in</strong>kage between<br />
sequences of selected Cas prote<strong>in</strong>s and <strong>CRISPR</strong> locus repeats for archaea and<br />
bacteria (Haft et al., 2005; Makarova et al., 2006; Kun<strong>in</strong> et al., 2007; Shah et al.,<br />
2009). Fur<strong>the</strong>rmore for <strong>the</strong> Sulfolobales, a broader analysis of sequences of repeats,<br />
leader regions, and of Cas1 prote<strong>in</strong>s, demonstrated that <strong>the</strong> <strong>CRISPR</strong>/Cas modules<br />
could be classified <strong>in</strong>to dist<strong>in</strong>ct <strong>CRISPR</strong>/Cas families I to IV (Lillestøl et al., 2009;<br />
Shah and Garrett, 2010) which are components of an earlier more broadly def<strong>in</strong>ed<br />
group of families CASS1 + 5 + 6 + 7 from archaea and bacteria (Haft et al., 2005;<br />
Makarova et al., 2006). Spatial distributions of all <strong>the</strong> archaeal and bacterial families<br />
are illustrated <strong>in</strong> Figure 5A us<strong>in</strong>g a Markov cluster<strong>in</strong>g approach based on Cas1<br />
prote<strong>in</strong> sequences. Whereas <strong>the</strong> crenarchaeal families I, II and III tend to cluster<br />
separately, <strong>the</strong> archaeal family IV sequences, which derive ma<strong>in</strong>ly from mesophilic<br />
euryarchaea, fall toge<strong>the</strong>r with a family of bacterial sequences (<strong>in</strong> green). A closely<br />
similar spatial distribution is also observed when crenarchaeal families I to IV are<br />
Fig. 4. Gene maps of A. cas cassettes and B. a Cmr module show<strong>in</strong>g only conserved core<br />
genes. Many o<strong>the</strong>r genes that occur less frequently are not <strong>in</strong>cluded. The cas genes are divided<br />
<strong>in</strong>to two groups 1 and 2 (see text). The Cmr module conta<strong>in</strong>s <strong>the</strong> highly conserved<br />
cmr2 and cmr5 genes and genes a to e, shaded grey, which correspond to different genes<br />
encod<strong>in</strong>g RAMP motif-conta<strong>in</strong><strong>in</strong>g prote<strong>in</strong>s which are present <strong>in</strong> 3 to 5 copies <strong>in</strong> <strong>the</strong> different<br />
Cmr module families (Garrett et al., 2010b)<br />
► Fig. 5. <strong>CRISPR</strong>/Cas modules can be divided <strong>in</strong>to families based on <strong>the</strong>ir unique characteristics,<br />
<strong>in</strong>clud<strong>in</strong>g <strong>the</strong> Cas1 prote<strong>in</strong> sequence and nucleotide sequences of <strong>the</strong> repeat and<br />
leader regions.<br />
a) Spheres represent Cas1 prote<strong>in</strong> sequences from different organisms. Small distances between<br />
spheres reflect higher sequence similarity between <strong>the</strong>m. All Cas1 sequences that are<br />
currently publicly available are represented. Markov cluster<strong>in</strong>g reveals that all <strong>the</strong> sequences<br />
fall with<strong>in</strong> about 20 families (each coloured differently), 5 of which are very large. Strongly<br />
coloured spheres represent archaeal Cas1 sequences while bacterial sequences are shown <strong>in</strong><br />
faded colours. It is evident that some families are specific to bacteria, whereas o<strong>the</strong>rs are<br />
archaea-specific. A few <strong>CRISPR</strong>/Cas families are shared between both archaea and bacteria.<br />
Sulfolobales families I – IV are marked (Lillestøl et al. 2009) and o<strong>the</strong>rs rema<strong>in</strong> to be formally<br />
classified. An earlier broader classification, CASS1 to 7, is also <strong>in</strong>cluded (Haft et al.<br />
2005; Makarova et al. 2006).<br />
b) Leaders from <strong>the</strong> Sulfolobales are clustered based on <strong>the</strong>ir sequence similarities and <strong>the</strong>y<br />
fall <strong>in</strong>to <strong>the</strong> same group of families (I–IV) as those found for <strong>the</strong> Cas1 prote<strong>in</strong>s, and a similar<br />
result is obta<strong>in</strong>ed when repeat sequences are clustered (Lillestøl et al. 2009)
clustered on <strong>the</strong> basis of <strong>the</strong>ir leader sequences (Figure 5B). Clearly, <strong>the</strong>re are o<strong>the</strong>r<br />
archaea-specific families (strongly coloured <strong>in</strong> Figure 5A) which rema<strong>in</strong> to be analysed<br />
and classified.<br />
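To make <strong>the</strong> Markov cluster<strong>in</strong>g step more concrete, <strong>the</strong> follow<strong>in</strong>g is a m<strong>in</strong>imal pure-Python sketch of<br />
<strong>the</strong> general MCL procedure (alternat<strong>in</strong>g expansion and <strong>in</strong>flation of a column-stochastic matrix). It is<br />
not <strong>the</strong> software or parameter set used for Figure 5A; <strong>the</strong> similarity matrix is a toy example stand<strong>in</strong>g <strong>in</strong><br />
for pairwise Cas1 sequence similarities, and <strong>the</strong> <strong>in</strong>flation value, iteration count and cut-off are assumptions.<br />

```python
def normalise_columns(m):
    """Scale every column of a square matrix so that it sums to 1."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: raise entries to a power, then renormalise columns."""
    return normalise_columns([[x ** power for x in row] for row in m])


def mcl(similarity, inflation=2.0, iterations=50, cutoff=1e-6):
    """Cluster items from a symmetric, non-negative similarity matrix."""
    n = len(similarity)
    # Self-loops are added, as is usual for MCL, before normalisation.
    m = normalise_columns(
        [[similarity[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Rows that keep non-zero entries list the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > cutoff)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy data: items 0-2 resemble each other, as do items 3-5.
    sim = [[0.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.05)
            for j in range(6)] for i in range(6)]
    print(mcl(sim))
```

With real data <strong>the</strong> <strong>in</strong>put would be derived from pairwise prote<strong>in</strong> comparisons, and a higher <strong>in</strong>flation value<br />
would be expected to give f<strong>in</strong>er-gra<strong>in</strong>ed families.<br />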
Family I <strong>CRISPR</strong>/Cas modules are <strong>the</strong> most common amongst <strong>the</strong> Sulfolobales<br />
and o<strong>the</strong>r crenarchaea, and <strong>the</strong> most conserved <strong>in</strong> structural organisation. The two<br />
conserved groups of cas genes are located between <strong>the</strong> leaders and externally at<br />
one end of <strong>the</strong> module. The separation may be functionally significant with <strong>the</strong> former<br />
<strong>in</strong>volved <strong>in</strong> process<strong>in</strong>g and <strong>in</strong>sertion of DNA spacer-repeat units and <strong>the</strong> latter<br />
encod<strong>in</strong>g RNA process<strong>in</strong>g and effector prote<strong>in</strong>s (Shah and Garrett, 2010).<br />
The methanoarchaea and haloarchaea, which carry <strong>the</strong> majority of <strong>the</strong> family<br />
IV <strong>CRISPR</strong>/Cas modules, show <strong>the</strong> least conservation <strong>in</strong> <strong>the</strong>ir cas gene contents. In<br />
particular, <strong>the</strong>ir group 2 cas genes range from those typical of crenarchaea to those<br />
common amongst bacteria. Putative genetic exchange between archaea and bacteria<br />
has generally been attributed to <strong>the</strong> methanoarchaea and haloarchaea thriv<strong>in</strong>g <strong>in</strong><br />
environments rich <strong>in</strong> bacteria.<br />
Cmr modules <strong>in</strong>variably coexist with, and are sometimes physically l<strong>in</strong>ked to,<br />
<strong>CRISPR</strong>/Cas modules but <strong>the</strong>y occur less widely than <strong>the</strong> latter. For archaea <strong>the</strong>y<br />
are found <strong>in</strong> about 70 % of genomes carry<strong>in</strong>g <strong>CRISPR</strong>/Cas modules, a higher proportion<br />
than for <strong>CRISPR</strong>/Cas-carry<strong>in</strong>g bacterial genomes (about 30 %). Both <strong>CRISPR</strong>/Cas<br />
modules and Cmr modules frequently occur <strong>in</strong> multiple copies <strong>in</strong> a given archaeal<br />
genome. cmr genes are ma<strong>in</strong>ly co-transcribed and <strong>the</strong>ir prote<strong>in</strong> products have been<br />
implicated <strong>in</strong> process<strong>in</strong>g of crRNAs and <strong>in</strong> <strong>the</strong> guid<strong>in</strong>g of crRNAs to target RNA of<br />
<strong>in</strong>vad<strong>in</strong>g genetic elements, although whe<strong>the</strong>r this is viral RNA, transcripts, or both rema<strong>in</strong>s unclear<br />
(Hale et al., 2009).<br />
Comparison of phylogenetic trees for <strong>the</strong> <strong>CRISPR</strong>/Cas and Cmr modules, based<br />
on archaeal and bacterial sequences of Cas1 and <strong>the</strong> Cmr2 prote<strong>in</strong>, and its homologs<br />
Csm1 and Csx11, revealed five major families of Cmr modules, named A to E,<br />
show<strong>in</strong>g dist<strong>in</strong>ctive gene syntenies (Garrett et al., 2010b).<br />
Given that Cmr and <strong>CRISPR</strong>/Cas modules are sometimes physically l<strong>in</strong>ked and<br />
can potentially be mobilised as a unit, and that <strong>the</strong>y have to recognise <strong>CRISPR</strong><br />
repeats of similar sequence, it is likely that some degree of coevolution<br />
has occurred. In support of this idea, <strong>the</strong>re are many examples of family II <strong>CRISPR</strong>/<br />
Cas modules coexist<strong>in</strong>g with family D Cmr modules amongst <strong>the</strong> Sulfolobales and<br />
this relationship extends to o<strong>the</strong>r archaea <strong>in</strong>clud<strong>in</strong>g for example, <strong>the</strong> euryarchaeon<br />
Methanospirillum hungatei.<br />
Sizes of <strong>CRISPR</strong> loci vary from a s<strong>in</strong>gle spacer bordered by repeats to more<br />
than 100 spacer-repeat units (Lillestøl et al., 2006; Grissa et al., 2008). New spacer-repeat<br />
units are added at <strong>the</strong> leader-repeat junction and <strong>the</strong> <strong>CRISPR</strong> loci also<br />
undergo deletions of spacer-repeat units, probably via recomb<strong>in</strong>ation at <strong>the</strong> direct<br />
repeats, without impair<strong>in</strong>g <strong>the</strong> overall <strong>CRISPR</strong>/Cas functionality, and <strong>the</strong> deletions<br />
can range from one to several spacer-repeat units. Moreover, <strong>the</strong>re are also putative<br />
examples of duplications of spacer-repeat units, or small groups <strong>the</strong>reof, and of <strong>the</strong>ir<br />
exchange between <strong>CRISPR</strong> loci with<strong>in</strong> a genome (Lillestøl et al., 2006; Lillestøl<br />
et al., 2009; Shah and Garrett, 2010; Gudbergsdottir et al., 2010).<br />
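The two processes described above, addition at <strong>the</strong> leader end and loss of <strong>in</strong>ternal units, can be pictured<br />
with a very small data model. This is only an illustrative sketch: a locus is represented as a Python list<br />
of spacer labels with <strong>the</strong> leader-proximal spacer first, and <strong>the</strong> function names and labels are hypo<strong>the</strong>tical.<br />

```python
def add_spacer(locus, new_spacer):
    """Insert a new spacer-repeat unit at the leader-repeat junction."""
    return [new_spacer] + list(locus)


def delete_units(locus, start, count=1):
    """Remove `count` contiguous spacer-repeat units beginning at `start`.

    This mimics a recombination event between two direct repeats, which
    removes everything lying between them.
    """
    if count < 1 or start < 0 or start + count > len(locus):
        raise ValueError("deletion must lie within the locus")
    return list(locus[:start]) + list(locus[start + count:])


if __name__ == "__main__":
    locus = ["s3", "s2", "s1"]          # s3 is the most recently acquired
    locus = add_spacer(locus, "s4")     # new unit next to the leader
    locus = delete_units(locus, 1, 2)   # lose two internal units
    print(locus)
```

Because <strong>in</strong>sertion is always at <strong>the</strong> leader end, <strong>the</strong> order of <strong>the</strong> list preserves <strong>the</strong> order of acquisition<br />
even after <strong>in</strong>ternal deletions.<br />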
4 Development and Stability of <strong>CRISPR</strong> Loci<br />
<strong>CRISPR</strong> loci generally appear to be quite stable, gradually add<strong>in</strong>g spacer-repeat<br />
units at <strong>the</strong> junction with <strong>the</strong> leader, albeit at different rates for different loci with<strong>in</strong><br />
an organism. There is also a compensatory mechanism for gradual loss of <strong>in</strong>ternal<br />
spacers which probably <strong>in</strong>volves recomb<strong>in</strong>ation between <strong>the</strong> identical direct<br />
repeats of a given locus, and occasionally between loci carry<strong>in</strong>g identical repeats<br />
(Lillestøl et al., 2009; Shah and Garrett, 2010). A specific example of such changes
is illustrated <strong>in</strong> Figure 6, show<strong>in</strong>g <strong>the</strong> pairwise alignments of <strong>CRISPR</strong> locus A of<br />
Sulfolobus solfataricus stra<strong>in</strong>s P1, P2 and 98/2 where shared spacers are shaded, as<br />
well as spacers added adjacent to <strong>the</strong> leader region after <strong>the</strong>se stra<strong>in</strong>s diverged. The<br />
pattern of shared spacers for each pair of organisms demonstrates that stra<strong>in</strong> 98/2<br />
separated prior to <strong>the</strong> divergence of stra<strong>in</strong>s P1 and P2, which share more<br />
spacers. Those spacers which show significant matches to known genetic elements<br />
are also colour-coded (Figure 6A,B) <strong>in</strong>dicat<strong>in</strong>g a wide variety of matches especially<br />
to rudiviruses, bicaudaviruses and conjugative plasmids (Lillestøl et al., 2009).<br />
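The reason<strong>in</strong>g beh<strong>in</strong>d such a pairwise comparison can be sketched as follows. The spacer labels below are<br />
<strong>in</strong>vented for illustration and do not reproduce <strong>the</strong> actual locus A contents, and us<strong>in</strong>g <strong>the</strong> raw count of<br />
shared spacers as a proxy for recency of divergence is a simplify<strong>in</strong>g assumption of this example.<br />

```python
from itertools import combinations


def shared_spacers(locus_a, locus_b):
    """Spacers present in both loci."""
    return set(locus_a) & set(locus_b)


def leader_proximal_unique(locus, other):
    """Spacers at the leader end of `locus` that precede the first spacer
    shared with `other`; these were presumably added after divergence."""
    other_set = set(other)
    new = []
    for spacer in locus:
        if spacer in other_set:
            break
        new.append(spacer)
    return new


def most_recently_diverged(loci):
    """Return the pair of strain names sharing the most spacers."""
    return max(combinations(sorted(loci), 2),
               key=lambda pair: len(shared_spacers(loci[pair[0]],
                                                   loci[pair[1]])))


if __name__ == "__main__":
    # Hypothetical loci, leader-proximal spacer first.
    loci = {
        "P1": ["n1", "c5", "c4", "c3", "c2", "c1"],
        "P2": ["m1", "m2", "c5", "c4", "c2", "c1"],
        "98/2": ["k1", "k2", "c2", "c1"],
    }
    print(most_recently_diverged(loci))
    print(leader_proximal_unique(loci["P2"], loci["P1"]))
```

In this toy case P1 and P2 share <strong>the</strong> most spacers, so <strong>the</strong> third stra<strong>in</strong> is <strong>in</strong>ferred to have separated first.<br />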
Earlier evidence suggested that <strong>CRISPR</strong> loci were strongly resistant to <strong>in</strong>tegrative<br />
events (Lillestøl et al., 2006). For example, three stra<strong>in</strong>s of S. solfataricus, P1,<br />
P2 and 98/2, carry multiple large <strong>CRISPR</strong> loci <strong>in</strong> addition to locus A <strong>in</strong> Figure<br />
6. They are also extremely rich <strong>in</strong> active transposable elements (about 350 <strong>in</strong> stra<strong>in</strong><br />
P2) which have contributed to extensive genome shuffl<strong>in</strong>g (Brügger et al., 2004) but<br />
no IS <strong>in</strong>sertions were detected <strong>in</strong> <strong>the</strong> extensive <strong>CRISPR</strong> loci (Lillestøl et al., 2009;<br />
Shah and Garrett, 2010). Thus, although such <strong>in</strong>sertions do occasionally occur <strong>in</strong>tergenically<br />
<strong>in</strong> <strong>the</strong> cas and cmr gene clusters, <strong>the</strong>re appears to be a strong selective pressure to<br />
ma<strong>in</strong>ta<strong>in</strong> <strong>the</strong> <strong>in</strong>tegrity of <strong>CRISPR</strong> loci which are essential for <strong>the</strong> function of both<br />
<strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s. Whe<strong>the</strong>r this is a general rule for archaea<br />
or is dependent on environmental conditions, <strong>in</strong>clud<strong>in</strong>g <strong>the</strong> levels of viruses and<br />
plasmids present, is unclear. A different picture has emerged from bacterial studies.<br />
For example, <strong>in</strong> a biofilm carry<strong>in</strong>g acidophilic Leptospirillum group II bacteria,<br />
about 20 % of <strong>the</strong> partially sequenced <strong>CRISPR</strong> loci conta<strong>in</strong>ed IS elements (Tyson<br />
and Banfield, 2008).<br />
Many archaeal and bacterial chromosomes, with or without <strong>CRISPR</strong>/Cas modules,<br />
carry short <strong>CRISPR</strong>-like clusters lack<strong>in</strong>g associated leader regions and cas<br />
genes (Grissa et al., 2008). Although <strong>the</strong>ir orig<strong>in</strong>(s) rema<strong>in</strong> unknown, <strong>the</strong>y may<br />
have separated from <strong>in</strong>tact <strong>CRISPR</strong>/Cas modules, possibly via transposable elements.<br />
If preceded by promoters <strong>the</strong>ir transcripts can, <strong>in</strong> pr<strong>in</strong>ciple, be processed<br />
and activated. Such <strong>CRISPR</strong> loci are present <strong>in</strong> Sulfolobus conjugative plasmids<br />
pNOB8 and pKEF9 (She et al., 1998; Greve et al., 2004) and at least for <strong>the</strong> latter,<br />
Fig. 6. Pairwise comparison of <strong>the</strong> spacer-repeat units of <strong>CRISPR</strong> A locus of three closely<br />
related stra<strong>in</strong>s of S. solfataricus P1, P2 and 98/2. Shaded regions <strong>in</strong>dicate identical spacer-repeat<br />
units shared by two <strong>CRISPR</strong> loci. Colour-coded spacer-repeat units <strong>in</strong>dicate that spacers<br />
have significant sequence matches to <strong>the</strong> viruses or plasmid families <strong>in</strong>dicated on <strong>the</strong><br />
Figure
<strong>the</strong> spacer-repeat cluster is transcribed and RNA processed <strong>in</strong> a S. solfataricus host,<br />
suggest<strong>in</strong>g that at least some of <strong>the</strong>se small clusters can be activated and functional<br />
if complementary Cas or Cmr prote<strong>in</strong>s are present (Lillestøl et al., 2009; Shah and<br />
Garrett, 2010).<br />
5 Mobility of <strong>CRISPR</strong>/Cas and Cmr Modules<br />
Genomic analyses of closely related Sulfolobus species have provided strong evidence<br />
for <strong>CRISPR</strong>/Cas modules be<strong>in</strong>g mobilised given that <strong>the</strong>y occur at different<br />
genomic positions even when <strong>the</strong>re is a high level of gene synteny, and <strong>the</strong>y<br />
are generally conf<strong>in</strong>ed to <strong>the</strong> variable genetic regions (Shah and Garrett, 2010).<br />
Their ability to transfer between organisms is also supported by <strong>the</strong> different comb<strong>in</strong>ations<br />
of <strong>CRISPR</strong>/Cas families found <strong>in</strong> closely related organisms (Lillestøl<br />
et al., 2009; Shah and Garrett, 2010). For example, <strong>in</strong> S. islandicus stra<strong>in</strong>s HVE10/4<br />
and REY15A, <strong>the</strong> former carries family I and III <strong>CRISPR</strong>/Cas modules and one<br />
Cmr module, while <strong>the</strong> latter exhibits a family I <strong>CRISPR</strong>/Cas module and two family<br />
B Cmr modules (Shah and Garrett, 2010; Guo et al., 2010). Fur<strong>the</strong>r support for<br />
such transfer was provided by analysis of <strong>the</strong> Pyrococcus furiosus genome where<br />
a 155 kb fragment bordered by a <strong>CRISPR</strong> locus and a repeat shows significantly<br />
different properties of G+C content, third codon position and codon usage from <strong>the</strong><br />
rest of <strong>the</strong> genome (Portillo and Gonzalez, 2009).<br />
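Compositional signals of this k<strong>in</strong>d can be screened for with simple w<strong>in</strong>dow statistics. The sketch below<br />
is a generic illustration ra<strong>the</strong>r than <strong>the</strong> method of <strong>the</strong> cited study: it computes G+C content <strong>in</strong> fixed w<strong>in</strong>dows<br />
and G+C at third codon positions, and <strong>the</strong> w<strong>in</strong>dow size, step and threshold are arbitrary assumptions.<br />

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3(cds):
    """G+C fraction at third codon positions of an in-frame coding sequence."""
    return gc_content(cds.upper()[2::3])


def window_gc(genome, window, step):
    """G+C content of successive windows as (start, value) pairs."""
    return [(i, gc_content(genome[i:i + window]))
            for i in range(0, len(genome) - window + 1, step)]


def atypical_windows(genome, window, step, threshold):
    """Start positions of windows whose G+C content deviates from the
    genome-wide value by more than `threshold`."""
    overall = gc_content(genome)
    return [start for start, gc in window_gc(genome, window, step)
            if abs(gc - overall) > threshold]


if __name__ == "__main__":
    # Toy genome: an AT-rich background carrying a GC-rich insert.
    genome = "AT" * 50 + "GC" * 10 + "AT" * 50
    print(atypical_windows(genome, window=20, step=20, threshold=0.5))
    print(gc3("ATGGCA"))
```

A full analysis would also compare codon usage tables between <strong>the</strong> candidate region and <strong>the</strong> rest of <strong>the</strong> genome.<br />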
Evidence for gene exchange with<strong>in</strong> <strong>the</strong> <strong>CRISPR</strong>/Cas modules was derived from<br />
exam<strong>in</strong>ation of <strong>the</strong> structural <strong>in</strong>tegrities of <strong>the</strong> paired family I <strong>CRISPR</strong>/Cas modules<br />
of several closely related Sulfolobus stra<strong>in</strong>s. The results <strong>in</strong>dicated that <strong>the</strong><br />
<strong>in</strong>ternal group 1 cas genes, which are functionally implicated <strong>in</strong> spacer addition at<br />
<strong>the</strong> leader-repeat junction (Figure 4) seem to coevolve, and be mobilised, with <strong>the</strong><br />
<strong>CRISPR</strong> locus whereas <strong>the</strong> group 2 cas genes, putatively <strong>in</strong>volved <strong>in</strong> RNA process<strong>in</strong>g<br />
and crRNA mobility (Figure 4), were reta<strong>in</strong>ed with<strong>in</strong> <strong>the</strong> stra<strong>in</strong>s, suggest<strong>in</strong>g that<br />
some exchange with<strong>in</strong> cas gene cassettes can occur (Shah and Garrett, 2010).<br />
The mechanism(s) of transfer of <strong>CRISPR</strong>/Cas modules, vary<strong>in</strong>g <strong>in</strong> size from<br />
about 7 kb to 25 kb, rema<strong>in</strong>s unclear. The larger <strong>CRISPR</strong>/Cas modules, at least,<br />
may be too large to be borne on plasmids, as has been proposed for bacteria<br />
(Godde and Bickerton, 2006). At least for <strong>the</strong> crenarchaea, genetic elements are<br />
relatively small and, although small <strong>CRISPR</strong> loci have been detected <strong>in</strong> crenarchaeal<br />
conjugative plasmids, transfer is more likely to result from chromosomal<br />
conjugation which may well be facilitated by <strong>in</strong>tegrated conjugative plasmids (Lillestøl<br />
et al., 2009).<br />
6 Targets of <strong>the</strong> <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr Systems<br />
Bio<strong>in</strong>formatic evidence <strong>in</strong>dicated that <strong>the</strong> spacer crRNAs carry<strong>in</strong>g significant<br />
sequence matches to <strong>the</strong> protospacer sequence were complementary to ei<strong>the</strong>r strand
of genes imply<strong>in</strong>g that <strong>the</strong>y were not exclusively target<strong>in</strong>g mRNAs (Lillestøl et al.,<br />
2006). Moreover, extensive analyses of significant matches to <strong>the</strong> many known<br />
viruses and plasmids of <strong>the</strong> Sulfolobales revealed several matches to protospacers<br />
ly<strong>in</strong>g between genes. They demonstrated, fur<strong>the</strong>r, that <strong>the</strong> locations of <strong>the</strong> protospacers<br />
were randomly distributed along, and on ei<strong>the</strong>r strand of, <strong>the</strong> genetic elements.<br />
This is illustrated <strong>in</strong> Figure 7 for five crenarchaeal viruses and two plasmids,<br />
where <strong>the</strong> positions of <strong>the</strong> significant matches are shown <strong>in</strong> relation to <strong>the</strong> annotated<br />
gene locations. A similar conclusion that DNA, and not mRNA, was targeted by <strong>the</strong><br />
Fig. 7. Significant <strong>CRISPR</strong> spacer matches to protospacer sequences are superimposed on<br />
genomes of <strong>the</strong> follow<strong>in</strong>g representative viruses and plasmids: SIRV1 – rudiviruses, AFV9 –<br />
betalipothrixviruses, SSV2 – fuselloviruses, STIV – turreted icosahedral viruses, ATV – bicaudavirus,<br />
pNOB8 – conjugative plasmids, and pHEN7 – cryptic plasmids, where circular<br />
genomes (SSV2, STIV, ATV, pNOB8 and pHEN7) are presented <strong>in</strong> a l<strong>in</strong>ear form. Prote<strong>in</strong><br />
cod<strong>in</strong>g regions are boxed and shaded, as <strong>in</strong>dicated on <strong>the</strong> Figure, accord<strong>in</strong>g to <strong>the</strong>ir levels of<br />
conservation for those genomes. No comparative genomic data were used for ATV. Spacer<br />
sequence matches are <strong>in</strong>dicated by l<strong>in</strong>es above and below <strong>the</strong> genomes for <strong>the</strong> two DNA<br />
strands and <strong>the</strong>y are colour-coded accord<strong>in</strong>g to whe<strong>the</strong>r <strong>the</strong>y occur exclusively at a nucleotide<br />
level (red) or additionally at an am<strong>in</strong>o acid level (green). Significant spacer matches<br />
were found by sett<strong>in</strong>g an e-value cut off correspond<strong>in</strong>g to a 10 % false positive ratio, which<br />
was estimated by us<strong>in</strong>g <strong>the</strong> genome of S. acidocaldarius as a negative control (Chen et al.,<br />
2005). These data are updated from an earlier study (Shah et al., 2009)
64<br />
<strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr Immune Systems of <strong>Archaea</strong><br />
<strong>CRISPR</strong>/Cas <strong>system</strong> of <strong>the</strong> bacterium Staphylococcus epidermidis was achieved<br />
experimentally (Marraff<strong>in</strong>i and Son<strong>the</strong>imer, 2008).<br />
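The e-value cut-off mentioned in the legend of Figure 7 can be illustrated with a minimal Python sketch. It assumes that spacer hits against the negative-control genome and against the virus/plasmid sequences are directly comparable, and it picks the most permissive cut-off at which control hits amount to no more than 10 % of the target hits. The function name, the toy e-values and the exact definition of the ratio are assumptions made for illustration, not the procedure used in the study.<br />

```python
def pick_evalue_cutoff(target_evalues, control_evalues, max_fp_ratio=0.10):
    """Return the largest e-value cut-off whose estimated false positive
    ratio (control hits / target hits) does not exceed max_fp_ratio.

    Returns None if no cut-off satisfies the criterion.
    """
    best = None
    # Only e-values actually observed among the target hits need to be tried.
    for cutoff in sorted(set(target_evalues)):
        n_target = sum(e <= cutoff for e in target_evalues)
        n_control = sum(e <= cutoff for e in control_evalues)
        if n_control / n_target <= max_fp_ratio:
            best = cutoff
    return best


# Toy e-values (hypothetical): hits to viruses/plasmids and to the control genome.
target_hits = [1e-10, 1e-8, 1e-6, 1e-4, 1e-3, 1e-2, 0.1, 0.5, 1.0, 2.0]
control_hits = [0.05, 0.3, 0.8, 1.5]

cutoff = pick_evalue_cutoff(target_hits, control_hits)
print(cutoff)
```

With these toy values the first control hit appears at 0.05, so any cut-off above 0.01 would push the ratio over 10 %.<br />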
However, more recently it was demonstrated that crRNAs complexed with Cmr<br />
prote<strong>in</strong>s target RNA carry<strong>in</strong>g match<strong>in</strong>g protospacers (Hale et al., 2009) but it is still<br />
unclear whe<strong>the</strong>r this <strong>in</strong>cludes both mRNAs and viral RNAs. For archaea, this will<br />
only be resolved when <strong>the</strong> first archaeal RNA viruses have been characterised.<br />
A few sequence matches have been detected between archaeal CRISPR spacers<br />
and IS elements, suggesting that CRISPR/Cas systems can target transposable elements<br />
(Lillestøl et al., 2006; Held and Whitaker, 2009; Mojica et al., 2009; Shah<br />
et al., 2009). However, most of those reported can be attributed to transposase<br />
genes carried on viral genomes or plasmids, including, for example, spacer matches<br />
to each of the four transposase genes of the bicaudavirus ATV (Figure 7). These<br />
transposase genes/IS elements are presumably indistinguishable from any other<br />
viral/plasmid genomic target if they carry appropriate PAMs adjacent to their protospacer<br />
sites.<br />
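The strand and location argument used in this section (matches on either strand, within or between genes) can be sketched in Python as follows. This is a simplified illustration that assumes exact matching, whereas the analyses described above relied on significance thresholds; the function names, the toy genome and the gene coordinates are all hypothetical.<br />

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(genome, spacer, genes):
    """List exact spacer matches on either strand of a genetic element.

    genes is a list of (start, end) half-open intervals in forward
    coordinates. Each hit is (start, strand, lies_within_a_gene).
    A palindromic spacer would be reported once per strand.
    """
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = genome.find(query)
        while start != -1:
            end = start + len(query)
            in_gene = any(g_start <= start and end <= g_end
                          for g_start, g_end in genes)
            hits.append((start, strand, in_gene))
            start = genome.find(query, start + 1)
    return sorted(hits)


# Toy element: one forward-strand match inside a gene, one reverse-strand
# match in an intergenic region.
spacer = "ACGGATTCA"
genome = "TTTT" + spacer + "CCCCC" + reverse_complement(spacer) + "AAAA"
genes = [(0, 15)]

hits = find_protospacers(genome, spacer, genes)
print(hits)
```

Matches that fall on both strands and outside annotated genes are what one would expect if DNA, rather than mRNA, is the target.<br />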
7 Formation of crRNAs and Target<strong>in</strong>g of Foreign Elements<br />
The few archaeal <strong>CRISPR</strong> loci that have been tested experimentally for transcription,<br />
<strong>in</strong>clud<strong>in</strong>g some lack<strong>in</strong>g <strong>in</strong>tact leader regions, produced processed transcripts<br />
(Tang et al., 2002; Tang et al., 2005; Lillestøl et al., 2006; Carte et al., 2008; Lillestøl<br />
et al., 2009). Sulfolobus acidocaldarius carries five <strong>CRISPR</strong> loci with sizes of<br />
133, 78, 11, 5 and 2 spacer-repeat units. For the four smaller clusters, full-length<br />
transcripts were detected experimentally. For locus-78, the maximum transcript<br />
size of about 5000 nt exceeded the size of the 4930 bp CRISPR locus, consistent<br />
with the whole transcript extending from within the leader region and terminating<br />
downstream from the locus (Lillestøl et al., 2006; Lillestøl et al., 2009). However,<br />
a large fraction of the transcripts also fell in the size range 3000–3500 nt, suggesting<br />
that endogenous degradation, premature termination or processing had occurred<br />
towards the 3’-end of the transcript. Given that promoter and terminator motifs will<br />
be randomly taken up in spacers of CRISPR loci (Shah et al., 2009), there must be<br />
some form of transcriptional regulation to ensure that whole CRISPR transcripts<br />
are formed despite these motifs acquired from foreign genetic elements, possibly involving the Sulfolobus<br />
CRISPR repeat binding protein (Peng et al., 2003).<br />
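As a rough worked example of the transcript sizes quoted above, the following Python sketch estimates how many spacer-repeat units of locus-78 a transcript of a given length would span. It assumes evenly sized units and ignores any leader-derived sequence at the 5’-end, so the figures are only indicative; the function names are illustrative.<br />

```python
LOCUS_LENGTH_BP = 4930   # size of the locus-78 CRISPR locus quoted in the text
N_UNITS = 78             # spacer-repeat units in that locus


def mean_unit_length(locus_length_bp, n_units):
    """Average length of one spacer-repeat unit, assuming even spacing."""
    return locus_length_bp / n_units


def units_spanned(transcript_nt, unit_length):
    """Approximate number of spacer-repeat units covered by a transcript."""
    return transcript_nt / unit_length


unit = mean_unit_length(LOCUS_LENGTH_BP, N_UNITS)
print(f"mean unit length: {unit:.1f} bp")
for size_nt in (3000, 3500, 5000):
    print(f"{size_nt} nt transcript spans about {units_spanned(size_nt, unit):.0f} units")
```

Under these assumptions the 3000–3500 nt transcripts would end roughly 47 to 55 units into the locus, which is where termination or processing towards the 3’-end would be expected to lie.<br />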
In <strong>the</strong> euryarchaeon P. furiosus and <strong>in</strong> Escherichia coli RNA transcripts are processed<br />
with<strong>in</strong> repeats, 8 nt from <strong>the</strong> spacer start by <strong>the</strong> Cas6-type endonuclease.<br />
The process<strong>in</strong>g of <strong>the</strong> 3’-end is less clear but for P. furiosus it occurs at two sites<br />
with<strong>in</strong> <strong>the</strong> spacer, at 5 nt and 11 nt from <strong>the</strong> 3’-end of <strong>the</strong> spacer sequence. Complexes<br />
of Cas or Cmr prote<strong>in</strong>s guide <strong>the</strong> mature crRNAs to <strong>the</strong>ir targets (Brouns<br />
et al., 2008; Hale et al., 2009). Anneal<strong>in</strong>g of <strong>the</strong> spacer sequence of <strong>the</strong> crRNA to<br />
<strong>the</strong> protospacer of <strong>the</strong> <strong>in</strong>vad<strong>in</strong>g element is crucial for <strong>the</strong> recognition and <strong>in</strong>activation<br />
of <strong>the</strong> target. For <strong>the</strong> bacterium Streptococcus <strong>the</strong>rmophilus it was claimed that<br />
100 % sequence match<strong>in</strong>g between <strong>the</strong> crRNAs and protospacer RNAs was essential<br />
for target <strong>in</strong>activation (Barrangou et al., 2007; Horvath and Barrangou, 2010).
However, for S. solfataricus and Sulfolobus islandicus <strong>the</strong> requirements appear to<br />
be much less str<strong>in</strong>gent because even with 3 mismatches between crRNA and protospacer<br />
target<strong>in</strong>g was still effective (Gudbergsdottir et al., 2010).<br />
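The processing positions given at the start of this paragraph for P. furiosus (cleavage within the repeat 8 nt upstream of the spacer, and 3’-ends lying 5 nt or 11 nt inside the spacer) can be sketched in Python as below. The repeat and spacer sequences are made up, and the function name and parameters are illustrative assumptions rather than a description of the actual enzymology.<br />

```python
def mature_crrnas(repeat, spacer, tag_len=8, trims=(5, 11)):
    """Return the processed products expected for one repeat-spacer unit.

    Each product keeps the last tag_len nucleotides of the upstream repeat
    as a 5' tag, followed by the spacer shortened at its 3' end by the
    number of nucleotides given in trims.
    """
    tag = repeat[-tag_len:]
    return [tag + spacer[:len(spacer) - trim] for trim in trims]


# Hypothetical sequences, written as RNA.
repeat = "GAUUACAGAUUACAGAUUACAGUC"   # 24 nt toy repeat
spacer = "ACGU" * 8                    # 32 nt toy spacer

products = mature_crrnas(repeat, spacer)
for rna in products:
    print(len(rna), rna)
```

Each product begins with the same 8 nt repeat-derived tag, and the two species differ in length by 6 nt.<br />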
There may also be differences between some archaea and bacteria in the role<br />
of the family-specific protospacer adjacent motif (PAM), complementary to part<br />
of the 5’-repeat sequence of the crRNA, which in Sulfolobus species constitutes a<br />
conserved dinucleotide (Lillestøl et al., 2009). For S. islandicus it was shown that<br />
altering the PAM inhibited protospacer targeting (Gudbergsdottir et al., 2010),<br />
whereas for the bacterium Staphylococcus epidermidis it was concluded that any<br />
sequence mismatch with the 5’-end of the crRNA ensured protospacer targeting and<br />
that sequence complementarity to the PAM was not essential (Marraffini and<br />
Sontheimer, 2010).<br />
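The two Sulfolobus requirements discussed above (a limited number of crRNA-protospacer mismatches and an intact PAM) can be combined into a toy targeting rule, sketched here in Python. The threshold of 3 reflects only that targeting was reported to remain effective with 3 mismatches; whether more are tolerated is not stated, so treating 3 as an upper limit is an assumption. The dinucleotide "CC" is a placeholder, and the function names are hypothetical. Both sequences are written in the same sense for simplicity.<br />

```python
def count_mismatches(spacer, protospacer):
    """Count positions at which two equal-length sequences differ."""
    if len(spacer) != len(protospacer):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(spacer, protospacer))


def is_targeted(spacer, protospacer, flank, pam, max_mismatches=3):
    """Toy Sulfolobus-style rule: the flanking dinucleotide must equal the
    PAM and the mismatch count must not exceed max_mismatches (assumed)."""
    return flank == pam and count_mismatches(spacer, protospacer) <= max_mismatches


PAM = "CC"  # placeholder dinucleotide, not taken from the text

spacer = "ACGTACGTACGTACGTACGT"
three_mismatches = "TCGTACGTACCTACGTACGA"
four_mismatches = "TCGTAGGTACCTACGTACGA"

print(is_targeted(spacer, three_mismatches, flank="CC", pam=PAM))
print(is_targeted(spacer, four_mismatches, flank="CC", pam=PAM))
print(is_targeted(spacer, three_mismatches, flank="AG", pam=PAM))
```

This rule would not describe the S. epidermidis system, for which a mismatch with the 5’-end of the crRNA, rather than a matching motif, was reported to permit targeting.<br />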
The CRISPR-like locus of pKEF9 lacks an associated cas cassette and leader<br />
region, but when transformed into S. solfataricus P2 it produced transcripts covering<br />
the whole CRISPR locus, initiating 32 bp upstream from the first repeat, and<br />
these were found to be processed. Processing sites were detected within each repeat-spacer<br />
unit, but some of the sites occurred within the spacer. At the time it was<br />
presumed that some inaccurate processing had occurred, possibly reflecting mismatches<br />
between the plasmid repeat sequence and the host Cas proteins<br />
(Lillestøl et al., 2009), but it was not known then that Cmr proteins process within<br />
the 3’-ends of spacers (Hale et al., 2009).<br />
In contrast to reports on euryarchaeal CRISPR transcripts (Carte et al., 2008)<br />
and those of a bacterium (Brouns et al., 2008), transcripts were detected from both DNA<br />
strands of each of the five CRISPR loci of S. acidocaldarius (Lillestøl et al., 2006;<br />
Lillestøl et al., 2009). The largest CRISPR locus, Saci-133, was probed against spacer<br />
sequences distributed along <strong>the</strong> cluster and each yielded clear signals <strong>in</strong> Nor<strong>the</strong>rn<br />
analyses. The smallest processed products <strong>in</strong> <strong>the</strong> size range 55–60 nt were larger<br />
than those of leader strand crRNAs and were less regularly processed. These small<br />
RNAs were observed for all five S. acidocaldarius repeat-clusters and must conta<strong>in</strong><br />
most or all of <strong>the</strong> spacer sequence because <strong>the</strong> correspond<strong>in</strong>g band was not detected<br />
when <strong>the</strong> spacer probe was replaced by a repeat probe. Whe<strong>the</strong>r <strong>the</strong>se have a role <strong>in</strong><br />
protect<strong>in</strong>g <strong>the</strong> mature crRNAs when <strong>the</strong>re are no <strong>in</strong>vad<strong>in</strong>g elements present rema<strong>in</strong>s<br />
unclear.<br />
8 Anti <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr Systems<br />
Examples have been recorded of archaeal <strong>CRISPR</strong>/Cas modules be<strong>in</strong>g lost from<br />
genomes. For example, a variant stra<strong>in</strong> of S. solfataricus P2 (stra<strong>in</strong> P2A) was characterised<br />
that had lost four closely l<strong>in</strong>ked <strong>CRISPR</strong>/Cas modules, A to D, apparently<br />
via a s<strong>in</strong>gle recomb<strong>in</strong>ation event between border<strong>in</strong>g IS elements (Redder and Garrett,<br />
2006). Border<strong>in</strong>g IS elements also have <strong>the</strong> potential to generate transposons<br />
carrying whole CRISPR/Cas or Cmr modules. Possibly this loss reflects S. solfataricus<br />
P2A being a laboratory strain in which the immune system had become an unnecessary<br />
burden on the cell’s energy resources in the absence of invading genetic elements.<br />
This may be analogous to the many bacterial endosymbionts which lack<br />
functional CRISPR/Cas systems (Grissa et al., 2008; Mojica et al., 2009).<br />
There are also examples of viruses which circumvent or interfere with the<br />
CRISPR systems. Some members of the viral families Rudiviridae and Lipothrixviridae<br />
carry 12 bp indels, probably deletions, in their genomes, often lying within,<br />
but not disrupting, open reading frames (Peng et al., 2004; Vestergaard et al., 2008).<br />
Although the function of these elements is unknown, they may be generated in<br />
response to the CRISPR-based immune systems to avoid crRNA targeting. The<br />
presence of multiple recomb<strong>in</strong>ation sites <strong>in</strong> some archaeal viruses and conjugative<br />
plasmids may also facilitate genomic rearrangements and sequence changes (Greve<br />
et al., 2004; Garrett et al., 2010a).<br />
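Why a 12 bp deletion can lie within an open reading frame without disrupting it, while still removing a spacer match, follows from 12 being a multiple of 3. The Python sketch below illustrates this with a made-up reading frame and an exact-match spacer; the sequences, coordinates and function names are hypothetical.<br />

```python
def delete_segment(seq, start, length):
    """Remove length nucleotides beginning at index start."""
    return seq[:start] + seq[start + length:]


def codons(seq):
    """Split a sequence into complete codons, read from its first nucleotide."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Toy open reading frame of 12 codons and a 20 nt spacer matching part of it.
orf = "ATGAAACCCGGGTTTACGCATGATCTGGAAGGCTAA"
spacer = orf[5:25]

mutant = delete_segment(orf, 7, 12)      # 12 bp deletion: frame preserved
frameshift = delete_segment(orf, 7, 11)  # 11 bp deletion: frame lost

print(codons(orf)[7:])         # downstream codons of the original frame
print(codons(mutant)[3:])      # identical downstream codons after the deletion
print(codons(frameshift)[3:])  # altered downstream codons
print(spacer in orf, spacer in mutant)
```

In this toy case the 12 bp deletion removes four codons' worth of sequence, leaves the downstream codons and the stop codon unchanged, and abolishes the spacer match.<br />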
Analysis of the genome of S. islandicus strain M.16.4, isolated in Kamchatka,<br />
Russia (Reno et al., 2009), revealed a more direct form of viral interference,<br />
in which a provirus, M164 provirus 1, has integrated into, and disrupted, the csa3 gene encoding<br />
a putative transcriptional regulator of the group 1 cas genes (Figure 8). The<br />
insertion event seems to be recent, since the truncated parts of the csa3 gene show<br />
high sequence similarity to genes of closely related species, and it may be reversible.<br />
The closely related strain M.16.27 carries an intact csa3 gene (Figure 8A) but<br />
also, unlike strain M.16.4, carries a CRISPR spacer sequence perfectly matching<br />
the provirus.<br />
Fig. 8. An example of a cas gene cassette that has been inactivated in the gene for the putative<br />
transcriptional regulator csa3 of S. islandicus M.16.4 by the integration of an M164 provirus 1.<br />
a) Strain M.16.27, lacking the integrated provirus, carries a CRISPR spacer with a perfect<br />
match to the provirus, whereas S. islandicus M.16.4, carrying the integrated provirus, contains no<br />
spacer sequence matching the proviral sequence. Panel labels: M1627 (csa1 cas1 cas2 cas4 csa3);<br />
M164 (csa1 cas1 cas2 cas4); attL; attR.<br />
b) The integration att site in the csa3 gene. Aligned sequences shown in the panel:<br />
GTAAATTTTCTTCTGCACAGAAAGAAGAT----------AATCTT<br />
CGAAA----CTTCTGCACAGAAAGAGTATTTGACGTCAAAACATT<br />
c) Gene map of the integrated provirus (M164 provirus I, 13,908 bp) showing some predicted<br />
functional assignments. Panel labels: sugar binding, integrase, phospholipase D, DNA primase/polymerase, attL, attR.<br />
9 Evolutionary Considerations<br />
The view that archaeal and bacterial <strong>CRISPR</strong>/Cas <strong>system</strong>s are closely related<br />
has prevailed s<strong>in</strong>ce <strong>the</strong>ir discovery and was underp<strong>in</strong>ned by <strong>the</strong> similar order<strong>in</strong>g<br />
of spacer-repeat units <strong>in</strong> <strong>the</strong> <strong>CRISPR</strong> loci and by extensive sequence similarities<br />
between Cas prote<strong>in</strong>s (Haft et al., 2005; Godde and Bickerton, 2006; Makarova<br />
et al., 2006). This view has been re<strong>in</strong>forced by <strong>the</strong> shared mechanism of elongation<br />
of <strong>CRISPR</strong> loci at <strong>the</strong> leader-repeat junction as well as similarities <strong>in</strong> <strong>the</strong> process<strong>in</strong>g<br />
mechanisms of crRNAs <strong>in</strong> both Doma<strong>in</strong>s (Tang et al., 2002; Tang et al., 2005;<br />
Brouns et al., 2008; Hale et al., 2008; Hale et al., 2009). Never<strong>the</strong>less, <strong>the</strong>re are<br />
dist<strong>in</strong>ctive features. <strong>CRISPR</strong>/Cas modules are more common amongst archaea and<br />
tend to be larger, structurally more complex and more labile (Lillestøl et al., 2006;<br />
Grissa et al., 2008; Shah and Garrett, 2010). Many repeat sequences show a bias to<br />
archaeal or bacterial CRISPR loci, and many archaeal repeats lack the inverted repeats<br />
common to those of bacteria, suggesting that different RNA processing signals occur<br />
within transcript repeats (Lillestøl et al., 2006; Kunin et al., 2007). Moreover, many<br />
crenarchaea encode the CRISPR repeat binding protein, whose function remains elusive (Peng<br />
et al., 2003).<br />
Phylogenetic analyses imply that periodic <strong>in</strong>ter-Doma<strong>in</strong> exchange of <strong>CRISPR</strong>/<br />
Cas modules has occurred (Haft et al., 2005; Godde and Bickerton, 2006; Makarova<br />
et al., 2006). Clearly, cross<strong>in</strong>g Doma<strong>in</strong> boundaries would be a very complex process<br />
given <strong>the</strong> basic differences <strong>in</strong> <strong>the</strong> transcriptional and translational mechanisms of<br />
archaea and bacteria (Torar<strong>in</strong>sson et al., 2005; Santangelo et al., 2009). Moreover,<br />
conjugal DNA transfer would also have to overcome <strong>the</strong> major barriers of different<br />
membrane and cell wall structures, and different conjugative <strong>system</strong>s, of archaea<br />
and bacteria (Greve et al., 2004; Veith et al., 2009). Nevertheless, coevolution<br />
of archaeal and bacterial CRISPR/Cas systems would require cross-Domain<br />
transfer events to succeed only rarely. The more archaea-specific components may be associated<br />
with <strong>system</strong>s that have evolved <strong>in</strong> environments of high temperature, extremes of<br />
pH, or hypersal<strong>in</strong>e conditions where levels of bacteria are relatively low, which is<br />
also supported by <strong>the</strong> cas gene compositions of different <strong>CRISPR</strong>/Cas families.<br />
O<strong>the</strong>r mechanistic differences may surface as <strong>the</strong> different <strong>system</strong>s are studied <strong>in</strong><br />
more depth. Importantly, however, crenarchaeal viruses have radically different<br />
virus-host relationships from those of bacteria that may require altered responses<br />
from <strong>the</strong> <strong>immune</strong> <strong>system</strong>s (Prangishvili et al., 2006a; Bize et al., 2009) and it is<br />
likely that <strong>the</strong> <strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s have ma<strong>in</strong>ta<strong>in</strong>ed and/or undergone<br />
Doma<strong>in</strong>-specific adaptations dur<strong>in</strong>g evolution.<br />
Small interfering RNA (siRNA) systems are widespread in eukarya, where they<br />
have multiple roles, including the discrimination and targeting of “foreign” genetic<br />
elements such as viruses and transposons (Hannon, 2002; Jinek and Doudna,<br />
2009). There are broad mechanistic parallels between these eukaryal siRNA systems<br />
and <strong>the</strong> DNA- and RNA-target<strong>in</strong>g <strong>CRISPR</strong> <strong>system</strong>s. They all have to dist<strong>in</strong>guish<br />
foreign DNA from self-DNA, and target nucleic acids which show little<br />
sequence similarity and can undergo cont<strong>in</strong>ual sequence change. However, whereas<br />
<strong>the</strong> <strong>CRISPR</strong> <strong>system</strong>s employ ssRNAs for target<strong>in</strong>g foreign elements, <strong>the</strong> eukaryal
anti-viral systems generate small 21–22 bp dsRNAs for targeting viruses, and these are<br />
subsequently converted to ssRNAs by an Argonaute protein-RISC complex.<br />
The closest parallel to the crRNAs and CRISPR loci amongst the eukaryal siRNA<br />
systems is provided by the Argonaute Piwi-interacting RNAs (piRNAs), which are directly processed<br />
from large transcripts of piRNA clusters. These clusters are rich in transposons and repeat-sequence<br />
elements and, as for the CRISPR loci, occur at specific chromosomal sites<br />
(Lillestøl et al., 2009; Karginov and Hannon, 2010). This eukaryal system probably<br />
plays a role <strong>in</strong> ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g germl<strong>in</strong>e <strong>in</strong>tegrity and development (Arav<strong>in</strong> et al., 2007;<br />
Klattenhoff and Theurkauf, 2008). As for <strong>CRISPR</strong> loci, <strong>the</strong> piRNA clusters <strong>in</strong>crease<br />
<strong>the</strong>ir <strong>in</strong>formational capacity by <strong>the</strong> <strong>in</strong>sertion of transposon sequences which provide<br />
novel sequence content and are ma<strong>in</strong>ta<strong>in</strong>ed <strong>in</strong> <strong>the</strong> piRNA clusters by selection.<br />
Thus, cont<strong>in</strong>ual expansion of piRNA clusters occurs, as for <strong>CRISPR</strong> loci, but <strong>the</strong><br />
process is passive ra<strong>the</strong>r than directed. Moreover, as for <strong>the</strong> <strong>CRISPR</strong>/Cas <strong>system</strong>,<br />
<strong>the</strong> newly <strong>in</strong>corporated DNA derives exclusively from genetic elements that are to<br />
be targeted. No homologous prote<strong>in</strong>s have been detected from sequence analyses<br />
between prote<strong>in</strong>s of <strong>the</strong> eukaryal siRNA <strong>system</strong>s and those of <strong>the</strong> <strong>CRISPR</strong> <strong>system</strong>,<br />
although similarities may appear at a tertiary structural level.<br />
10 Conclusions<br />
The CRISPR/Cas and CRISPR/Cmr immune machinery provides an effective<br />
defence against foreign genetic elements in archaea and some bacteria. The system<br />
is dynamic and heritable, although the benefit for the cell in evolutionary terms is<br />
transitory because DNA from extrachromosomal elements, taken up as spacers in<br />
CRISPR loci, has a rapid turnover and is lost again via recombination at repeats<br />
and/or transpositional events. Current evidence suggests that CRISPR/Cas and Cmr<br />
modules can behave like <strong>in</strong>tegral genetic elements. They tend to be located <strong>in</strong> <strong>the</strong><br />
most variable regions of chromosomes, sometimes physically l<strong>in</strong>ked, and are frequently<br />
displaced as a result of genome shuffl<strong>in</strong>g, <strong>in</strong>clud<strong>in</strong>g possibly transposition<br />
of whole modules. <strong>CRISPR</strong> loci may be broken up, and dispersed, <strong>in</strong> chromosomes<br />
with <strong>the</strong> potential for creat<strong>in</strong>g genetic novelty. Small leaderless <strong>CRISPR</strong>-like loci<br />
are commonly found <strong>in</strong> chromosomes, and <strong>in</strong> plasmids, and some can be transcribed<br />
and processed and <strong>the</strong>refore constitute potentially functional accessories to<br />
<strong>the</strong> <strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s. Both <strong>CRISPR</strong>/Cas and Cmr modules appear<br />
to exchange readily between closely related organisms, possibly via chromosomal<br />
conjugation, where <strong>the</strong>y may be subjected to strong selective pressure. While universal<br />
phylogenetic trees based on the Cas1 and Cmr2 proteins of the CRISPR/<br />
Cas and Cmr modules, respectively, suggest that transfers between archaea and<br />
bacteria have occurred, <strong>the</strong> relatively large number of archaea-specific Cas/Cmr<br />
prote<strong>in</strong>s suggests that <strong>the</strong>se may have been very rare events, consistent with <strong>the</strong><br />
<strong>in</strong>compatibility of <strong>the</strong> transcriptional, translational and conjugative <strong>system</strong>s of <strong>the</strong><br />
two Domains (Shah and Garrett, 2010). Parallels exist with the eukaryal siRNAs, and<br />
especially with germ cell piRNAs, which are also directed by effector proteins to silence<br />
or destroy invading foreign DNA and transposons.
References<br />
Arav<strong>in</strong> AA, Hannon GJ, Brennecke J (2007) The Piwi-piRNA pathway provides an adaptive<br />
defense <strong>in</strong> <strong>the</strong> transposon arms race. Science 318: 761–764<br />
Arnold HP, She Q, Phan H, Stedman K, Prangishvili D, Holz I et al. (1999) The genetic element<br />
pSSVx of <strong>the</strong> extremely <strong>the</strong>rmophilic crenarchaeon Sulfolobus is a hybrid between a plasmid<br />
and a virus. Mol Microbiol 34: 217–226<br />
Barrangou R, Fremaux C, Deveau H, Richards M, Boyaval P, Mo<strong>in</strong>eau S et al. (2007) <strong>CRISPR</strong><br />
provides acquired resistance aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes. Science 315: 1709–1712<br />
Basta T, Smyth J, Forterre P, Prangishvili D, Peng X (2009) Novel archaeal plasmid pAH1 and its<br />
<strong>in</strong>teractions with <strong>the</strong> lipothrixvirus AFV1. Mol Microbiol 71: 23–34<br />
Bettstetter M, Peng X, Garrett RA, Prangishvili D (2003) AFV1, a novel virus <strong>in</strong>fect<strong>in</strong>g hyper<strong>the</strong>rmophilic<br />
archaea of <strong>the</strong> genus Acidianus. Virology 315: 68–79<br />
Bize A, Karlsson EA, Ekefjard K, Quax TE, P<strong>in</strong>a M, Prevost MC et al. (2009) A unique virus<br />
release mechanism <strong>in</strong> <strong>the</strong> <strong>Archaea</strong>. Proc Natl Acad Sci U S A 106: 11306–11311<br />
Bize A, Peng X, Prokofeva M, Maclellan K, Lucas S, Forterre P et al. (2008) Viruses <strong>in</strong> acidic<br />
geo<strong>the</strong>rmal environments of <strong>the</strong> Kamchatka Pen<strong>in</strong>sula. Res Microbiol 159: 358–366<br />
Bolot<strong>in</strong> A, Qu<strong>in</strong>quis B, Sorok<strong>in</strong> A, Ehrlich SD (2005) Clustered regularly <strong>in</strong>terspaced short pal<strong>in</strong>drome<br />
repeats (<strong>CRISPR</strong>s) have spacers of extrachromosomal orig<strong>in</strong>. Microbiology 151:<br />
2551–2561<br />
Brouns SJ, Jore MM, Lundgren M, Westra ER, Slijkhuis RJ, Snijders AP et al. (2008) Small<br />
<strong>CRISPR</strong> RNAs guide antiviral defense <strong>in</strong> prokaryotes. Science 321: 960–964<br />
Brügger K, Torar<strong>in</strong>sson E, Redder P, Chen L, Garrett RA (2004) Shuffl<strong>in</strong>g of Sulfolobus genomes<br />
by autonomous and non-autonomous mobile elements. Biochem Soc Trans 32: 179–183<br />
Carte J, Wang R, Li H, Terns RM, Terns MP (2008) Cas6 is an endoribonuclease that generates<br />
guide RNAs for <strong>in</strong>vader defense <strong>in</strong> prokaryotes. Genes Dev 22: 3489–3496<br />
Chen L, Brugger K, Skovgaard M, Redder P, She Q, Torar<strong>in</strong>sson E et al. (2005) The genome of<br />
Sulfolobus acidocaldarius, a model organism of <strong>the</strong> Crenarchaeota. J Bacteriol 187: 4992–<br />
4999<br />
Cortez D, Forterre P, Gribaldo S (2009) A hidden reservoir of <strong>in</strong>tegrative elements is <strong>the</strong> major<br />
source of recently acquired foreign genes and ORFans <strong>in</strong> archaeal and bacterial genomes.<br />
Genome Biol 10: R65<br />
Garrett RA, Prangishvili D, Shah SA, Reuter M, Stetter KO, Peng X (2010a) Metagenomic analyses<br />
of novel viruses and plasmids from a cultured environmental sample of hyper<strong>the</strong>rmophilic<br />
neutrophiles. Environ Microbiol doi: 10.1111/j.1462-2920.2010.02266.x<br />
Garrett RA, Shah SA, Vestergaard G, Deng L, Gudbergsdottir S, Kenchappa CS et al. (2010b)<br />
<strong>CRISPR</strong>-based <strong>immune</strong> <strong>system</strong>s of <strong>the</strong> Sulfolobales – complexity and diversity. Biochem<br />
Soc Trans <strong>in</strong> press<br />
Godde JS and Bickerton A (2006) The repetitive DNA elements called <strong>CRISPR</strong>s and <strong>the</strong>ir associated<br />
genes: evidence of horizontal transfer among prokaryotes. J Mol Evol 62: 718–729<br />
Greve B, Jensen S, Brugger K, Zillig W, Garrett RA (2004) Genomic comparison of archaeal conjugative<br />
plasmids from Sulfolobus. <strong>Archaea</strong> 1: 231–239<br />
Grissa I, Vergnaud G, Pourcel C (2008) <strong>CRISPR</strong>compar: a website to compare clustered regularly<br />
<strong>in</strong>terspaced short pal<strong>in</strong>dromic repeats. Nucleic Acids Res 36: W145-W148<br />
Gudbergsdottir S, Deng L, Chen Z, Jensen JVK, Jensen LR, She Q et al. (2010) Dynamic properties<br />
of <strong>the</strong> Sulfolobus <strong>CRISPR</strong>/Cas and <strong>CRISPR</strong>/Cmr <strong>system</strong>s when challenged with vectorborne<br />
viral and plasmid genes and protospacers. Mol Microbiol under review<br />
Guo L, Brugger K, Chao Liu C, Shah SA, Zheng H, Zhu Y et al. (2010) Comparative genomics of<br />
two stra<strong>in</strong>s of Sulfolobus islandicus from Iceland: Hosts for study<strong>in</strong>g crenarchaeal genetics<br />
and virus life cycles. submitted<br />
Haft DH, Selengut J, Mongod<strong>in</strong> EF, Nelson KE (2005) A guild of 45 <strong>CRISPR</strong>-associated (Cas)<br />
prote<strong>in</strong> families and multiple <strong>CRISPR</strong>/Cas subtypes exist <strong>in</strong> prokaryotic genomes. PLoS<br />
Comput Biol 1: e60
Hale C, Kleppe K, Terns RM, Terns MP (2008) Prokaryotic silencing (psi)RNAs in Pyrococcus<br />
furiosus. RNA 14: 2572–2579<br />
Hale CR, Zhao P, Olson S, Duff MO, Graveley BR, Wells L et al. (2009) RNA-guided RNA cleavage<br />
by a CRISPR RNA-Cas protein complex. Cell 139: 945–956<br />
Hannon GJ (2002) RNA interference. Nature 418: 244–251<br />
Held NL and Whitaker RJ (2009) Viral biogeography revealed by signatures in Sulfolobus islandicus<br />
genomes. Environ Microbiol 11: 457–466<br />
Horvath P and Barrangou R (2010) CRISPR/Cas, the immune system of bacteria and archaea.<br />
Science 327: 167–170<br />
Jansen R, Embden JD, Gaastra W, Schouls LM (2002) Identification of genes that are associated<br />
with DNA repeats in prokaryotes. Mol Microbiol 43: 1565–1575<br />
Jinek M and Doudna JA (2009) A three-dimensional view of the molecular machinery of RNA<br />
interference. Nature 457: 405–412<br />
Karginov FV and Hannon GJ (2010) The CRISPR system: small RNA-guided defense in bacteria<br />
and archaea. Mol Cell 37: 7–19<br />
Klattenhoff C and Theurkauf W (2008) Biogenesis and germline functions of piRNAs. Development<br />
135: 3–9<br />
Krupovic M, Forterre P, Bamford DH (2010) Comparative analysis of the mosaic genomes of<br />
tailed archaeal viruses and proviruses suggests common themes for virion architecture and<br />
assembly with tailed viruses of bacteria. J Mol Biol 397: 144–160<br />
Kunin V, Sorek R, Hugenholtz P (2007) Evolutionary conservation of sequence and secondary<br />
structures in CRISPR repeats. Genome Biol 8: R61<br />
Lawrence CM, Menon S, Eilers BJ, Bothner B, Khayat R, Douglas T et al. (2009) Structural and<br />
functional studies of archaeal viruses. J Biol Chem 284: 12599–12603<br />
Lillestøl RK, Redder P, Garrett RA, Brugger K (2006) A putative viral defence mechanism in<br />
archaeal cells. Archaea 2: 59–72<br />
Lillestøl RK, Shah SA, Brugger K, Redder P, Phan H, Christiansen J et al. (2009) CRISPR families<br />
of the crenarchaeal genus Sulfolobus: bidirectional transcription and dynamic properties.<br />
Mol Microbiol 72: 259–272<br />
Makarova KS, Grishin NV, Shabalina SA, Wolf YI, Koonin EV (2006) A putative RNA-interference-based<br />
immune system in prokaryotes: computational analysis of the predicted enzymatic<br />
machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms<br />
of action. Biol Direct 1: 7<br />
Marraffini LA and Sontheimer EJ (2008) CRISPR interference limits horizontal gene transfer in<br />
staphylococci by targeting DNA. Science 322: 1843–1845<br />
Marraffini LA and Sontheimer EJ (2010) Self versus non-self discrimination during CRISPR<br />
RNA-directed immunity. Nature 463: 568–571<br />
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Almendros C (2009) Short motif sequences<br />
determine the targets of the prokaryotic CRISPR defence system. Microbiology 155: 733–740<br />
Mojica FJ, Diez-Villasenor C, Garcia-Martinez J, Soria E (2005) Intervening sequences of regularly<br />
spaced prokaryotic repeats derive from foreign genetic elements. J Mol Evol 60: 174–182<br />
Peng X, Blum H, She Q, Mallok S, Brugger K, Garrett RA et al. (2001) Sequences and replication<br />
of genomes of the archaeal rudiviruses SIRV1 and SIRV2: relationships to the archaeal<br />
lipothrixvirus SIFV and some eukaryal viruses. Virology 291: 226–234<br />
Peng X, Brugger K, Shen B, Chen L, She Q, Garrett RA (2003) Genus-specific protein binding<br />
to the large clusters of DNA repeats (short regularly spaced repeats) present in Sulfolobus<br />
genomes. J Bacteriol 185: 2410–2417<br />
Peng X, Kessler A, Phan H, Garrett RA, Prangishvili D (2004) Multiple variants of the archaeal<br />
DNA rudivirus SIRV1 in a single host and a novel mechanism of genomic variation. Mol<br />
Microbiol 54: 366–375<br />
Porter K, Russ BE, Dyall-Smith ML (2007) Virus-host interactions in salt lakes. Curr Opin Microbiol<br />
10: 418–424<br />
Portillo MC and Gonzalez JM (2009) CRISPR elements in the Thermococcales: evidence for<br />
associated horizontal gene transfer in Pyrococcus furiosus. J Appl Genet 50: 421–430<br />
Pourcel C, Salvignol G, Vergnaud G (2005) CRISPR elements in Yersinia pestis acquire new<br />
repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary<br />
studies. Microbiology 151: 653–663<br />
Prangishvili D, Forterre P, Garrett RA (2006a) Viruses of the Archaea: a unifying view. Nat Rev<br />
Microbiol 4: 837–848<br />
Prangishvili D, Garrett RA, Koonin EV (2006b) Evolutionary genomics of archaeal viruses:<br />
unique viral genomes in the third domain of life. Virus Res 117: 52–67<br />
Rachel R, Bettstetter M, Hedlund BP, Haring M, Kessler A, Stetter KO et al. (2002) Remarkable<br />
morphological diversity of viruses and virus-like particles in hot terrestrial environments.<br />
Arch Virol 147: 2419–2429<br />
Redder P and Garrett RA (2006) Mutations and rearrangements in the genome of Sulfolobus solfataricus<br />
P2. J Bacteriol 188: 4198–4206<br />
Reno ML, Held NL, Fields CJ, Burke PV, Whitaker RJ (2009) Biogeography of the Sulfolobus<br />
islandicus pan-genome. Proc Natl Acad Sci U S A 106: 8605–8610<br />
Santangelo TJ, Cubonova L, Skinner KM, Reeve JN (2009) Archaeal intrinsic transcription termination<br />
in vivo. J Bacteriol 191: 7102–7108<br />
Shah SA and Garrett RA (2010) CRISPR/Cas and Cmr modules, mobility and evolution of adaptive<br />
immune systems. Res Microbiol<br />
Shah SA, Hansen NR, Garrett RA (2009) Distribution of CRISPR spacer matches in viruses and<br />
plasmids of crenarchaeal acidothermophiles and implications for their inhibitory mechanism.<br />
Biochem Soc Trans 37: 23–28<br />
She Q, Peng X, Zillig W, Garrett RA (2001) Gene capture in archaeal chromosomes. Nature 409: 478<br />
She Q, Phan H, Garrett RA, Albers SV, Stedman KM, Zillig W (1998) Genetic profile of pNOB8<br />
from Sulfolobus: the first conjugative plasmid from an archaeon. Extremophiles 2: 417–425<br />
Tang TH, Bachellerie JP, Rozhdestvensky T, Bortolin ML, Huber H, Drungowski M et al. (2002)<br />
Identification of 86 candidates for small non-messenger RNAs from the archaeon Archaeoglobus<br />
fulgidus. Proc Natl Acad Sci U S A 99: 7536–7541<br />
Tang TH, Polacek N, Zywicki M, Huber H, Brugger K, Garrett R et al. (2005) Identification of<br />
novel non-coding RNAs as potential antisense regulators in the archaeon Sulfolobus solfataricus.<br />
Mol Microbiol 55: 469–481<br />
Torarinsson E, Klenk HP, Garrett RA (2005) Divergent transcriptional and translational signals in<br />
Archaea. Environ Microbiol 7: 47–54<br />
Tyson GW and Banfield JF (2008) Rapidly evolving CRISPRs implicated in acquired resistance of<br />
microorganisms to viruses. Environ Microbiol 10: 200–207<br />
Veith A, Klingl A, Zolghadr B, Lauber K, Mentele R, Lottspeich F et al. (2009) Acidianus, Sulfolobus<br />
and Metallosphaera surface layers: structure, composition and gene expression. Mol<br />
Microbiol 73: 58–72<br />
Vestergaard G, Shah SA, Bize A, Reitberger W, Reuter M, Phan H et al. (2008) Stygiolobus rod-shaped<br />
virus and the interplay of crenarchaeal rudiviruses with the CRISPR antiviral system.<br />
J Bacteriol 190: 6837–6845<br />
Wang Y, Duan Z, Zhu H, Guo X, Wang Z, Zhou J et al. (2007) A novel Sulfolobus non-conjugative<br />
extrachromosomal genetic element capable of integration into the host genome and spreading<br />
in the presence of a fusellovirus. Virology 363: 124–133<br />
Zillig W, Arnold HP, Holz I, Prangishvili D, Schweier A, Stedman K et al. (1998) Genetic elements<br />
in the extremely thermophilic archaeon Sulfolobus. Extremophiles 2: 131–140<br />
Chapter 7<br />
Archaeal Type II TA Loci<br />
Shiraz A. Shah and Roger A. Garrett<br />
Corresponding author: garrett@bio.ku.dk<br />
Archaea Centre, Department of Biology, Ole Maaløes Vej 5, DK-2200<br />
Copenhagen N, Denmark<br />
Abstract A few of the bacterial type II TA systems, primarily those<br />
involved in translational inhibition, occur widely throughout the<br />
archaeal domain. Using a bioinformatic approach, the frequency and<br />
distribution of these diverse TA loci were examined within the<br />
completed genomes of 124 archaea that are distributed fairly evenly<br />
throughout the major archaeal phyla. Results for the frequency and<br />
diversity of TA loci are summarised for archaea isolated from<br />
environmental niches generally characterised by extreme conditions<br />
including high temperature, high salt concentrations, high pressures,<br />
extremes of pH or strictly anaerobic conditions. No clear correlations<br />
were found between the number of TA loci present and either the<br />
genome size or particular environmental conditions. Multiple TA loci<br />
tend to be concentrated in variable genomic regions where the<br />
occurrence of intra- or inter-genomic gene transfer is most prevalent.<br />
For members of the Sulfolobales, which are uniformly rich in TA loci,<br />
a case is made for some TA systems facilitating maintenance of<br />
important genomic regions.<br />
7.1. Introduction<br />
Until recently, type II TA systems have received relatively little<br />
attention in comparative genomic studies of archaea. This reflects a<br />
general uncertainty regarding their functions, the significance of<br />
their structural diversity and, to some degree, their identities.<br />
Moreover, this uncertainty was compounded by the small gene sizes,<br />
especially for the antitoxins, which rendered their annotation<br />
difficult. This widespread deficiency was first highlighted by Gerdes'<br />
group, who identified large numbers of non-annotated TA loci in<br />
archaeal and bacterial genomes and demonstrated the structural<br />
diversity of the protein components within different TA families<br />
(Pandey and Gerdes, 2005; Gerdes et al., 2005; Jørgensen et al., 2009). This<br />
development, combined with contemporary insights gained into<br />
molecular mechanisms of toxin inhibitory activity (reviewed in<br />
Gerdes et al., 2005), served to focus attention on the profound<br />
importance of TA systems for cellular viability and survival.<br />
Genome-based surveys of bacterial type II TA systems,<br />
carrying two genes, have identified eight major families denoted<br />
vapBC, relBE, hicAB, mazEF, phd/doc, parDE, ccdAB and higBA, with an<br />
additional system in a Streptococcus plasmid carrying three genes (ω,<br />
ε and ζ, a repressor, antitoxin and toxin, respectively). VapC, RelE,<br />
MazF and HicA toxins have all been demonstrated experimentally to<br />
inhibit translation and Doc has also been implicated, at least<br />
indirectly, in affecting translation. In contrast, ParE and CcdB target<br />
the bacterial DNA gyrase, thereby blocking DNA replication<br />
(reviewed in Gerdes et al., 2005).<br />
Only three of these toxin families, VapC, RelE and HicA, each<br />
targeting translation, occur commonly amongst archaea and this<br />
chapter is mainly focussed on these three TA systems. In the<br />
bacterium Shigella flexneri, VapC toxins act by cleaving initiator tRNA<br />
within the anticodon loop, thereby inhibiting translational initiation<br />
(Dienemann et al., 2011; Winther and Gerdes, 2011), while RelE binds<br />
at the ribosomal A-site, cutting the bound mRNA within the codon<br />
(Neubauer et al., 2009), and HicA is a translation-independent mRNA<br />
interferase (Jørgensen et al., 2009; Makarova et al., 2009a). MazF and<br />
Doc have also been implicated in targeting translation, but their<br />
homologs are rarely found amongst archaea (Pandey and Gerdes,<br />
2005; Makarova et al., 2009b). Most archaea do not carry a homolog<br />
of the bacterial gyrase, the target of the ParE and CcdB toxins,<br />
employing instead the archaea-specific topoisomerase VI (Gadelle et<br />
al., 2003; Yamashiro and Yamagishi, 2005).<br />
An extensive genomic survey of bacterial and archaeal type II TA<br />
systems by Makarova et al. (2009b), which did not take into account the<br />
many non-annotated genes, reinforced the considerable structural<br />
diversity of the major TA families and identified additional subtypes,<br />
especially of the antitoxin components. This study also provided<br />
bioinformatical evidence for a possible additional TA locus encoding<br />
MNT (Minimal Nucleotidyl Transferase) and HEPN (Higher<br />
Eukaryote and Prokaryote Nucleotide binding). Although there is<br />
currently no experimental support for any toxin activity (Makarova<br />
et al., 2009b), we nevertheless included MNT/HEPN gene pairs in the<br />
present analysis because they occur commonly in archaea, especially<br />
amongst the hyperthermophiles and, moreover, their frequency of<br />
genome occurrence partially mirrors that of vapBC gene pairs.<br />
A bioinformatical approach was employed to identify archaeal<br />
type II TA loci within 124 completed archaeal genomes. Exhaustive<br />
searches were made for the major families of TA gene loci, vapBC,<br />
relBE and hicAB, and for the HEPN/MNT gene pairs, and attempts<br />
were made to identify non-annotated antitoxin and toxin genes.<br />
7.2. The archaeal perspective<br />
Archaea differ from bacteria in their cellular biology in<br />
fundamental ways and they share many cellular processes<br />
exclusively with eukaryotes, albeit generally in less complex forms.<br />
Although the evolutionary history of archaea and their relationship<br />
to early eukarya remain enigmatic (Gribaldi et al., 2010; Kurland,<br />
2006), the maintenance of unique cellular properties amongst archaea<br />
is likely to be due to their successful adaptation to extreme<br />
environmental conditions. These include high temperature, extremes<br />
of pH, high salt, high pressures and strictly anaerobic conditions,<br />
and such environments also tend to be low in energy sources<br />
(Kletzin, 2007). It has been argued that some of the properties unique<br />
to archaea arose from adaptation to chronic energy stress through<br />
modifying catabolic pathways and by conserving energy via their<br />
low permeability ether-linked lipid membranes (Valentine, 2007).<br />
Thus, stress in bacteria and archaea cannot simply be equated when<br />
considering the modes of action of toxins.<br />
TA systems that are shared between bacteria and archaea appear<br />
primarily to inhibit translation, cleaving either mRNA bound in the<br />
ribosomal A-site (RelE), the anticodon of the initiator tRNA (VapC)<br />
or mRNA directly (HicA). The ribosomal tRNA binding sites,<br />
decoding site and peptidyl transferase centre constitute the most<br />
conserved regions of the translational apparatus, in both bacteria and<br />
archaea (and also in eukarya), as judged by their shared sensitivities<br />
to a wide range of antibiotics which specifically target these sites in<br />
both domains (e.g. Rodriguez-Fonseca et al., 1995). Experimental<br />
studies indicate that bacterial TAs have alternative cellular targets,<br />
including the bacterial DNA gyrase, but it remains unknown<br />
whether there are unidentified archaeal toxins which bind to<br />
archaea-specific cellular sites.<br />
7.3. A bioinformatical approach<br />
All archaeal genomes publicly available at the beginning of 2012<br />
were screened for the presence of type II TA loci of the superfamilies<br />
vapBC, relBE and hicAB, as well as gene pairs of the predicted<br />
HEPN/MNT TA locus. First, toxin-specific hidden Markov models<br />
(HMMs) were constructed by running the jackhmmer program (Eddy,<br />
2011) against the genomes, using known toxin genes as queries.<br />
Subsequently, all open reading frames (ORFs) between 50 and 250<br />
aa that did not overlap previously annotated ORFs above 250 aa in<br />
length were extracted and screened using the constructed HMMs.<br />
Every upstream or downstream ORF, depending on the TA family<br />
type, located within a fixed distance of the matching toxin ORF,<br />
was extracted and clustered according to sequence similarity. Some<br />
of these clusters were judged to comprise antitoxin gene families<br />
based on manual inspection of their genomic contexts, and they were<br />
paired with the corresponding toxin genes to generate TA<br />
loci. Subsequently, we found that significant numbers of TA loci<br />
were partially overlapping with larger annotated ORFs, particularly<br />
for members of the Thermococcales and, therefore, we extended the<br />
analyses to include these genes, which involved extensive manual<br />
inspection of the genomes.<br />
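The ORF filtering and toxin-neighbour steps described above can be sketched in Python.<br />
This is a minimal illustration under stated assumptions, not the pipeline used in this chapter:<br />
the Orf class, the function names and the 100-nt max_gap are hypothetical (the text only<br />
specifies "a fixed distance"), and the HMM screening with jackhmmer and the clustering by<br />
sequence similarity are deliberately left out.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    start: int    # 1-based inclusive nucleotide coordinates, start < end
    end: int
    strand: str   # "+" or "-"

    @property
    def length_aa(self) -> int:
        # Number of encoded amino acids; the stop codon is not counted.
        return (self.end - self.start + 1) // 3 - 1


def overlaps(a: Orf, b: Orf) -> bool:
    """True if the two ORFs share at least one nucleotide position."""
    return a.start <= b.end and b.start <= a.end


def candidate_orfs(orfs, annotated, min_aa=50, max_aa=250):
    """Keep ORFs of 50-250 aa that do not overlap an annotated ORF above 250 aa."""
    large = [o for o in annotated if o.length_aa > max_aa]
    return [
        o for o in orfs
        if min_aa <= o.length_aa <= max_aa
        and not any(overlaps(o, big) for big in large)
    ]


def neighbours(toxin, orfs, side="upstream", max_gap=100):
    """Same-strand ORFs lying within max_gap nt of the toxin on the given side.

    max_gap is an assumed placeholder; overlaps (negative gaps) are allowed.
    """
    # Upstream means lower coordinates on "+" and higher coordinates on "-".
    look_left = (side == "upstream") == (toxin.strand == "+")
    hits = []
    for o in orfs:
        if o == toxin or o.strand != toxin.strand:
            continue
        if look_left:
            on_side, gap = o.start < toxin.start, toxin.start - o.end - 1
        else:
            on_side, gap = o.end > toxin.end, o.start - toxin.end - 1
        if on_side and gap <= max_gap:
            hits.append(o)
    return hits


# Invented coordinates, for illustration only.
antitoxin = Orf("orfA", 100, 339, "+")         # 79 aa
toxin = Orf("orfT", 336, 737, "+")             # 133 aa, overlaps orfA by 4 nt
annotated_big = Orf("orfB", 2000, 3500, "+")   # 499 aa
shadow = Orf("orfS", 2100, 2399, "+")          # 99 aa, lies inside orfB
tiny = Orf("orfX", 900, 1000, "+")             # 32 aa, below the size window

kept = candidate_orfs([antitoxin, toxin, shadow, tiny], [annotated_big])
partners = neighbours(toxin, kept, side="upstream")
print([o.name for o in kept], [o.name for o in partners])
```

In this toy case only orfA and orfT pass the size and overlap filter, and orfA is returned as the<br />
upstream candidate partner of the toxin. In practice the side to search would be set per TA family.<br />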
7.4. Phylogenetic distribution and frequency of archaeal TA loci<br />
A phylogenetic tree based on 16S rRNA sequences was<br />
generated for 124 archaea for which genome sequences were<br />
available. The genome size and natural habitat are given for each<br />
organism, and thermophiles are distinguished from mesophiles with<br />
a border for optimal growth of 50 °C (Table 1). More details of the<br />
natural environments and optimal growth conditions for many of<br />
the organisms are given by Kletzin (2007). Some orders, including<br />
the hyperthermophilic Sulfolobales, Thermoproteales and<br />
Thermococcales, are relatively overrepresented by closely related<br />
organisms including several Sulfolobus islandicus, Pyrobaculum and<br />
Thermococcus strains, while the less well characterised Korarchaea (K)<br />
and Thaumarchaea (T) are underrepresented. This bias primarily<br />
reflects the fact that the former groups are relatively easy to isolate and<br />
culture and that some of them have been employed as model<br />
organisms for molecular, cellular and genetic studies. The total<br />
numbers of identified TA loci are given for vapBC, relBE and hicAB<br />
families and for the HEPN/MNT gene pairs in Table 1.<br />
The results reveal a wide range of type II TA contents. Several<br />
organisms carry 30 or more TA loci but many have very few or no<br />
detectable loci. vapBC loci constitute the dominant TA family and they<br />
are most prevalent amongst thermophiles, in particular in members<br />
of the thermoacidophilic Sulfolobales (Pandey and Gerdes, 2005; Guo<br />
et al., 2011) and in some Thermococcus species. In contrast, relBE or<br />
hicAB gene pairs are quite rare, especially amongst the 40<br />
crenarchaeal genomes. For the euryarchaea, relBE gene pairs were<br />
observed in about half of the genomes and several of these carried 1<br />
to 9 copies. Similarly, hicAB pairs were identified in about half the<br />
euryarchaeal genomes with multiple copies occurring mainly<br />
amongst the Methanomicrobiales and Methanosarcinales.<br />
MNT/HEPN gene pairs occur much more frequently but are<br />
irregularly distributed. They are most common amongst<br />
crenarchaeal thermoacidophiles and thermoneutrophiles and the<br />
euryarchaeal hyperthermophiles (Table 1).<br />
7.5. TA loci frequency and their relationship to genome size and<br />
environmental factors<br />
Generally, there is no simple correlation between genome<br />
size and TA locus frequency for the different archaeal phyla.<br />
For example, for most members of the Sulfolobales the<br />
estimated number of TA loci varies from 17 to 49 but the<br />
smallest genome, that of Acidianus hospitalis (2.2 Mb), carries 38<br />
while the largest genome, that of Sulfolobus solfataricus P2 (3 Mb),<br />
contains 33. For other phyla, a clearer picture emerges when<br />
comparing strains within the same genus, e.g. the seven<br />
Thermococcus strains which have genome sizes ranging from<br />
1.8 to 2.1 Mb. When ordered according to increasing<br />
approximate size (Table 1), these genomes carry 4, 10, 24, 25, 26,<br />
47 and 58 TA loci respectively, showing that the TA frequency<br />
increases disproportionately with genome size. A similar<br />
pattern is seen with Pyrobaculum strains. These results also<br />
underline the often large differences in the TA contents of<br />
pairs of closely related organisms.<br />
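The disproportion can be made concrete with the figures quoted above. The short Python<br />
sketch below uses only the stated range of genome sizes (1.8 to 2.1 Mb) and the ordered TA<br />
counts; the individual genome sizes are given in Table 1 and are not reproduced here, so only<br />
the two extremes are compared. The function name is hypothetical.<br />

```python
def fold_changes(size_min_mb, size_max_mb, ta_counts):
    """Compare the relative increase in genome size with that in TA loci.

    ta_counts must be ordered by increasing genome size, so the first and
    last entries belong to the smallest and largest genomes.
    """
    size_fold = size_max_mb / size_min_mb
    ta_fold = ta_counts[-1] / ta_counts[0]
    density_small = ta_counts[0] / size_min_mb    # TA loci per Mb
    density_large = ta_counts[-1] / size_max_mb
    return size_fold, ta_fold, density_small, density_large


# Values quoted in the text for the seven Thermococcus strains.
thermococcus_ta = [4, 10, 24, 25, 26, 47, 58]
size_fold, ta_fold, d_small, d_large = fold_changes(1.8, 2.1, thermococcus_ta)

print(f"genome size: {size_fold:.2f}-fold, TA loci: {ta_fold:.1f}-fold")
print(f"density: {d_small:.1f} -> {d_large:.1f} TA loci per Mb")
```

On these figures, a genome that is roughly 1.2-fold larger carries 14.5 times as many TA loci,<br />
which corresponds to a rise from about 2 to about 28 loci per Mb.<br />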
There is little correlation between TA loci numbers and optimum<br />
growth temperatures. Although Hyperthermus butylicus, which can<br />
grow up to 108 °C, has a relatively high TA locus content of 18 (mainly<br />
vapBC loci) for a member of the Thermoproteales, Methanopyrus<br />
kandleri, growing up to 110 °C, has no detectable TA loci and some of<br />
the hyperthermophilic Methanocaldococcus strains also exhibit few TA<br />
loci.<br />
More difficult to assess is the impact of the natural environments<br />
and the available nutrients, although in this respect the S. islandicus<br />
strains may be informative (Reno et al., 2009; Guo et al., 2011). They<br />
were all isolated from terrestrial acidic hot springs with similar<br />
maximum growth temperatures and pH ranges but widely<br />
separated, and isolated, geographically: on Iceland, in Kamchatka,<br />
Russia and in Yellowstone and Lassen National Parks, USA, while the<br />
related S. solfataricus P2 strain derives from Naples, Italy. Each of the<br />
strains carries 26 to 36 TA loci, which suggests that the nature of the<br />
environment is important. Moreover, active terrestrial hot springs are<br />
likely to be particularly challenging for cells because temperatures<br />
can continuously change from maxima of around 80 °C to 0 °C, if<br />
surrounded by ice, and pH values and nutrient availability can also<br />
change rapidly. A definitive answer to the effect of environmental<br />
factors on TA activity would require detailed and time-consuming<br />
experimental analyses of archaea cultivated under a wide range of<br />
conditions.<br />
7.6. Orphan toxin and antitoxin genes<br />
Many orphan toxin and some orphan antitoxin genes were<br />
detected in the genomes and the numbers tend to be proportional to<br />
the numbers of type II TA loci. For example, there are many orphan<br />
toxin genes amongst the Sulfolobales. Some of these may have been<br />
classed as orphans because the adjacent antitoxin protein gene was<br />
not identified (Pandey and Gerdes, 2005) and others may be located<br />
adjacent to unidentified type III RNA antitoxin genes (see Chapter<br />
14).<br />
Presumably, over time, antitoxins or toxins may become<br />
associated with other cellular functions by selection. One such<br />
example could be provided by a single vapC-like gene (Ahos0712) of<br />
A. hospitalis. It lies in an operon with genes encoding proteins<br />
involved in transcription and initiator tRNA binding to the ribosome<br />
(You et al., 2011). This gene cassette is highly conserved in gene<br />
content, gene synteny and sequence in other Sulfolobus genomes<br />
(Guo et al., 2011). A possible explanation is that this orphan VapC-like<br />
protein acts as a VapC competitor and may regulate or inhibit<br />
initiator tRNA cleavage.<br />
7.7. Locations within genomes<br />
Earlier comparative genomic analyses of closely related Sulfolobus<br />
species indicated that TA gene pairs tend to be concentrated in<br />
relatively large genomic regions (0.7 to 1 Mbp). These regions are the<br />
most variable in gene synteny and gene content (Guo et al., 2011),<br />
consistent with the extensive exchange of genes having occurred<br />
intra- and/or inter-genomically. This is illustrated for the genomes of<br />
S. islandicus REY15A and the related S. solfataricus P2 where a high<br />
level of gene synteny is maintained throughout about two thirds of<br />
the genome while the remaining one third is extensively shuffled<br />
Figure 7.1. Comparison of genomes from pairs of closely related archaea. Dot<br />
plots of (A) Sulfolobus species S. islandicus REY15A and S. solfataricus P2, and (B)<br />
Thermococcus species T. onnur<strong>in</strong>eus and T. kodakarensis show<strong>in</strong>g regions of gene<br />
synteny (red) and <strong>in</strong>verted synteny (blue). The total genome sizes are given and<br />
<strong>the</strong> large variable regions <strong>in</strong> each genome are shaded. TA loci are denoted by black<br />
l<strong>in</strong>es along <strong>the</strong> correspond<strong>in</strong>g genome axes.<br />
Figure 7.2. Phylogenetic trees for VapB antitox<strong>in</strong>s and VapC tox<strong>in</strong>s of <strong>the</strong><br />
acido<strong>the</strong>rmophile A. hospitalis W1. (A) The VapB tree demonstrates that <strong>the</strong><br />
highly diverse antitox<strong>in</strong>s can be classified <strong>in</strong>to three ma<strong>in</strong> subfamilies AbrB,<br />
CcdA/CopG and DUF217. In (B) <strong>the</strong> VapC tree shows <strong>the</strong> highly diverse tox<strong>in</strong><br />
sequences fall<strong>in</strong>g <strong>in</strong>to one major group<strong>in</strong>g. The VapB subfamily l<strong>in</strong>ked to each<br />
VapC is given. Moreover <strong>the</strong> number of closely similar VapC prote<strong>in</strong>s present <strong>in</strong><br />
<strong>the</strong> available 13 Sulfolobales genomes (Table 1) is listed - 0 <strong>in</strong>dicates that no VapC<br />
with a similar sequence is encoded <strong>in</strong> <strong>the</strong> genomes, while 13 <strong>in</strong>dicates that a VapC<br />
with a closely similar sequence is encoded in each of the genomes. Ahos GenBank<br />
numbers are given for each protein. Modified from You et al. (2011).<br />
(Figure 1A). Most of <strong>the</strong> TA loci of both species fall with<strong>in</strong> <strong>the</strong><br />
variable region. Although few pairs of genome sequences that show<br />
extensive gene synteny are available for closely related archaeal species,<br />
a comparable analysis was possible for the genomes of two Thermococcus<br />
species: T. kodakarensis, which carries many TA loci, and<br />
T. onnurineus, which exhibits very few TA loci (Figure 1B). Here the<br />
gene synteny is more limited and extends only over about one half of<br />
<strong>the</strong> genome but aga<strong>in</strong> <strong>the</strong> TA loci of T. kodakarensis are concentrated<br />
<strong>in</strong> <strong>the</strong> shuffled genome region. The latter example also illustrates <strong>the</strong><br />
stark differences <strong>in</strong> <strong>the</strong> numbers of TA loci between some fairly<br />
closely related species.<br />
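The degree to which TA loci are concentrated within a variable region can be expressed as a simple fraction.<br />
The following is a minimal sketch only: the function name, the locus coordinates and the region boundaries<br />
are hypothetical illustrations and are not data taken from the genomes discussed here.<br />

```python
from typing import List, Tuple


def fraction_in_regions(loci: List[int], regions: List[Tuple[int, int]]) -> float:
    """Return the fraction of locus positions (in bp) lying inside any region.

    `loci` holds one representative coordinate per TA locus and `regions`
    holds inclusive (start, end) coordinate pairs for the variable regions.
    """
    if not loci:
        return 0.0
    inside = sum(
        1 for pos in loci
        if any(start <= pos <= end for start, end in regions)
    )
    return inside / len(loci)


# Hypothetical example: five TA loci and one variable region of 0.8 Mbp.
toy_loci = [120_000, 1_450_000, 1_600_000, 1_720_000, 1_900_000]
toy_variable_region = [(1_400_000, 2_200_000)]

# Four of the five toy loci fall inside the region.
print(fraction_in_regions(toy_loci, toy_variable_region))
```

With real data, the coordinates would be taken from the genome annotation and the region boundaries from the dot plot comparison.<br />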
Although several genomes, <strong>in</strong>clud<strong>in</strong>g some Sulfolobus species,<br />
conta<strong>in</strong> many transposable elements and TA loci <strong>the</strong>re is no general<br />
proportionality between <strong>the</strong> two. For example, both Thermococcus<br />
genomes carry few IS elements but one of the species, T. kodakarensis,<br />
exhibits several TA loci (Figure 1B). Moreover, several of the genomes<br />
carry many transposable elements but few TA loci (e.g. Pyrococcus<br />
furiosus, Halobacterium NRC1 and Thermoplasma volcanium) while<br />
o<strong>the</strong>rs exhibit few transposable elements but conta<strong>in</strong> many TA loci<br />
(e.g. Sulfolobus acidocaldarius, H. butylicus and Thermococcus sp. AM4)<br />
(Brügger et al., 2002; Filée et al., 2007).<br />
7.8. TA sequence diversity with<strong>in</strong> genomes<br />
The A. hospitalis genome carries 24 vapBC loci concentrated with<strong>in</strong><br />
<strong>the</strong> genomic regions 350-410 kb and 1,374-1,912 kb (You et al., 2011).<br />
Whereas the VapC toxins are all PIN domain proteins (PilT N-terminal<br />
domain), the VapB antitoxins were classified into three<br />
families of transcriptional regulators, AbrB, CcdA/CopG and<br />
DUF217 (Figure 2A) (You et al., 2011). Tree build<strong>in</strong>g based on<br />
sequence alignments demonstrated that <strong>the</strong> sequences of <strong>the</strong>se<br />
antitox<strong>in</strong>s and tox<strong>in</strong>s are all highly diverse, with sequence identities<br />
between <strong>the</strong>m rarely exceed<strong>in</strong>g 30%, as <strong>in</strong>dicated by all <strong>the</strong> long tree<br />
branches for each prote<strong>in</strong> (Figure 2). A parallel tree build<strong>in</strong>g study of<br />
<strong>the</strong> closely related S. islandicus stra<strong>in</strong>s REY15A and HVE10/4 carry<strong>in</strong>g<br />
18 and 19 vapBC gene pairs, respectively, yielded a similar pattern of<br />
long branches for each VapB and VapC prote<strong>in</strong> (Guo et al., 2011).<br />
Thus all antitox<strong>in</strong>s and tox<strong>in</strong>s with<strong>in</strong> each archaeon are highly<br />
diverse <strong>in</strong> sequence.<br />
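The sequence identity values referred to above can be computed from a pairwise alignment. The sketch below<br />
is illustrative only: it assumes that identity is counted over aligned columns in which neither sequence has<br />
a gap (other denominators are also in common use), and the two short sequences are hypothetical, not VapB or<br />
VapC sequences from A. hospitalis.<br />

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are ignored, so the
    denominator is the number of gap-free aligned columns (an assumption;
    this is one of several conventions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: 4 identical residues in 6 gap-free columns.
toy_a = "MKLV-AG"
toy_b = "MRLVQAA"
print(round(percent_identity(toy_a, toy_b), 1))
```

Applied to every pair of proteins within one genome, values of this kind would correspond to the long tree branches described above.<br />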
In contrast, when intergenomic comparisons were made with other<br />
members of the Sulfolobales, isolated from terrestrial hot springs at both<br />
closely and distantly separated geographical locations, several VapBC<br />
complexes showed high sequence similarity. For example, 11 of the<br />
24 VapBC protein pairs identified in A. hospitalis (Figure 2) exhibit<br />
closely similar sequences to homologs encoded <strong>in</strong> at least seven of<br />
<strong>the</strong> 13 available Sulfolobus genomes (You et al., 2011). A fur<strong>the</strong>r<br />
example is illustrated for <strong>the</strong> VapC tox<strong>in</strong>s of Pyrococcus species<br />
(Figure 3A) and for <strong>the</strong> predicted MNT tox<strong>in</strong> of <strong>the</strong> MNT/HEPN<br />
gene pairs for Pyrobaculum species (Figure 3B). The result shows that<br />
<strong>the</strong> VapC and MNT sequences with<strong>in</strong> each cluster of short branches<br />
derive from different species. Thus, <strong>the</strong>re is apparently selection<br />
aga<strong>in</strong>st <strong>the</strong> uptake of closely similar vapBC loci or MNT/HEPN gene<br />
pairs <strong>in</strong> a given genome, despite <strong>the</strong> abundance of many similar gene<br />
pairs <strong>in</strong> <strong>the</strong> environment.<br />
The tree-build<strong>in</strong>g results of <strong>the</strong> analysis demonstrated fur<strong>the</strong>r that<br />
for given gene pairs the subtypes of VapB and VapC do not always<br />
correspond, implying that gene pairs exchange partners, thereby<br />
potentially creat<strong>in</strong>g <strong>in</strong>creased functional diversity of <strong>the</strong> TA <strong>system</strong>s<br />
(You et al., 2011), consistent with an earlier hypo<strong>the</strong>sis (Gerdes et al.,<br />
2005).<br />
Figure 7.3 Phylogenetic trees for <strong>the</strong> tox<strong>in</strong> VapC and <strong>the</strong> predicted tox<strong>in</strong> MNT.<br />
(A) VapC proteins encoded in different Pyrococcus species, and (B) MNT<br />
prote<strong>in</strong>s encoded <strong>in</strong> diverse Pyrobaculum species. Prote<strong>in</strong>s that fall with<strong>in</strong> <strong>the</strong> small<br />
clusters of short branches derive from different organisms. Trees generated for<br />
prote<strong>in</strong>s deriv<strong>in</strong>g exclusively from one organism yield long branches. Gene<br />
numbers are given for each of <strong>the</strong> genomes analysed (see Table 1).<br />
7.9. Stress response<br />
Antitox<strong>in</strong>-tox<strong>in</strong>s were orig<strong>in</strong>ally shown to enhance plasmid<br />
ma<strong>in</strong>tenance as a consequence of <strong>the</strong> growth of plasmid-free cells<br />
be<strong>in</strong>g preferentially <strong>in</strong>hibited, post segregation, by free tox<strong>in</strong>s that<br />
are <strong>in</strong>herently more stable than antitox<strong>in</strong>s (Gerdes et al. 2005). To<br />
date, relatively few archaeal plasmids have been sequenced and<br />
<strong>the</strong>re is no current evidence for type II TA loci occurr<strong>in</strong>g widely <strong>in</strong><br />
plasmids. Never<strong>the</strong>less, <strong>the</strong> plasmid ma<strong>in</strong>tenance mechanism led to<br />
<strong>the</strong> hypo<strong>the</strong>sis that <strong>the</strong> TA <strong>system</strong>s encoded widely <strong>in</strong> chromosomes<br />
facilitate retention of local DNA regions carry<strong>in</strong>g important genes<br />
that might o<strong>the</strong>rwise be prone to loss (Magnuson, 2007; Van<br />
Melderen 2010).<br />
This hypo<strong>the</strong>sis receives support from <strong>the</strong> observation that vapBC<br />
loci and <strong>the</strong> HEPN/MNT gene pairs are concentrated with<strong>in</strong> variable<br />
genomic regions of members of <strong>the</strong> Sulfolobales and Thermococcales<br />
where <strong>in</strong>tergenomic DNA exchange appears to be most active<br />
(Figure 1). Fur<strong>the</strong>rmore, <strong>the</strong> hypo<strong>the</strong>sis is re<strong>in</strong>forced by <strong>the</strong> high<br />
sequence diversity of each of <strong>the</strong> numerous VapC prote<strong>in</strong>s encoded<br />
with<strong>in</strong> <strong>the</strong>se genomes, exemplified for A. hospitalis (Figure 2). For any<br />
pair of similar VapBC complexes, <strong>the</strong> loss of one would be<br />
compensated for by <strong>the</strong> presence of <strong>the</strong> o<strong>the</strong>r, <strong>the</strong>reby underm<strong>in</strong><strong>in</strong>g<br />
any DNA ma<strong>in</strong>tenance capability.<br />
For bacteria which grow slowly <strong>in</strong> nutrient poor environments,<br />
multiple tox<strong>in</strong>s are strongly implicated <strong>in</strong> respond<strong>in</strong>g to different<br />
types of nutrient deficiency and/or <strong>in</strong> enhanc<strong>in</strong>g quality control<br />
(Gerdes, 2000; Pandey and Gerdes, 2005). Involvement <strong>in</strong> stress<br />
response entails that <strong>the</strong> tox<strong>in</strong>s <strong>in</strong>hibit growth, allow<strong>in</strong>g <strong>the</strong> host to<br />
lie <strong>in</strong> a dormant state dur<strong>in</strong>g <strong>the</strong> period of environmental stress<br />
(Pedersen et al., 2002; Gerdes et al. 2005). In this context, tox<strong>in</strong>s have<br />
also been implicated <strong>in</strong> produc<strong>in</strong>g persistent cells which are able to<br />
rema<strong>in</strong> dormant for longer periods and to withstand prolonged<br />
exposure to stress factors <strong>in</strong>clud<strong>in</strong>g antibiotics (Maisonneuve et al.,<br />
2011).<br />
There may well be a negative effect on host growth as a<br />
consequence of carry<strong>in</strong>g large numbers of TA loci (30 to 40 TA loci<br />
for some Sulfolobus species and a few o<strong>the</strong>r archaea (Table 1))<br />
because of <strong>the</strong> likelihood of <strong>the</strong> cont<strong>in</strong>uous presence of low levels of<br />
free tox<strong>in</strong> (Wilbur et al. 2005). Although only highly diverse vapBC<br />
loci are present, presumably <strong>in</strong> order to avoid redundancy, <strong>the</strong> total<br />
number of TA loci present per genome may reflect a compromise<br />
between <strong>the</strong> ability to ma<strong>in</strong>ta<strong>in</strong> important genes and to survive<br />
different environmental stresses while reta<strong>in</strong><strong>in</strong>g an adequate growth<br />
rate under normal conditions.<br />
In conclusion, <strong>the</strong>re is a major deficit <strong>in</strong> experimental work on<br />
archaeal TA <strong>system</strong>s, especially with regard to stress responses.<br />
Almost all research to date has focussed on bacteria. One exception<br />
was the demonstration that the modes of action of RelE toxins from<br />
M. jannaschii and from bacteria are similar in vitro (Christensen<br />
and Gerdes, 2003). Moreover, heat shock of S. solfataricus (from 80 °C<br />
to 90 °C) was shown to induce expression of some TA loci, while<br />
knockout of a s<strong>in</strong>gle vapBC locus <strong>in</strong>creased heat shock lability<br />
(Cooper et al., 2009). Clearly, however, many challeng<strong>in</strong>g<br />
experiments rema<strong>in</strong> to be performed <strong>in</strong> this rapidly develop<strong>in</strong>g field.<br />
7.10. Type II TA <strong>system</strong>s and viral defence<br />
It has been proposed that bacterial TA <strong>system</strong>s could be <strong>in</strong>volved<br />
<strong>in</strong> combat<strong>in</strong>g bacteriophage <strong>in</strong>fection by, for example, block<strong>in</strong>g<br />
ribosomes and prevent<strong>in</strong>g <strong>the</strong> viruses from dom<strong>in</strong>at<strong>in</strong>g <strong>the</strong><br />
translational apparatus, prior to <strong>the</strong>ir propagat<strong>in</strong>g and lys<strong>in</strong>g cells<br />
(see Chapter 5). The inferred result would be that only the phage-infected<br />
cells would die. In principle, archaeal TA systems which<br />
primarily target translation could act similarly. However, most<br />
archaeal viruses, and especially those from extremely <strong>the</strong>rmophilic<br />
and halophilic environments, show morphotypes and genomic<br />
properties dist<strong>in</strong>ct from bacterial and eukaryal viruses and <strong>the</strong>y<br />
generally exist <strong>in</strong> stable relationships with <strong>the</strong>ir hosts at low copy<br />
numbers, <strong>in</strong>frequently, if ever, caus<strong>in</strong>g cell lysis (Prangishvili et al.,<br />
2006; Porter et al., 2007). Consistent with <strong>the</strong>se properties,<br />
circumstantial evidence suggests that the level of free viruses, at least<br />
in extreme thermoacidophilic environments, tends to be low relative<br />
to cellular levels, suggesting that these viruses prefer to remain<br />
with<strong>in</strong> cells under <strong>the</strong>se challeng<strong>in</strong>g conditions (Snyder et al., 2010).<br />
Another intriguing possibility arises from the juxtaposition of TA<br />
loci and <strong>CRISPR</strong> loci (Clustered Regularly Interspaced Short<br />
Pal<strong>in</strong>dromic Repeats) <strong>in</strong> some archaea. <strong>CRISPR</strong>-based adaptive<br />
<strong>immune</strong> <strong>system</strong>s target <strong>in</strong>vad<strong>in</strong>g genetic elements, primarily viruses<br />
and conjugative plasmids, and <strong>the</strong>y have been classified <strong>in</strong>to three<br />
major types, of which only two (types I and III) occur <strong>in</strong> archaea,<br />
often with both major types present <strong>in</strong> <strong>the</strong> same archaeon (Garrett et<br />
al., 2011). The <strong>CRISPR</strong> arrays carry spacer regions taken up from<br />
<strong>in</strong>vad<strong>in</strong>g genetic elements and <strong>the</strong>ir processed transcripts are able to<br />
facilitate target<strong>in</strong>g and cleavage of genetic elements with match<strong>in</strong>g<br />
sequences. An example of a complex assembly of a type III CRISPR-based<br />
system, present in the A. hospitalis genome, is shown in Figure<br />
4. The <strong>CRISPR</strong> arrays and associated gene cassettes are <strong>in</strong>terwoven<br />
with four vapBC loci for which all <strong>the</strong> antitox<strong>in</strong>s and tox<strong>in</strong>s carry<br />
highly divergent sequences (Figure 2). Thus, these CRISPR-associated<br />
TA systems could play a secondary role in combating<br />
<strong>in</strong>vad<strong>in</strong>g genetic elements by help<strong>in</strong>g to ma<strong>in</strong>ta<strong>in</strong> <strong>the</strong> functional<br />
<strong>CRISPR</strong> <strong>immune</strong> <strong>system</strong>s, which also tend to be located with<strong>in</strong> <strong>the</strong><br />
variable chromosomal regions. Ano<strong>the</strong>r <strong>in</strong>terest<strong>in</strong>g aspect of this<br />
<strong>system</strong> is that one vapBC locus associated with <strong>the</strong> type III<br />
<strong>in</strong>terference <strong>system</strong> <strong>in</strong> A. hospitalis (Figure 4B) shows a high level of<br />
sequence identity with vapBC loci specifically associated with a<br />
different subclass of type III <strong>in</strong>terference <strong>system</strong>s found <strong>in</strong> <strong>the</strong> S.<br />
islandicus stra<strong>in</strong>s REY15A and HVE10/4 (Figure 4C) (Guo et al., 2011)<br />
suggest<strong>in</strong>g that <strong>in</strong>dividual types of TA loci may coevolve with genes<br />
exhibit<strong>in</strong>g specific functions.<br />
Figure 7.4 Type III <strong>CRISPR</strong> <strong>system</strong>s l<strong>in</strong>ked to vapBC gene pairs. (A) <strong>CRISPR</strong> loci<br />
and genes of <strong>the</strong> acido<strong>the</strong>rmophile A. hospitalis W1. <strong>CRISPR</strong> loci (black) show <strong>the</strong><br />
numbers of repeats present. Also shown are genes encoding proteins involved in uptake of new<br />
spacers (adaptation), labelled aCas, a gene encoding the RNA processing enzyme<br />
Cas6, and a gene cassette encoding type III interference proteins. Four vapBC gene<br />
pairs that are highly divergent <strong>in</strong> sequence are also present. (B) Expansion of <strong>the</strong><br />
type III <strong>in</strong>terference cassette of A. hospitalis, and (C) location of a highly similar<br />
vapBC gene pair located next to a different class of type III <strong>CRISPR</strong> <strong>in</strong>terference<br />
cassette (denoted Cmr) in S. islandicus HVE10/4. Numbers of repeats are indicated<br />
for each <strong>CRISPR</strong> locus.<br />
7.11. Conclusions<br />
Clearly, <strong>the</strong>se are early days for studies of archaeal TA loci.<br />
Almost all of <strong>the</strong> experimental work to date has been performed on<br />
different bacterial TA <strong>system</strong>s some of which have no equivalent<br />
amongst archaea. Support is provided here for a role <strong>in</strong> ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g<br />
important regions of chromosomal DNA for those organisms,<br />
particularly members of the Sulfolobales and Thermococcales, which<br />
exhibit large variable genomic regions and often carry many TA loci.<br />
Involvement in response to nutrient deficiency and other stress<br />
factors is highly probable, and these potential functional roles are<br />
not mutually exclusive. A rationale is provided for parts of <strong>the</strong><br />
highly conserved translational apparatus be<strong>in</strong>g <strong>the</strong> primary target<br />
for some toxins that archaea share with bacteria. Finally, it remains to<br />
be seen whether there are undiscovered archaea-specific TA systems,<br />
or possibly hybrid systems with bacterial and archaeal antitoxin-toxin<br />
components, which exclusively target archaeal cellular<br />
components.<br />
References<br />
Brügger,K., Redder,P., She,Q., Confalonieri,F., Zivanovic,Y. and<br />
Garrett,R.A. (2002) Mobile elements <strong>in</strong> archaeal genomes. FEMS<br />
Microbiol Letts 206: 131-141.<br />
Christensen,S.K., and Gerdes,K. (2003) RelE tox<strong>in</strong>s from bacteria and<br />
<strong>Archaea</strong> cleave mRNAs on translat<strong>in</strong>g ribosomes which are<br />
rescued by tmRNA. Mol Microbiol 48: 1389-1400.<br />
Cooper,C.R., Daugherty,A.J., Tachdjian,S., Blum,P.H., and Kelly,R.M.<br />
(2009) Role of vapBC tox<strong>in</strong>-antitox<strong>in</strong> loci <strong>in</strong> <strong>the</strong> <strong>the</strong>rmal stress<br />
response of Sulfolobus solfataricus. Biochem Soc Trans 37: 123-126.<br />
Dienemann,C., Bøggild,A., Winther,K.S., Gerdes,K., and<br />
Brodersen,D. (2011) Crystal structure of VapBC toxin-antitoxin<br />
complex from Shigella flexneri reveals a hetero-octameric DNA-binding<br />
assembly. J Mol Biol 414: 713-722.<br />
Eddy,S.R. (2011) Accelerated profile HMM searches. PLoS Comput<br />
Biol 7: 10.<br />
Filée,J., Siguier,P., and Chandler,M. (2007) Insertion sequence<br />
diversity <strong>in</strong> archaea. Microbiol Molec Biol Revs 71: 121-157.<br />
Gadelle,D., Filee,J., Buhler,C., and Forterre,P. (2003) Phylogenomics<br />
of type II DNA topoisomerases. Bioessays 25: 232-242.<br />
Garrett,R.A., Vestergaard,G., and Shah,S.A. (2011) Archaeal CRISPR-based<br />
immune systems: exchangeable functional modules. Trends<br />
Microbiol 19: 549-556.<br />
Gerdes,K. (2000) Toxin-antitoxin modules may regulate synthesis of<br />
macromolecules during nutritional stress. J Bacteriol 182: 561-572.<br />
Gerdes,K., Christensen,S.K., and Lobner-Olesen,A. (2005)<br />
Prokaryotic toxin-antitoxin stress response loci. Nat Rev Microbiol 3:<br />
371-382.<br />
Gribaldo,S., Poole,A.M., Daub<strong>in</strong>,V., Forterre,P., and Brochier-<br />
Armanet,C. (2010) The orig<strong>in</strong> of eukaryotes and <strong>the</strong>ir<br />
relationship with <strong>the</strong> <strong>Archaea</strong>: are we at a phylogenomic<br />
impasse? Nat Rev Microbiol 8: 743-752.<br />
Guo,L., Brügger,K., Liu,C., Shah,S.A., Zheng,H., Zhu,Y., et al. (2011)<br />
Genome analyses of Icelandic stra<strong>in</strong>s of Sulfolobus islandicus:<br />
Model organisms for genetic and virus-host <strong>in</strong>teraction studies. J<br />
Bacteriol 193: 1672-1680.<br />
Jørgensen,M.G., Pandey,D.P., Jaskolska,M., and Gerdes,K. (2009)<br />
HicA of Escherichia coli defines a novel family of translation-independent<br />
mRNA interferases in bacteria and archaea. J<br />
Bacteriol 191: 1191-1199.<br />
Kletz<strong>in</strong>,A. (2007) General characteristics and important model<br />
organisms. In: <strong>Archaea</strong> Molecular and Cellular Biology (Ed. R.<br />
Cavicchioli) pp. 14-92. ASM press, Wash. DC, USA<br />
Kurland,C.G., Coll<strong>in</strong>s,L.J., and Penny,D. (2006) Genomics and <strong>the</strong><br />
irreducible nature of eukaryotic cells. Science 312: 1011-1014.<br />
Magnuson,R.D. (2007) Hypo<strong>the</strong>tical functions of tox<strong>in</strong>-antitox<strong>in</strong><br />
<strong>system</strong>s. J Bacteriol 189: 6089-6092.<br />
Maisonneuve,E., Shakespeare,L.J., Jørgensen,M.G., and Gerdes,K.<br />
(2011) Bacterial persistence by RNA endonucleases. Proc Natl Acad<br />
Sci USA 108: 13206-13211.<br />
Makarova,K.S., Grish<strong>in</strong>,N.V., and Koon<strong>in</strong>,E.V. (2009a) The HicAB<br />
cassette, a putative novel, RNA target<strong>in</strong>g tox<strong>in</strong>-antitox<strong>in</strong> <strong>system</strong> <strong>in</strong><br />
archaea and bacteria. Bio<strong>in</strong>formatics 22: 2581-2584.<br />
Makarova,K.S., Wolf,Y.I., and Koon<strong>in</strong>,E.V. (2009b) Comprehensive<br />
comparative-genomic analysis of Type 2 tox<strong>in</strong>-antitox<strong>in</strong> <strong>system</strong>s<br />
and related mobile stress response <strong>system</strong>s <strong>in</strong> prokaryotes. Biol<br />
Direct 4: 19.<br />
Melderen,L.V. (2010) Tox<strong>in</strong>-antitox<strong>in</strong> <strong>system</strong>s: why so many, what<br />
for? Curr Op<strong>in</strong> Microbiol 13: 781-785.<br />
Neubauer,C., Gao,Y.G., Andersen,K.R., Dunham,C.M., Kelley,A.C.,<br />
Hentschel,J., et al. (2009) The structural basis for mRNA<br />
recognition and cleavage by <strong>the</strong> ribosome-dependent<br />
endonuclease RelE. Cell 139: 1084-1095.<br />
Pandey,D.P., and Gerdes,K. (2005) Tox<strong>in</strong>-antitox<strong>in</strong> loci are highly<br />
abundant <strong>in</strong> free-liv<strong>in</strong>g but lost from host-associated prokaryotes.<br />
Nucleic Acids Res 33: 966-976.<br />
Pedersen,K., Christensen,S.K., and Gerdes,K. (2002) Rapid <strong>in</strong>duction<br />
and reversal of a bacteriostatic condition by controlled expression<br />
of tox<strong>in</strong>s and antitox<strong>in</strong>s. Mol Microbiol 45: 501-510.<br />
Porter,K., Russ,B.E., and Dyall-Smith,M.L. (2007) Virus-host<br />
<strong>in</strong>teractions <strong>in</strong> salt lakes. Curr Op<strong>in</strong> Microbiol 10: 418-424.<br />
Prangishvili,D., Forterre,P., and Garrett,R.A. (2006) Viruses of <strong>the</strong><br />
<strong>Archaea</strong>: a unify<strong>in</strong>g view. Nat Rev Microbiol 11: 837-848.<br />
Reno,M.L., Held,N.L., Fields,C.J., Burke,P.V., and Whitaker,R.J.<br />
(2009) Sulfolobus islandicus pan-genome. Proc Natl Acad Sci USA<br />
106: 8605-8610.<br />
Rodriguez-Fonseca,C., Amils,R., and Garrett,R.A. (1995) F<strong>in</strong>e<br />
structure of <strong>the</strong> peptidyl transferase centre on 23 S-like rRNAs<br />
deduced from chemical prob<strong>in</strong>g of antibiotic-ribosome complexes.<br />
J Molec Biol 247: 224-235.<br />
Snyder,J.C., Bateson,M.M., Lavin,M., and Young,M.J. (2010) Use of<br />
cellular <strong>CRISPR</strong> (clusters of regularly <strong>in</strong>terspaced short<br />
pal<strong>in</strong>dromic repeats) spacer-based microarrays for detection of<br />
viruses <strong>in</strong> environmental samples. Appl Environ Microbiol 76:<br />
7251-7258.<br />
Valent<strong>in</strong>e,D.L. (2007) Adaptations to energy stress dictate <strong>the</strong><br />
ecology and evolution of archaea. Nat Rev Microbiol 5: 316-323.<br />
Wilbur,J.S., Chivers,P.T., Mattison,K., Potter,L., Brennan,R.G., and<br />
So,M. (2005) Neisseria gonorrhoeae FitA interacts with FitB to bind<br />
DNA through its ribbon-helix-helix motif. Biochem 44: 12515–12524.<br />
W<strong>in</strong><strong>the</strong>r,K.S., and Gerdes,K. (2011) Enteric virulence associated<br />
prote<strong>in</strong> VapC <strong>in</strong>hibits translation by cleavage of <strong>in</strong>itiator tRNA.<br />
Proc Natl Acad Sci USA 108: 7403-7407.<br />
Yamashiro,K., and Yamagishi,A. (2005) Characterization of <strong>the</strong> DNA<br />
gyrase from <strong>the</strong> <strong>the</strong>rmoacidophilic Archaeon Thermoplasma<br />
acidophilum. J Bacteriol 8531-8536.<br />
You,X-Y., Liu,C., Wang,S-Y., Jiang,C-Y., Shah,S.A., Prangishvili,D. et<br />
al. (2011) Genomic studies of Acidianus hospitalis W1 a host for<br />
study<strong>in</strong>g crenarchaeal virus and plasmid life cycles. Extremophiles<br />
15: 487-497.<br />
Table 1 Phylogenetic tree of archaea for which complete genome<br />
sequences are available toge<strong>the</strong>r with <strong>the</strong> estimated number of TA<br />
loci of <strong>the</strong> vapBC, relBE and hicAB families, and <strong>the</strong> numbers of<br />
MNT/HEPN gene pairs. In <strong>the</strong> k<strong>in</strong>gdom phyla column (P) C denotes<br />
Crenarchaeota, E - Euryarchaeota, T - Thaumarchaeota, K -<br />
Korarchaeota and N - Nanoarchaeota. In <strong>the</strong> Order column (O) S<br />
denotes Sulfolobales, D - Desulfurococcales, O - Acidolobales, P -<br />
Thermoproteales, Y - Methanopyrales, T - Thermococcales, A -<br />
Archaeoglobales, C - Methanococcales, B - Methanobacteriales, M -<br />
Methanomicrobiales, N - Methanosarc<strong>in</strong>ales, E - Methanocellales, H -<br />
Halobacteriales and L - Thermoplasmatales. The ecological niches of<br />
<strong>the</strong> different organisms are <strong>in</strong>dicated toge<strong>the</strong>r with <strong>the</strong>ir degree of<br />
thermophilicity, with a border of optimal growth of 50 °C. The<br />
numbers of <strong>the</strong> different TA loci and MNT/HEPN gene pairs are<br />
colour-shaded extend<strong>in</strong>g from bright red (> 20), light red (20 to 11),<br />
p<strong>in</strong>k (10 to 6), violet (5 to 3) and light blue (2 to 1). Approximate<br />
genome sizes and <strong>the</strong> Genbank/EMBL accession numbers are given<br />
for <strong>the</strong> genomes.<br />
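The colour scale described in this legend amounts to a simple binning of the locus counts. The sketch below<br />
is illustrative only: the function name is hypothetical, and the handling of a count of zero (returned as<br />
"unshaded") is an assumption, since the legend does not state how empty cells are shown.<br />

```python
def shade_for_count(n: int) -> str:
    """Map a number of TA loci (or MNT/HEPN gene pairs) to its legend colour."""
    if n < 0:
        raise ValueError("count must be non-negative")
    if n > 20:
        return "bright red"
    if n >= 11:
        return "light red"
    if n >= 6:
        return "pink"
    if n >= 3:
        return "violet"
    if n >= 1:
        return "light blue"
    return "unshaded"  # assumption: the legend does not define a colour for 0


# The 24 vapBC loci reported for A. hospitalis fall in the top bin.
print(shade_for_count(24))
```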
BIBLIOGRAPHY<br />
[1] A F Andersson and J F Banfield. Virus population dynamics<br />
and acquired virus resistance <strong>in</strong> natural microbial communities.<br />
Science, 320(5879):1047–1050, May 2008.<br />
[2] Kathryne S Auernik, Yukari Maezato, Paul H Blum,<br />
and Robert M Kelly. The genome sequence of <strong>the</strong><br />
metal-mobiliz<strong>in</strong>g, extremely <strong>the</strong>rmoacidophilic archaeon<br />
Metallosphaera sedula provides insights into bioleaching-associated<br />
metabolism. Appl Environ Microbiol, 74(3):682–92,<br />
Feb 2008.<br />
[3] R Barrangou, C Fremaux, H Deveau, M Richards, P Boyaval,<br />
S Moineau, D A Romero, and P Horvath. CRISPR provides<br />
acquired resistance aga<strong>in</strong>st viruses <strong>in</strong> prokaryotes. Science,<br />
315(5819):1709–1712, Mar 2007.<br />
[4] Elizabeth R Barry and Stephen D Bell. DNA replication in<br />
<strong>the</strong> archaea. Microbiol Mol Biol Rev, 70(4):876–87, Dec 2006.<br />
[5] C Bath and M L Dyall-Smith. His1, an archaeal virus of<br />
the Fuselloviridae family that infects Haloarcula hispanica. J<br />
Virol, 72(11):9392–5, Nov 1998.<br />
[6] David L Bernick, Courtney L Cox, Patrick P Dennis, and<br />
Todd M Lowe. Comparative genomic and transcriptional<br />
analyses of CRISPR systems across the genus Pyrobaculum.<br />
Front Microbiol, 3:251, 2012.<br />
[7] A Bolot<strong>in</strong>, B Qu<strong>in</strong>quis, A Sorok<strong>in</strong>, and S D Ehrlich. Clustered<br />
regularly <strong>in</strong>terspaced short pal<strong>in</strong>drome repeats (crisprs)<br />
have spacers of extrachromosomal orig<strong>in</strong>. Microbiology,<br />
151(Pt 8):2551–2561, Aug 2005.<br />
[8] Cel<strong>in</strong>e Brochier, Simonetta Gribaldo, Yvan Zivanovic, Fabrice<br />
Confalonieri, and Patrick Forterre. Nanoarchaea: representatives<br />
of a novel archaeal phylum or a fast-evolv<strong>in</strong>g<br />
euryarchaeal l<strong>in</strong>eage related to <strong>the</strong>rmococcales? Genome<br />
Biol, 6(5):R42, 2005.<br />
[9] C Brochier-Armanet, B Boussau, S Gribaldo, and P Forterre.<br />
Mesophilic crenarchaeota: proposal for a third archaeal<br />
195
196 Bibliography<br />
phylum, <strong>the</strong> thaumarchaeota. Nat Rev Microbiol, 6(3):245–<br />
252, Mar 2008.<br />
[10] S J Brouns, M M Jore, M Lundgren, E R Westra, R J Slijkhuis,<br />
A P Snijders, M J Dickman, K S Makarova, E V Koon<strong>in</strong>, and<br />
J van der Oost. Small crispr rnas guide antiviral defense <strong>in</strong><br />
prokaryotes. Science, 321(5891):960–964, Aug 2008.<br />
[11] Kimberly K Busiek and William Margol<strong>in</strong>. Split decision:<br />
a thaumarchaeon encod<strong>in</strong>g both ftsz and cdv cell division<br />
prote<strong>in</strong>s chooses cdv for cytok<strong>in</strong>esis. Mol Microbiol, 82(3):535–<br />
8, Nov 2011.<br />
[12] J Carte, R Wang, H Li, R M Terns, and M P Terns. Cas6 is<br />
an endoribonuclease that generates guide rnas for <strong>in</strong>vader<br />
defense <strong>in</strong> prokaryotes. Genes Dev, 22(24):3489–3496, Dec<br />
2008.<br />
[13] L Chen, K Brügger, M Skovgaard, P Redder, Q She, E Torar<strong>in</strong>sson,<br />
B Greve, M Awayez, A Zibat, H P Klenk, and R A<br />
Garrett. The genome of sulfolobus acidocaldarius, a model<br />
organism of <strong>the</strong> crenarchaeota. J Bacteriol, 187(14):4992–4999,<br />
Jul 2005.<br />
[14] Benoît Dayrat. The roots of phylogeny: how did haeckel<br />
build his trees? Syst Biol, 52(4):515–27, Aug 2003.<br />
[15] L<strong>in</strong>g Deng, Haojun Zhu, Zhengjun Chen, Yun Xiang Liang,<br />
and Qunx<strong>in</strong> She. Unmarked gene deletion and host-vector<br />
<strong>system</strong> for <strong>the</strong> hyper<strong>the</strong>rmophilic crenarchaeon sulfolobus<br />
islandicus. Extremophiles, 13(4):735–46, Jul 2009.<br />
[16] James G Elk<strong>in</strong>s, Mircea Podar, David E Graham, Kira S<br />
Makarova, Yuri Wolf, Lennart Randau, Brian P Hedlund,<br />
Cél<strong>in</strong>e Brochier-Armanet, Victor Kun<strong>in</strong>, Ia<strong>in</strong> Anderson, Alla<br />
Lapidus, Eugene Goltsman, Kerrie Barry, Eugene V Koon<strong>in</strong>,<br />
Phil Hugenholtz, Nikos Kyrpides, Gerhard Wanner, Paul<br />
Richardson, Mart<strong>in</strong> Keller, and Karl O Stetter. A korarchaeal<br />
genome reveals <strong>in</strong>sights <strong>in</strong>to <strong>the</strong> evolution of <strong>the</strong> archaea.<br />
Proc Natl Acad Sci U S A, 105(23):8102–7, Jun 2008.<br />
[17] Susanne Erdmann and Roger A Garrett. Selective and hyperactive<br />
uptake of foreign dna by adaptive <strong>immune</strong> <strong>system</strong>s<br />
of an archaeon via two dist<strong>in</strong>ct mechanisms. Mol Microbiol,<br />
Jul 2012.
Bibliography 197<br />
[18] Thijs J G Ettema, Ann-Christ<strong>in</strong> L<strong>in</strong>dås, and Rolf Bernander.<br />
An act<strong>in</strong>-based cytoskeleton <strong>in</strong> archaea. Mol Microbiol,<br />
80(4):1052–61, May 2011.<br />
[19] Roger A Garrett, David Prangishvili, Shiraz A Shah, Monika<br />
Reuter, Karl O Stetter, and Xu Peng. Metagenomic analyses<br />
of novel viruses and plasmids from a cultured environmental<br />
sample of hyper<strong>the</strong>rmophilic neutrophiles. Environ<br />
Microbiol, 12(11):2918–30, Nov 2010.<br />
[20] Roger A Garrett, Shiraz A Shah, Gisle Vestergaard, L<strong>in</strong>g<br />
Deng, Soley Gudbergsdottir, Chandra S Kenchappa, Susanne<br />
Erdmann, and Qunx<strong>in</strong> She. Crispr-based <strong>immune</strong> <strong>system</strong>s<br />
of <strong>the</strong> sulfolobales: complexity and diversity. Biochem Soc<br />
Trans, 39(1):51–7, Jan 2011.<br />
[21] Roger A Garrett, Gisle Vestergaard, and Shiraz A Shah.<br />
<strong>Archaea</strong>l crispr-based <strong>immune</strong> <strong>system</strong>s: exchangeable functional<br />
modules. Trends Microbiol, 19(11):549–56, Nov 2011.<br />
[22] Aurore Gorlas, Eugene V Koon<strong>in</strong>, Nadège Bienvenu, Daniel<br />
Prieur, and Claire Gesl<strong>in</strong>. Tpv1, <strong>the</strong> first virus isolated<br />
from <strong>the</strong> hyper<strong>the</strong>rmophilic genus <strong>the</strong>rmococcus. Environ<br />
Microbiol, 14(2):503–16, Feb 2012.<br />
[23] I Grissa, G Vergnaud, and C Pourcel. The crisprdb database<br />
and tools to display crisprs and to generate dictionaries of<br />
spacers and repeats. BMC Bio<strong>in</strong>formatics, 8(1):172–172, May<br />
2007.<br />
[24] Soley Gudbergsdottir, L<strong>in</strong>g Deng, Zhengjun Chen, Jaide<br />
V K Jensen, L<strong>in</strong>da R Jensen, Qunx<strong>in</strong> She, and Roger A<br />
Garrett. Dynamic properties of <strong>the</strong> sulfolobus crispr/cas<br />
and crispr/cmr <strong>system</strong>s when challenged with vector-borne<br />
viral and plasmid genes and protospacers. Mol Microbiol,<br />
79(1):35–49, Jan 2011.<br />
[25] Li Guo, Kim Brügger, Chao Liu, Shiraz A Shah, Huajun<br />
Zheng, Yongqiang Zhu, Shengyue Wang, Reidun K Lillestøl,<br />
Lanm<strong>in</strong>g Chen, Jeremy Frank, David Prangishvili, Lars<br />
Paul<strong>in</strong>, Qunx<strong>in</strong> She, Li Huang, and Roger A Garrett. Genome<br />
analyses of icelandic stra<strong>in</strong>s of sulfolobus islandicus,<br />
model organisms for genetic and virus-host <strong>in</strong>teraction studies.<br />
J Bacteriol, 193(7):1672–80, Apr 2011.
198 Bibliography<br />
[26] D H Haft, J Selengut, E F Mongod<strong>in</strong>, and K E Nelson. A<br />
guild of 45 crispr-associated (cas) prote<strong>in</strong> families and multiple<br />
crispr/cas subtypes exist <strong>in</strong> prokaryotic genomes. PLoS<br />
Comput Biol, 1(6), Nov 2005.<br />
[27] Caryn R Hale, Peng Zhao, Sara Olson, Michael O Duff,<br />
Brenton R Graveley, Lance Wells, Rebecca M Terns, and<br />
Michael P Terns. Rna-guided rna cleavage by a crispr rnacas<br />
prote<strong>in</strong> complex. Cell, 139(5):945–56, Nov 2009.<br />
[28] M Här<strong>in</strong>g, R Rachel, X Peng, R A Garrett, and D Prangishvili.<br />
Viral diversity <strong>in</strong> hot spr<strong>in</strong>gs of pozzuoli, italy, and characterization<br />
of a unique archaeal virus, acidianus bottleshaped<br />
virus, from a new family, <strong>the</strong> ampullaviridae. J Virol,<br />
79(15):9904–9911, Aug 2005.<br />
[29] P Horvath, A C Coûté-Monvois<strong>in</strong>, D A Romero, P Boyaval,<br />
C Fremaux, and R Barrangou. Comparative analysis of crispr<br />
loci <strong>in</strong> lactic acid bacteria genomes. Int J Food Microbiol, Jul<br />
2008.<br />
[30] Y Ish<strong>in</strong>o, H Sh<strong>in</strong>agawa, K Mak<strong>in</strong>o, M Amemura, and A Nakata.<br />
Nucleotide sequence of <strong>the</strong> iap gene, responsible<br />
for alkal<strong>in</strong>e phosphatase isozyme conversion <strong>in</strong> escherichia<br />
coli, and identification of <strong>the</strong> gene product. J Bacteriol,<br />
169(12):5429–33, Dec 1987.<br />
[31] R Jansen, J D Embden, W Gaastra, and L M Schouls. Identification<br />
of genes that are associated with dna repeats <strong>in</strong><br />
prokaryotes. Mol Microbiol, 43(6):1565–1575, Mar 2002.<br />
[32] Matthijs M Jore, Magnus Lundgren, Es<strong>the</strong>r van Duijn, Jelle B<br />
Bultema, Edze R Westra, Sakharam P Waghmare, Blake<br />
Wiedenheft, Umit Pul, Re<strong>in</strong>hild Wurm, Rolf Wagner, Marieke<br />
R Beijer, Arjan Barendregt, Kaihong Zhou, Ambrosius<br />
P L Snijders, Mark J Dickman, Jennifer A Doudna, Egbert J<br />
Boekema, Albert J R Heck, John van der Oost, and Stan J J<br />
Brouns. Structural basis for crispr rna-guided dna recognition<br />
by cascade. Nat Struct Mol Biol, 18(5):529–36, May<br />
2011.<br />
[33] Y Kawarabayasi, Y H<strong>in</strong>o, H Horikawa, K J<strong>in</strong>-no, M Takahashi,<br />
M Sek<strong>in</strong>e, S Baba, A Ankai, H Kosugi, A Hosoyama,<br />
S Fukui, Y Nagai, K Nishijima, R Otsuka, H Nakazawa,<br />
M Takamiya, Y Kato, T Yoshizawa, T Tanaka, Y Kudoh,<br />
J Yamazaki, N Kushida, A Oguchi, K Aoki, S Masuda,
Bibliography 199<br />
M Yanagii, M Nishimura, A Yamagishi, T Oshima, and<br />
H Kikuchi. Complete genome sequence of an aerobic <strong>the</strong>rmoacidophilic<br />
crenarchaeon, sulfolobus tokodaii stra<strong>in</strong>7.<br />
DNA Res, 8(4):123–140, Aug 2001.<br />
[34] M Kessel and F Kl<strong>in</strong>k. Archaebacterial elongation factor is<br />
adp-ribosylated by diph<strong>the</strong>ria tox<strong>in</strong>. Nature, 287(5779):250–1,<br />
Sep 1980.<br />
[35] Eugene V Koon<strong>in</strong> and Kira S Makarova. Crispr-cas: an<br />
adaptive immunity <strong>system</strong> <strong>in</strong> prokaryotes. F1000 Biol Rep,<br />
1:95, Dec 2009.<br />
[36] V Kun<strong>in</strong>, R Sorek, and P Hugenholtz. Evolutionary conservation<br />
of sequence and secondary structures <strong>in</strong> crispr<br />
repeats. Genome Biol, 8(4), Apr 2007.<br />
[37] R K Lillestol, P Redder, R A Garrett, and K Brügger. A<br />
putative viral defence mechanism <strong>in</strong> archaeal cells. <strong>Archaea</strong>,<br />
2(1):59–72, Aug 2006.<br />
[38] R K Lillestol, S A Shah, K Brügger, P Redder, H Phan,<br />
J Christiansen, and R A Garrett. Crispr families of <strong>the</strong><br />
crenarchaeal genus sulfolobus: bidirectional transcription<br />
and dynamic properties. Mol Microbiol, 72(1):259–272, Apr<br />
2009.<br />
[39] Nathanael G L<strong>in</strong>tner, Mel<strong>in</strong>a Kerou, Susan K Brumfield,<br />
Shirley Graham, Huant<strong>in</strong>g Liu, James H Naismith, Mat<strong>the</strong>w<br />
Sdano, Nan Peng, Qunx<strong>in</strong> She, Valérie Copié, Mark J<br />
Young, Malcolm F White, and C Mart<strong>in</strong> Lawrence. Structural<br />
and functional characterization of an archaeal clustered<br />
regularly <strong>in</strong>terspaced short pal<strong>in</strong>dromic repeat (crispr)associated<br />
complex for antiviral defense (cascade). J Biol<br />
Chem, 286(24):21643–56, Jun 2011.<br />
[40] G Lipps. Plasmids and viruses of <strong>the</strong> <strong>the</strong>rmoacidophilic<br />
crenarchaeote sulfolobus. Extremophiles, 10(1):17–28, Feb<br />
2006.<br />
[41] Li-Jun Liu, Xiao-Yan You, Huajun Zheng, Shengyue<br />
Wang, Cheng-Y<strong>in</strong>g Jiang, and Shuang-Jiang Liu. Complete<br />
genome sequence of metallosphaera cupr<strong>in</strong>a, a metal<br />
sulfide-oxidiz<strong>in</strong>g archaeon from a hot spr<strong>in</strong>g. J Bacteriol,<br />
193(13):3387–8, Jul 2011.
200 Bibliography<br />
[42] M Lundgren, A Andersson, L Chen, P Nilsson, and<br />
R Bernander. Three replication orig<strong>in</strong>s <strong>in</strong> sulfolobus species:<br />
synchronous <strong>in</strong>itiation of chromosome replication and asynchronous<br />
term<strong>in</strong>ation. Proc Natl Acad Sci U S A, 101(18):7046–<br />
7051, May 2004.<br />
[43] K S Makarova, N V Grish<strong>in</strong>, S A Shabal<strong>in</strong>a, Y I Wolf, and E V<br />
Koon<strong>in</strong>. A putative rna-<strong>in</strong>terference-based <strong>immune</strong> <strong>system</strong><br />
<strong>in</strong> prokaryotes: computational analysis of <strong>the</strong> predicted<br />
enzymatic mach<strong>in</strong>ery, functional analogies with eukaryotic<br />
rnai, and hypo<strong>the</strong>tical mechanisms of action. Biol Direct,<br />
1:7–7, 2006.<br />
[44] Kira S Makarova, L Arav<strong>in</strong>d, Yuri I Wolf, and Eugene V<br />
Koon<strong>in</strong>. Unification of cas prote<strong>in</strong> families and a simple<br />
scenario for <strong>the</strong> orig<strong>in</strong> and evolution of crispr-cas <strong>system</strong>s.<br />
Biol Direct, 6:38, 2011.<br />
[45] Kira S Makarova, Daniel H Haft, Rodolphe Barrangou, Stan<br />
J J Brouns, Emmanuelle Charpentier, Philippe Horvath,<br />
Sylva<strong>in</strong> Mo<strong>in</strong>eau, Francisco J M Mojica, Yuri I Wolf, Alexander<br />
F Yakun<strong>in</strong>, John van der Oost, and Eugene V Koon<strong>in</strong>.<br />
Evolution and classification of <strong>the</strong> crispr-cas <strong>system</strong>s. Nat<br />
Rev Microbiol, 9(6):467–77, Jun 2011.<br />
[46] Aron Marchler-Bauer, Shennan Lu, John B Anderson,<br />
Farideh Chitsaz, Myra K Derbyshire, Carol DeWeese-Scott,<br />
Jessica H Fong, Lewis Y Geer, Renata C Geer, Noreen R<br />
Gonzales, Marc Gwadz, David I Hurwitz, John D Jackson,<br />
Zhaoxi Ke, Christopher J Lanczycki, Fu Lu, Gabriele H<br />
Marchler, Mikhail Mullokandov, Mar<strong>in</strong>a V Omelchenko,<br />
Cynthia L Robertson, James S Song, Narmada Thanki, Roxanne<br />
A Yamashita, Dachuan Zhang, Naigong Zhang, Chanjuan<br />
Zheng, and Stephen H Bryant. Cdd: a conserved<br />
doma<strong>in</strong> database for <strong>the</strong> functional annotation of prote<strong>in</strong>s.<br />
Nucleic Acids Res, 39(Database issue):D225–9, Jan 2011.<br />
[47] L A Marraff<strong>in</strong>i and E J Son<strong>the</strong>imer. Crispr <strong>in</strong>terference limits<br />
horizontal gene transfer <strong>in</strong> staphylococci by target<strong>in</strong>g dna.<br />
Science, 322(5909):1843–1845, Dec 2008.<br />
[48] Luciano A Marraff<strong>in</strong>i and Erik J Son<strong>the</strong>imer. Self versus<br />
non-self discrim<strong>in</strong>ation dur<strong>in</strong>g crispr rna-directed immunity.<br />
Nature, Jan 2010.
Bibliography 201<br />
[49] F J Mojica, C Díez-Villaseñor, J García-Martínez, and C Almendros.<br />
Short motif sequences determ<strong>in</strong>e <strong>the</strong> targets of<br />
<strong>the</strong> prokaryotic crispr defence <strong>system</strong>. Microbiology, 155(Pt<br />
3):733–740, Mar 2009.<br />
[50] F J Mojica, C Díez-Villaseñor, J García-Martínez, and E Soria.<br />
Interven<strong>in</strong>g sequences of regularly spaced prokaryotic repeats<br />
derive from foreign genetic elements. J Mol Evol,<br />
60(2):174–182, Feb 2005.<br />
[51] F J Mojica, C Díez-Villaseñor, E Soria, and G Juez. Biological<br />
significance of a family of regularly spaced repeats <strong>in</strong><br />
<strong>the</strong> genomes of archaea, bacteria and mitochondria. Mol<br />
Microbiol, 36(1):244–246, Apr 2000.<br />
[52] Sab<strong>in</strong> Mulepati, Amberly Orr, and Scott Bailey. Crystal<br />
structure of <strong>the</strong> largest subunit of a bacterial rna-guided<br />
<strong>immune</strong> complex and its role <strong>in</strong> dna target b<strong>in</strong>d<strong>in</strong>g. J Biol<br />
Chem, 287(27):22445–9, Jun 2012.<br />
[53] Ki Hyun Nam, Charles Haitjema, Xueqi Liu, Fran D<strong>in</strong>g,<br />
Hongwei Wang, Mat<strong>the</strong>w P Delisa, and Ailong Ke. Cas5d<br />
prote<strong>in</strong> processes pre-crrna and assembles <strong>in</strong>to a cascadelike<br />
<strong>in</strong>terference complex <strong>in</strong> subtype i-c/dvulg crispr-cas<br />
<strong>system</strong>. Structure, 20(9):1574–84, Sep 2012.<br />
[54] Takuro Nunoura, Yoshihiro Takaki, Jungo Kakuta, Sh<strong>in</strong>ro<br />
Nishi, Junichi Sugahara, Hiromi Kazama, Gab-Joo Chee,<br />
Masahira Hattori, Akio Kanai, Haruyuki Atomi, Ken Takai,<br />
and Hideto Takami. Insights <strong>in</strong>to <strong>the</strong> evolution of archaea<br />
and eukaryotic prote<strong>in</strong> modifier <strong>system</strong>s revealed by <strong>the</strong> genome<br />
of a novel archaeal group. Nucleic Acids Res, 39(8):3204–<br />
23, Apr 2011.<br />
[55] Maija K Pietilä, El<strong>in</strong>a Ro<strong>in</strong>e, Lars Paul<strong>in</strong>, Nisse Kalkk<strong>in</strong>en,<br />
and Dennis H Bamford. An ssdna virus <strong>in</strong>fect<strong>in</strong>g archaea:<br />
a new l<strong>in</strong>eage of viruses with a membrane envelope. Mol<br />
Microbiol, 72(2):307–19, Apr 2009.<br />
[56] André Plagens, Britta Tjaden, Anna Hagemann, Lennart<br />
Randau, and Re<strong>in</strong>hard Hensel. Characterization of <strong>the</strong> crispr/cas<br />
subtype i-a <strong>system</strong> of <strong>the</strong> hyper<strong>the</strong>rmophilic crenarchaeon<br />
<strong>the</strong>rmoproteus tenax. J Bacteriol, 194(10):2491–500,<br />
May 2012.
202 Bibliography<br />
[57] C Pourcel, G Salvignol, and G Vergnaud. Crispr elements<br />
<strong>in</strong> yers<strong>in</strong>ia pestis acquire new repeats by preferential uptake<br />
of bacteriophage dna, and provide additional tools for<br />
evolutionary studies. Microbiology, 151(Pt 3):653–663, Mar<br />
2005.<br />
[58] D Prangishvili, P Forterre, and R A Garrett. Viruses of <strong>the</strong><br />
archaea: a unify<strong>in</strong>g view. Nat Rev Microbiol, 4(11):837–848,<br />
Nov 2006.<br />
[59] D Prangishvili, G Vestergaard, M Här<strong>in</strong>g, R Aramayo,<br />
T Basta, R Rachel, and R A Garrett. Structural and genomic<br />
properties of <strong>the</strong> hyper<strong>the</strong>rmophilic archaeal virus atv<br />
with an extracellular stage of <strong>the</strong> reproductive cycle. J Mol<br />
Biol, 359(5):1203–1216, Jun 2006.<br />
[60] G Pühler, H Leffers, F Gropp, P Palm, H P Klenk, F Lottspeich,<br />
R A Garrett, and W Zillig. Archaebacterial dnadependent<br />
rna polymerases testify to <strong>the</strong> evolution of <strong>the</strong><br />
eukaryotic nuclear genome. Proc Natl Acad Sci U S A,<br />
86(12):4569–73, Jun 1989.<br />
[61] Marco Punta, Penny C Coggill, Ruth Y Eberhardt, Ja<strong>in</strong>a<br />
Mistry, John Tate, Chris Boursnell, N<strong>in</strong>gze Pang, Kristoffer<br />
Forslund, Goran Ceric, Jody Clements, Andreas Heger, Liisa<br />
Holm, Erik L L Sonnhammer, Sean R Eddy, Alex Bateman,<br />
and Robert D F<strong>in</strong>n. The pfam prote<strong>in</strong> families database.<br />
Nucleic Acids Res, 40(Database issue):D290–301, Jan 2012.<br />
[62] P Redder, X Peng, K Brügger, S A Shah, F Roesch,<br />
B Greve, Q She, C Schleper, P Forterre, R A Garrett, and<br />
D Prangishvili. Four newly isolated fuselloviruses from<br />
extreme geo<strong>the</strong>rmal environments reveal unusual morphologies<br />
and a possible <strong>in</strong>terviral recomb<strong>in</strong>ation mechanism.<br />
Environ Microbiol, Jul 2009.<br />
[63] W D Reiter, P Palm, S Yeats, and W Zillig. Gene expression<br />
<strong>in</strong> archaebacteria: physical mapp<strong>in</strong>g of constitutive and uv<strong>in</strong>ducible<br />
transcripts from <strong>the</strong> sulfolobus virus-like particle<br />
ssv1. Mol Gen Genet, 209(2):270–5, Sep 1987.<br />
[64] M L Reno, N L Held, C J Fields, P V Burke, and R J Whitaker.<br />
Biogeography of <strong>the</strong> sulfolobus islandicus pan-genome. Proc<br />
Natl Acad Sci U S A, 106(21):8605–8610, May 2009.<br />
[65] Christ<strong>in</strong>e Rousseau, Jacques Nicolas, and Mathieu Gonnet.<br />
Crispi: a crispr <strong>in</strong>teractive database. Bio<strong>in</strong>formatics, Oct 2009.
Bibliography 203<br />
[66] Rachel Y Samson, Takayuki Obita, Stefan M Freund, Roger L<br />
Williams, and Stephen D Bell. A role for <strong>the</strong> escrt <strong>system</strong> <strong>in</strong><br />
cell division <strong>in</strong> archaea. Science, 322(5908):1710–3, Dec 2008.<br />
[67] Ekater<strong>in</strong>a Semenova, Matthijs M Jore, Kirill A Datsenko,<br />
Anna Semenova, Edze R Westra, Barry Wanner, John van der<br />
Oost, Stan J J Brouns, and Konstant<strong>in</strong> Sever<strong>in</strong>ov. Interference<br />
by clustered regularly <strong>in</strong>terspaced short pal<strong>in</strong>dromic repeat<br />
(crispr) rna is governed by a seed sequence. Proc Natl Acad<br />
Sci U S A, 108(25):10098–103, Jun 2011.<br />
[68] S A Shah, N R Hansen, and R A Garrett. Distribution of<br />
crispr spacer matches <strong>in</strong> viruses and plasmids of crenarchaeal<br />
acido<strong>the</strong>rmophiles and implications for <strong>the</strong>ir <strong>in</strong>hibitory<br />
mechanism. Biochem Soc Trans, 37(Pt 1):23–28, Feb 2009.<br />
[69] Shiraz A Shah and Roger A Garrett. Crispr/cas and cmr<br />
modules, mobility and evolution of adaptive <strong>immune</strong> <strong>system</strong>s.<br />
Res Microbiol, 162(1):27–38, Jan 2011.<br />
[70] Q She, R K S<strong>in</strong>gh, F Confalonieri, Y Zivanovic, G Allard,<br />
M J Awayez, C C Chan-Weiher, I G Clausen, B A Curtis,<br />
A De Moors, G Erauso, C Fletcher, P M Gordon, I Heikampde<br />
Jong, A C Jeffries, C J Kozera, N Med<strong>in</strong>a, X Peng, H P<br />
Thi-Ngoc, P Redder, M E Schenk, C Theriault, N Tolstrup,<br />
R L Charlebois, W F Doolittle, M Duguet, T Gaasterland, R A<br />
Garrett, M A Ragan, C W Sensen, and J Van der Oost. The<br />
complete genome of <strong>the</strong> crenarchaeon sulfolobus solfataricus<br />
p2. Proc Natl Acad Sci U S A, 98(14):7835–7840, Jul 2001.<br />
[71] Daan C Swarts, Cas Mosterd, Mark W J van Passel, and Stan<br />
J J Brouns. Crispr <strong>in</strong>terference directs strand specific spacer<br />
acquisition. PLoS One, 7(4):e35888, 2012.<br />
[72] T H Tang, J P Bachellerie, T Rozhdestvensky, M L Bortol<strong>in</strong>,<br />
H Huber, M Drungowski, T Elge, J Brosius, and A Hüttenhofer.<br />
Identification of 86 candidates for small nonmessenger<br />
rnas from <strong>the</strong> archaeon archaeoglobus fulgidus.<br />
Proc Natl Acad Sci U S A, 99(11):7536–7541, May 2002.<br />
[73] David L Valent<strong>in</strong>e. Adaptations to energy stress dictate <strong>the</strong><br />
ecology and evolution of <strong>the</strong> archaea. Nat Rev Microbiol,<br />
5(4):316–23, Apr 2007.<br />
[74] John van der Oost, Matthijs M Jore, Edze R Westra, Magnus<br />
Lundgren, and Stan J J Brouns. Crispr-based adaptive
204 Bibliography<br />
and heritable immunity <strong>in</strong> prokaryotes. Trends Biochem Sci,<br />
34(8):401–7, Aug 2009.<br />
[75] G Vestergaard, S A Shah, A Bize, W Reitberger, M Reuter,<br />
H Phan, A Briegel, R Rachel, R A Garrett, and D Prangishvili.<br />
Stygiolobus rod-shaped virus and <strong>the</strong> <strong>in</strong>terplay of crenarchaeal<br />
rudiviruses with <strong>the</strong> crispr antiviral <strong>system</strong>. J Bacteriol,<br />
190(20):6837–6845, Oct 2008.<br />
[76] Michaela Wagner, Silvia Berkner, Malgorzata Ajon, Arnold<br />
J M Driessen, Georg Lipps, and Sonja-Verena Albers. Expand<strong>in</strong>g<br />
and understand<strong>in</strong>g <strong>the</strong> genetic toolbox of <strong>the</strong> hyper<strong>the</strong>rmophilic<br />
genus sulfolobus. Biochem Soc Trans, 37(Pt<br />
1):97–101, Feb 2009.<br />
[77] F<strong>in</strong>n Werner and D<strong>in</strong>a Grohmann. Evolution of multisubunit<br />
rna polymerases <strong>in</strong> <strong>the</strong> three doma<strong>in</strong>s of life. Nat Rev<br />
Microbiol, 9(2):85–98, Feb 2011.<br />
[78] Edze R Westra, Benedikt Nilges, Paul B G van Erp, John<br />
van der Oost, Remus T Dame, and Stan J J Brouns. Cascademediated<br />
b<strong>in</strong>d<strong>in</strong>g and bend<strong>in</strong>g of negatively supercoiled<br />
dna. RNA Biol, 9(9), Sep 2012.<br />
[79] Edze R Westra, Paul B G van Erp, Tim Künne, Shi Pey Wong,<br />
Raymond H J Staals, Christel L C Seegers, Sander Bollen,<br />
Matthijs M Jore, Ekater<strong>in</strong>a Semenova, Konstant<strong>in</strong> Sever<strong>in</strong>ov,<br />
Willem M de Vos, Remus T Dame, Renko de Vries, Stan<br />
J J Brouns, and John van der Oost. Crispr immunity relies<br />
on <strong>the</strong> consecutive b<strong>in</strong>d<strong>in</strong>g and degradation of negatively<br />
supercoiled <strong>in</strong>vader dna by cascade and cas3. Mol Cell,<br />
46(5):595–605, Jun 2012.<br />
[80] Blake Wiedenheft, Gabriel C Lander, Kaihong Zhou, Matthijs<br />
M Jore, Stan J J Brouns, John van der Oost, Jennifer A<br />
Doudna, and Eva Nogales. Structures of <strong>the</strong> rna-guided surveillance<br />
complex from a bacterial <strong>immune</strong> <strong>system</strong>. Nature,<br />
477(7365):486–9, Sep 2011.<br />
[81] Blake Wiedenheft, Es<strong>the</strong>r van Duijn, Jelle B Bultema, Jelle<br />
Bultema, Sakharam P Waghmare, Sakharam Waghmare,<br />
Kaihong Zhou, Arjan Barendregt, Wiebke Westphal, Albert<br />
J R Heck, Albert Heck, Egbert J Boekema, Egbert Boekema,<br />
Mark J Dickman, Mark Dickman, and Jennifer A Doudna.
Bibliography 205<br />
Rna-guided complex from a bacterial <strong>immune</strong> <strong>system</strong> enhances<br />
target recognition through seed sequence <strong>in</strong>teractions.<br />
Proc Natl Acad Sci U S A, 108(25):10092–7, Jun 2011.<br />
[82] C R Woese and G E Fox. Phylogenetic structure of <strong>the</strong><br />
prokaryotic doma<strong>in</strong>: <strong>the</strong> primary k<strong>in</strong>gdoms. Proc Natl Acad<br />
Sci U S A, 74(11):5088–90, Nov 1977.<br />
[83] C R Woese, O Kandler, and M L Wheelis. Towards a natural<br />
<strong>system</strong> of organisms: proposal for <strong>the</strong> doma<strong>in</strong>s archaea,<br />
bacteria, and eucarya. Proc Natl Acad Sci U S A, 87(12):4576–<br />
9, Jun 1990.<br />
[84] Ido Yosef, Moran G Goren, and Udi Qimron. Prote<strong>in</strong>s and<br />
dna elements essential for <strong>the</strong> crispr adaptation process <strong>in</strong><br />
escherichia coli. Nucleic Acids Res, 40(12):5569–76, Jul 2012.<br />
[85] Xiao-Yan You, Chao Liu, Sheng-Yue Wang, Cheng-Y<strong>in</strong>g Jiang,<br />
Shiraz A Shah, David Prangishvili, Qunx<strong>in</strong> She, Shuang-<br />
Jiang Liu, and Roger A Garrett. Genomic analysis of acidianus<br />
hospitalis w1 a host for study<strong>in</strong>g crenarchaeal virus<br />
and plasmid life cycles. Extremophiles, 15(4):487–97, Jul 2011.<br />
[86] J<strong>in</strong>g Zhang, Christophe Rouillon, Mel<strong>in</strong>a Kerou, Judith<br />
Reeks, Kim Brugger, Shirley Graham, Julia Reimann,<br />
Giuseppe Cannone, Huant<strong>in</strong>g Liu, Sonja-Verena Albers,<br />
James H Naismith, Laura Spagnolo, and Malcolm F White.<br />
Structure and mechanism of <strong>the</strong> cmr complex for crisprmediated<br />
antiviral immunity. Mol Cell, 45(3):303–13, Feb<br />
2012.