10.07.2015 Views

Guide to data resources - European Bioinformatics Institute

Guide to data resources - European Bioinformatics Institute

Guide to data resources - European Bioinformatics Institute

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

The EMBL-<strong>European</strong> <strong>Bioinformatics</strong> <strong>Institute</strong><strong>Guide</strong> <strong>to</strong> <strong>data</strong> <strong>resources</strong>


Copyright © 2012 EMBL-<strong>European</strong> <strong>Bioinformatics</strong> <strong>Institute</strong>Cover image courtesy of Samuel Kerrien


The <strong>European</strong> <strong>Bioinformatics</strong> <strong>Institute</strong><strong>Guide</strong> <strong>to</strong> <strong>data</strong> <strong>resources</strong>ContentsAbout this guide 2Genes and genomes 2Gene expression 3Proteins and proteomics 4Macromolecular structures 4Small molecules 5Enzymes and reactions 5Interactions, pathways and networks 6Patent sequences 6Literature and text mining 7On<strong>to</strong>logies 7Tools 8


About usThe EMBL-<strong>European</strong> <strong>Bioinformatics</strong><strong>Institute</strong> (EMBL-EBI) maintains oneof the world’s most comprehensivecollections of freely availablebiological <strong>data</strong>bases. Our free <strong>to</strong>olsallow you <strong>to</strong> analyse this informationand share it with the researchcommunity.Genes and genomesNext-generation sequencing has transformed the task of collecting DNAsequence <strong>data</strong>. We no longer sequence just single genomes but also sets ofrelated genomes in structured studies. This has made possible the 1000Genomes Project, an international initiative <strong>to</strong> build a comprehensivecatalogue of all common human variation. We also capture genetic variationof medical relevance. Such <strong>data</strong> require careful attention <strong>to</strong> access control,ensuring that only authorised scientists can use them, and that they do so in away that reflects the spirit of consent and confidentiality.About this guideThis document provides an overviewof EMBL-EBI’s core <strong>resources</strong>. Youcan find overviews of these <strong>resources</strong>in Train online.www.ebi.ac.uk/training/onlineSupportDetailed help documentationis available through each of the<strong>resources</strong> described here. You canalso find answers <strong>to</strong> some of yourquestions through our homepage:www.ebi.ac.uk/helpTrainingEMBL-EBI offers training at alllevels. Our hands-on training coursesare held regularly on the GenomeCampus in Hinx<strong>to</strong>n, UK and at hostinstitutions throught the world.Train online guides you throughmany bioinformatics <strong>resources</strong>, inyour own time and at your own pace.www.ebi.ac.uk/trainingStay in <strong>to</strong>uchYou can find EMBL-EBI news, serviceupdates and training opportunities onour website, Twitter and Facebook.www.ebi.ac.uk@emblebi/EMBLEBIEnsemblwww.ensembl.orgEnsembl Genomeswww.ensemblgenomes.org<strong>European</strong> NucleotideArchivewww.ebi.ac.uk/ena<strong>European</strong> GenomephenomeArchivewww.ebi.ac.uk/egaMetagenomics Portalwww.ebi.ac.uk/metagenomicsEnsembl produces and maintains both au<strong>to</strong>maticand manually curated annotation on selectedeukaryotic genomes. Au<strong>to</strong>matic annotationis based on mRNA and protein information.Ensembl provides valuable insights in<strong>to</strong> variationwithin and between species, and allows you <strong>to</strong>compare whole genomes <strong>to</strong> identify conservedelements. It is integrated with several otherimportant molecular <strong>resources</strong>, for exampleUniProt, and can be accessed programmatically.Ensembl is developed as a joint project betweenEMBL-EBI and the Wellcome TrustSanger <strong>Institute</strong>.Ensembl Genomes is a portal <strong>to</strong> genome-scale<strong>data</strong> from bacteria, protists, fungi, plants andinvertebrate metazoa, through a unified se<strong>to</strong>f interactive and programmatic interfaces.Domain-centric <strong>resources</strong> (focused on particularareas of scientific interest) are developed incollaboration with the appropriate scientificcommunities, and integrated in the broadtaxonomic context through comparative analysisand standardised annotation.ENA, a member of the International SequenceDatabase Collaboration, contains all thenucleotide sequences in the public domainand consolidates <strong>data</strong> from EMBL-Bank, the<strong>European</strong> Trace Archive and the Sequence ReadArchive. The EBI Search returns results fromselected annotated sections of the ENA.EGA allows you <strong>to</strong> explore <strong>data</strong>sets fromnumerous genotype experiments – includingcase-control, population and family studies –that are supplied by a range of <strong>data</strong> providers.Our Metagenomics service is an au<strong>to</strong>matedpipeline for the analysis and archiving ofmetagenomic <strong>data</strong>, and provides insights in<strong>to</strong> thefunctional and metabolic potential of a sample.2Search www.ebi.ac.ukOur search service is your gateway <strong>to</strong> a world of informationspanning DNA, genes, functional genomics, proteins, structures,small molecules, enzymes, interactions, pathways, scientificpublications and patent sequences.


Gene expressionGenome-wide expression assays, originally using microarrays and morerecently high-throughput sequencing, can answer specific biological questionsand provide reference <strong>data</strong> sets. The associated large-scale expression <strong>data</strong>setscan be used <strong>to</strong> answer questions that might not be related <strong>to</strong> the study forwhich the <strong>data</strong> were originally generated. For example, a gene expressionstudy that reveals differentially expressed genes characteristic of a particulartype of cancer may also reveal candidate genes for therapeutics development.EMBL-EBI’s functional genomics <strong>resources</strong> provide easy access <strong>to</strong> geneexpression <strong>data</strong> and related information. Our visualisation <strong>to</strong>ols collateinformation from multiple <strong>data</strong> sets and present it in an intuitive way.ArrayExpresswww.ebi.ac.uk/arrayexpressExpression Atlaswww.ebi.ac.uk/gxaR-Workbenchwww.ebi.ac.uk/Tools/rcloudThe MIAME-standard compliant ArrayExpressArchive s<strong>to</strong>res functional genomics experimentsperformed using RNA-Seq/ChIP-Seq andarray-based technologies.The Expression Atlas allows you <strong>to</strong> search forgene expression changes measured in various celltypes, organism parts, disease states and manyother biological and experimental conditions.It represents a curated subset of the ArrayExpressAchive experiments, and can be downloaded forlocal use.The R-Workbench provides access <strong>to</strong> R andBioconduc<strong>to</strong>r in the EBI cloud, and rapid access<strong>to</strong> <strong>data</strong> s<strong>to</strong>red in EMBL-EBI <strong>data</strong>bases such asArrayExpress. You can upload your own <strong>data</strong> anduse EMBL-EBI’s infrastructure <strong>to</strong> perform your<strong>data</strong> analyses.Leanne,discovery biologistLeanne works on discovery andvalidation of candidate drugtargets in a pharmaceuticalcompany. She explains: ‘I do a lo<strong>to</strong>f cloning and expression analysis.I need <strong>to</strong> find published cDNAsequences and their orthologues, sothat I can find appropriate modelsin which <strong>to</strong> validate my targets.For discovery, I’d like a simple way<strong>to</strong> search across all public-domainbiological information. A colleaguementioned a gene <strong>to</strong> me in passingrecently and I only caught thename of it. I’d like <strong>to</strong> be able <strong>to</strong> findout more about it without having<strong>to</strong> forage around in lots ofdifferent publications.’EBI <strong>resources</strong> forLeanne:EBI Search for quick summaryreports on specific genes.BLAST <strong>to</strong> root outvec<strong>to</strong>r contamination.The Expression Atlas fortranscription profiles andfunctional genomics.UniProt <strong>to</strong> learn more aboutprotein function.ChEMBL for druggability oftarget molecules.3


Francesco,marine biologistProteins and proteomicsOur protein and nucleotide <strong>data</strong> are extensively linked, both at the level ofunderlying <strong>data</strong> and through the coordination of web <strong>resources</strong>. This helps uscapture, organise and interpret sequence-related <strong>data</strong>, providing informationfreely in a variety of formats including user-friendly web sites. Whereverpossible, we collaborate with other groups worldwide with similar aims.Francesco is Head of Departmentin a new collaborative centrefor marine biology. His group isdoing metagenomic studies ofextreme marine environments buthis department has just securedfunding <strong>to</strong> look for novel woundhealingcompounds in marinealgae. “We’re about <strong>to</strong> scale up oursequencing efforts severalfold. Weneed <strong>to</strong> put some pipelines in placethat will enable us <strong>to</strong> make sense ofall the <strong>data</strong> that we’re generating.The extremophile project is acollaborative effort, and we want <strong>to</strong>be able <strong>to</strong> share information withour research network.”EBI <strong>resources</strong> forFrancesco:<strong>European</strong> Nucleotide Archivefor DNA barcodes <strong>to</strong> identifyorganisms in his samples.His team could set up a pipeline<strong>to</strong> submit their <strong>data</strong> <strong>to</strong> ENA andmake it publicly available.Ensembl and Ensembl Genomes<strong>to</strong> characterise organisms bycomparing them with a wealth ofinformation on the genomes ofmany fully sequenced organisms.InterProScan <strong>to</strong> help identifydomains and motifs in the novelsequences he is generating thatmay imply specific functions.UniMes, a reposi<strong>to</strong>ry specificallydeveloped for metagenomic andenvironmental <strong>data</strong>.Metagenomics Portal forpowerful analysis of samplescontaining a variety of organisms.Web Services <strong>to</strong> helpcreate pipelines.UniProtwww.uniprot.orgInterProwww.ebi.ac.uk/interproPRIDEwww.ebi.ac.uk/prideMacromolecular structuresThree-dimensional structures give us mechanistic insight in<strong>to</strong> howmacromolecules work, and help explain how their functions are disrupted bymutation or interaction with small molecules. Our structural <strong>data</strong> <strong>resources</strong>are engineered <strong>to</strong> reflect the needs of a growing and diverse structuralbiology community.PDBewww.ebi.ac.uk/pdbePDBsumwww.ebi.ac.uk/pdbsumProFuncwww.ebi.ac.uk/profuncThis comprehensive resource organises proteinsequence information so that you can find splicevariants, functional information and links <strong>to</strong> related<strong>resources</strong> from a single entry. UniProt is a jointeffort with the Swiss <strong>Institute</strong> of Bioinfomatics andGeorge<strong>to</strong>wn University. It comprises four components:the UniProt Knowledgebase (UniProtKB),Reference Clusters (UniRef), Archive (UniParc)and Metagenomic and Environmental Sequences(UniMES). UniProtKB is the hub for the collectionof functional information on proteins, with accurate,consistent and rich annotation.InterPro classifies proteins in<strong>to</strong> families and predictsthe presence of important domains and sites.InterProScan allows you <strong>to</strong> query your sequenceagainst the <strong>data</strong>base.The Proteomics Identifications Database is acentralised, standards-compliant, public reposi<strong>to</strong>ryfor proteomics <strong>data</strong>. It contains protein and peptideidentifications and their associated supportingevidence. PRIDE also captures details ofpost-translational modifications.The Protein Data Bank in Europe (PDBe), part of theWorldwide Protein Data Bank (wwPDB), allows you<strong>to</strong> search all the macromolecular structures in thepublic domain. Its <strong>to</strong>ols help you perform sophisticatedanalyses of the <strong>data</strong>, including sub-structure searchesand structural comparisons. You can analyse detailedmolecular interactions and correlate them withsequence or structure patterns.This pic<strong>to</strong>rial <strong>data</strong>base provides an overview ofmacromolecular structures deposited in the ProteinData Bank archive.ProFunc allows you <strong>to</strong> identify the likely biochemcialfunction of a protein from its 3D structure.4


Small moleculesJohn,medicinal chemistEMBL-EBI’s cheminformatics <strong>resources</strong> help bridge the gap between smallmolecules and the macromolecules with which they interact in living systems.In addition <strong>to</strong> the <strong>resources</strong> listed below, PDBe offers small molecule servicesthat are associated with macromolecular structures.ChEBIwww.ebi.ac.uk/chebiChEMBLwww.ebi.ac.uk/chemblRESIDwww.ebi.ac.uk/RESIDMetaboLightswww.ebi.ac.uk/metabolightsChemical Entities of Biological Interest (ChEBI)is a manually annotated dictionary of smallmolecular entities, for example any constitutionallyor iso<strong>to</strong>pically distinct a<strong>to</strong>m, molecule, ion orradical. It provides a wide range of related chemicalinformation such as formulae, links <strong>to</strong> other<strong>data</strong>bases and a controlled vocabulary that describesthe ‘chemical space’.ChEMBL, a <strong>data</strong>base of bioactive compounds,focuses on interactions between small molecules andtheir macromolecular targets, including medicinalchemistry, clinical development and therapeutics<strong>data</strong>. Pharmaceutically important gene families inChEMBL can be viewed in the GPCR and KinaseSARfari web portals.This collection of annotations and structures forprotein modifications includes amino-terminal,carboxy-terminal and peptide chainpost-translational modifications.MetaboLights is a <strong>data</strong>base for metabolomicsexperiments and derived information. It iscross-species, cross-technique and covers metabolitestructures and their reference spectra as well as theirbiological roles, locations and concentrations, andexperimental <strong>data</strong> from metabolic experiments.Enzymes and reactionsThe Enzyme Portal helps you explore the biology of enzymes and proteinswith enzymatic activity. It integrates publicly available information aboutenzymes, such as small-molecule chemistry, biochemical pathways and drugcompounds. The Enzyme Portal summarises information from:••The UniProt knowledge base;••The Protein Data Bank in Europe;••Rhea (www.ebi.ac.uk/rhea), a <strong>data</strong>base of enzyme-catalysed reactions;••Reac<strong>to</strong>me, a <strong>data</strong>base of biochemical pathways;••IntEnz (www.ebi.ac.uk/IntEnz), a resource with enzymenomenclature information;••ChEBI and ChEMB (see above);••Cofac<strong>to</strong>r (www.ebi.ac.uk/thorn<strong>to</strong>n-srv/<strong>data</strong>bases/CoFac<strong>to</strong>r) andMACiE (www.ebi.ac.uk/thorn<strong>to</strong>n-srv/<strong>data</strong>bases/MACiE) forhighly detailed, curated information about cofac<strong>to</strong>rs andreaction mechanisms.John is a medicinal chemistin a large, multinationalpharmaceutical company.“I’m constantly looking for newinformation, so searchable<strong>data</strong>bases of existing chemicalliterature are very useful,” he says.“For hit and lead optimisation,new ideas that could get mein<strong>to</strong> novel chemical space wouldbe great. Information on mypharmacophore would be reallyuseful <strong>to</strong>o; things like: is itlikely <strong>to</strong> be <strong>to</strong>xic? Or, can wepredict if analogues will bebiologically active?”EBI <strong>resources</strong> for John:ChEMBL for detailedinformation about over a millionbioactive compounds.Kinase SARfari, an integratedchemogenomics workbenchlinking kinase sequences withtheir structures, compounds thattarget them and screening <strong>data</strong>.GPCR SARfari for G-proteincoupledrecep<strong>to</strong>rs.ChEBI for information aboutsmall molecular entities andlinks <strong>to</strong> the macromolecules withwhich they interact.PDBe for 3D structures ofmacromolecules, includingprotein–ligand interactions.The Enzyme Portal covers a large number of species including the key modelorganisms, and provides a simple way <strong>to</strong> compare orthologues.EMBL-EBI also hosts the Catalytic Site Atlas, a resource of catalytic sites andresidues that have been identified in enzymes using structural <strong>data</strong>:www.ebi.ac.uk/thorn<strong>to</strong>n-srv/<strong>data</strong>bases/CSA5


Leon,postdocInteractions, pathways& networksComputational systems biology takes us beyond the identification of molecular‘parts lists’ for living organisms and <strong>to</strong>wards synthesising information <strong>to</strong>generate and test new hypotheses about how biological systems work.Leon is a postdoc working onquorum sensing in bacteria. Hewants <strong>to</strong> understand what makesa normally harmless bacteriumpathogenic in the lungs of peoplewith cystic fibrosis. “I’m using acombination of transcrip<strong>to</strong>mics,proteomics and metabolomics<strong>to</strong> understand these pathogenicchanges better,” he says. “I end upwith big spreadsheets of protein orgene IDs and I’m trying <strong>to</strong> piece<strong>to</strong>gether which signalling pathwaysare involved in flipping <strong>to</strong> thepathogenic state. When I comeacross a gene or protein that’s notfamiliar <strong>to</strong> me, I want <strong>to</strong> be able <strong>to</strong>find out about it quickly and putit in the context of other genes orproteins that are expressed at thesame time.”EBI <strong>resources</strong> for Leon:EBI Search for quick summaryreports of genes or proteins.Enzyme Portal <strong>to</strong> learn moreabout a particular enzyme thatswitches a pathway on or off.InterPro for uncharacterisedproteins or novel protein familiesand domains.UniProt for the function ofproteins in his lists of IDs.Gene On<strong>to</strong>logy <strong>to</strong> find outwhich ones are involved in thesame processes.Reac<strong>to</strong>me <strong>to</strong> map gene andprotein lists <strong>to</strong> pathways.Leon could also submit histranscrip<strong>to</strong>mics and proteomicsexperiments <strong>to</strong> ArrayExpressand PRIDE <strong>to</strong> analyse his resultsin the context of other similarexperiments.IntActwww.ebi.ac.uk/intactReac<strong>to</strong>mewww.reac<strong>to</strong>me.orgBioModelswww.ebi.ac.uk/biomodelsPATENTPatent sequencesThe IntAct molecular interaction <strong>data</strong>base and analysissystem provides a central, standards-compliant reposi<strong>to</strong>ryof molecular interactions, including protein–protein,protein–small molecule and protein–nucleicacid interactions.Reac<strong>to</strong>me is a <strong>data</strong>base of human biological pathways builtfrom connected reactions that encompass all biologicalevents as well as classical biochemical events. Contentis authored by expert biologists and linked <strong>to</strong> scientificliterature that experimentally validates the steps in apathway. The goal is <strong>to</strong> represent the current biologicalconsensus of human pathways, as a free online referenceand a downloadable core <strong>data</strong>set. Reac<strong>to</strong>me offers <strong>to</strong>ols forsearching and viewing pathways as interactive diagrams,and for analysing <strong>data</strong>sets by comparing them with otherpathways and overlaying expression or interaction <strong>data</strong>.Biomodels is a <strong>data</strong>base of published mathematical modelsdescribing biological processes. You can simulate themodels online and use them <strong>to</strong> develop your own. Biologistscan s<strong>to</strong>re, search and retrieve published and annotatedmathematical models of biological interest.We offer free, unrestricted access <strong>to</strong> dedicated patent sequence <strong>data</strong>basesthat contain valuable information about patent family <strong>data</strong>, patentequivalents and ‘earliest priority’ dates. These <strong>data</strong>bases also offer access <strong>to</strong>full-text patent documents.Our non-redundant patent sequence <strong>data</strong>bases help you search efficientlyand return de-duplicated results, saving valuable time and effort.Patent nucleotide and protein sequences are contributed by the <strong>European</strong>Patent Office (EPO), US Patent and Trademark Office, and the Japanese(JPO) and Korean (KIPO) patent offices, including World IntellectualProperty Organization (WIPO)-approved patents from these offices.Further agreements with patent offices from Brazil, Canada, China andMexico have been achieved or are in progress.www.ebi.ac.uk/patent<strong>data</strong>/nr6


Literature and text miningJudith,plant geneticistWe develop free public <strong>resources</strong> based on the scientific literature, and leadthe development of UK PubMed Central. Our <strong>resources</strong> provide links <strong>to</strong>full-text articles and maintain cross-references <strong>to</strong> biological <strong>data</strong>bases such asUniProt, ENA, PDBe and InterPro. The CiteXplore search engine integratestext-mining functions <strong>to</strong> build links <strong>to</strong> biological concepts and the underlying<strong>data</strong>bases of proteins, genes and gene on<strong>to</strong>logy.CiteXplorewww.ebi.ac.uk/citexploreUK PubMed Centralhttp://ukpmc.ac.ukOn<strong>to</strong>logiesCiteXplore covers 25 million abstracts inlcudingPubMed, Agricola and patents. It combinesliterature search with text-mining <strong>to</strong>ols,citation networks and cross references <strong>to</strong> otherbioinformatics <strong>resources</strong>.UKPMC contains over 2 million full text lifescience research articles, of which around 250 000are open access. Combined with the CiteXplorecontent and functions listed above, it providesintegrated text-mining <strong>to</strong>ols as well asgrant-reporting services.The wide variations in terminology between different research communitiespose a challenge both for the researcher conducting a search and thecomputer executing it. On<strong>to</strong>logy services help identify the most usefulinformation by identifying functionally equivalent terms.Judith works for a plant-breedinginstitute that is using genomics <strong>to</strong>identify new crop strains that areresistant <strong>to</strong> drought, salt and fungaldiseases. “We’re doing linkagestudies <strong>to</strong> find out which genes areinvolved in resistance <strong>to</strong> differenttypes of stress,” she explains. “Someof the crops we work on haven’tbeen sequenced yet. We’ve got a lo<strong>to</strong>f <strong>data</strong>, both on genomic variantsand on differences in expressionlevels, that we need <strong>to</strong> map on <strong>to</strong>well-characterised plants <strong>to</strong> makesense of it. We also need <strong>to</strong> find outwhether any of our sequences haveassociated patents so that we don’twaste time and effort.”Gene On<strong>to</strong>logywww.geneon<strong>to</strong>logy.orgGO Annotationwww.ebi.ac.uk/QuickGoChEBIwww.ebi.ac.uk/ChEBISystems BiologyOn<strong>to</strong>logieswww.ebi.ac.uk/sboExperimentalFac<strong>to</strong>r On<strong>to</strong>logywww.ebi.ac.uk/EFOOn<strong>to</strong>logy LookupServicewww.ebi.ac.uk/on<strong>to</strong>logy-lookupThe Gene On<strong>to</strong>logy Consortium provides a controlledvocabulary <strong>to</strong> describe gene and gene product attributesin any organism.GOA provides Gene On<strong>to</strong>logy assignments for proteinsfor all species in the UniProt Knowledge Base.The ChEBI chemistry on<strong>to</strong>logy provides a controlledvocabulary <strong>to</strong> describe small molecules, including theirbiological and chemical roles.The SBO project develops controlled vocabularies andon<strong>to</strong>logies for problems in systems biology. SBO is par<strong>to</strong>f BioModels.net.EFO is used <strong>to</strong> describe experimental variables infunctional genomics experiments, and is crossreferenced<strong>to</strong> more than 20 species or domain-specificterminologies. It provides a way <strong>to</strong> integrate <strong>data</strong>basesusing different terminologies and is deployed insearching for the ArrayExpress Archive and ExpressionAtlas <strong>data</strong>bases.OLS is a centralised query interface for on<strong>to</strong>logy andcontrolled vocabulary lookup. It features interactivegraphics that clarify the relationships between terms.EBI <strong>resources</strong> forJudith:Ensembl Genomes forcomparative analysis andorthologue prediction.ArrayExpress <strong>to</strong> find relatedfunctional genomics <strong>data</strong>. Judithcould also submit her expression<strong>data</strong> <strong>to</strong> ArrayExpress.Expression Atlas <strong>to</strong> see whichgenes are expressed undersimilar conditions.Gene On<strong>to</strong>logy for matchingprocesses and functions <strong>to</strong> plantgenes and their products.Reac<strong>to</strong>me <strong>to</strong> map genes<strong>to</strong> pathways.The EPO-EBI Non-redundantpatent <strong>data</strong>bases <strong>to</strong> search forprior art.7


Ola,clinical researcherToolsWe provide a comprehensive range of bioinformatics <strong>to</strong>ols. Here wehighlight a selection of those used most frequently, but you can find a moreextended list on our website (www.ebi.ac.uk/Tools).For programmatic access, take a look at our Web Services (SOAP or RESTAPIs): www.ebi.ac.uk/Tools/webservicesOla is an MD/PhD. When she’s notpractising as an oncologist, she’sworking on her research project.“I’m looking for proteomics-basedbiomarkers for the early detectionof bladder cancer, specificallypeptides from transitional-cellcarcinomas that are shed in<strong>to</strong> theurine,” she explains. “I do mass specof samples from patients coming infor biopsies. I need <strong>to</strong> identify theproteins whose production is up- ordown-regulated. I’ve started lookingat post-translational modificationson these proteins, and have founda phosphoprotein that seems <strong>to</strong> beupregulated in some patients. I’dlike <strong>to</strong> know more about the cellularprocesses leading <strong>to</strong> this change.This might help us find noveltherapies, <strong>to</strong>o.”EBI <strong>resources</strong> for Ola:PRIDE for proteomicsexperiments that have identifiedproteins from carcinoma cells.Ola could also submit her ownresearch results <strong>to</strong> PRIDE.UniProt <strong>to</strong> find out about thephosphorylation sites on hernew biomarker and whattype of kinase might catalysethese changes.IntAct <strong>to</strong> find evidence of otherproteins that interact with herprotein of interest.Nucleotide sequence searchYou can use these <strong>to</strong>ols <strong>to</strong> searchnucleotide sequence <strong>data</strong>bases,including ENA and <strong>resources</strong> for codingsequences, immunoglobulins and highthroughputsequences:••BLAST Nucleotide••ENA Sequence Search(Exonerate)Multiple sequence alignmentFor alignment of three or moresequences <strong>to</strong> identify regions ofconservation, which may indicatefunctional constraints and inferevolutionary relationships:••Clustal Omega••ClustalW2••WebPRANK••T-Coffee••MUSCLEProtein functional analysisFor identification of potential proteinfamilies, domains and functional sitesallowing the the function of a proteinsequence <strong>to</strong> be predicted:••InterProScan••Pratt••Phobius••RADARMolecular structure analysisThese <strong>to</strong>ols help you determine andvisualise macromolecular structures:••PDBeFold••QuaternaryStructureProtein sequence searchYou can use these <strong>to</strong>ols <strong>to</strong> searchprotein sequence <strong>data</strong>bases includingUniProtKB, sequences derivedfrom macromolecular structures,immunoglobulins and sequencesfrom patents:••BLAST Protein••PSI-SearchPairwise sequence alignmentFor alignment of two sequences <strong>to</strong>identify regions where the sequence isconserved:••Needle••LAlignFunctional genomics <strong>to</strong>olsTo explore <strong>data</strong> from gene expressionexperiments:••Expression Atlas••EFO Tools••EBI R-WorkbenchText miningFor analysis of text documents <strong>to</strong>identify important terms and relate them<strong>to</strong> <strong>data</strong>base entries and on<strong>to</strong>logies:••EBIMed••WhatizitData retrieval and ID mapping••dbfetch••ENA / SRA••ENA / EMBL-SVA••UniProt / UniSave••SRS••PICR8


The <strong>European</strong> Molecular Biology Labora<strong>to</strong>ry


EMBL-<strong>European</strong> <strong>Bioinformatics</strong> <strong>Institute</strong>Wellcome Trust Genome CampusHinx<strong>to</strong>n, CambridgeCB10 1SD, UKTel: +44 (0) 1223 494 444Fax: +44 (0) 1223 494 468E-mail: outreach@ebi.ac.ukWeb: www.ebi.ac.ukTwitter: @emblebiFacebook: /EMBLEBIFind this brochure online: www.ebi.ac.uk/information/brochures

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!