A Special Issue on Data Standards Foreword - Centre for Ecology ...


OMICS A Journal of Integrative Biology
Volume 10, Number 2, 2006
© Mary Ann Liebert, Inc.

Foreword

A Special Issue on Data Standards

DAWN FIELD¹ and SUSANNA-ASSUNTA SANSONE²

¹ Molecular Evolution & Bioinformatics Section, Oxford Centre for Ecology and Hydrology, Oxford, United Kingdom.
² EMBL, EBI (European Bioinformatics Institute), Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

ABSTRACT

This special issue arose as a response to the ongoing proliferation of grass roots, community-driven international data standardization activities. It is our hope that giving a range of participants in such activities the opportunity to contribute up-to-date descriptions of their efforts will help foster dialogue, raise awareness, and make a call for action on behalf of this rapidly growing community. This issue is not an exhaustive review of activities, but provides a snapshot of a range of activities at different stages of maturity and their growing interactions. It is organized into four groups of papers on data standardization activities: genomics; the post-genomic technologies of transcriptomics, proteomics, and metabolomics; integrative activities; and other initiatives. Each invited piece contains details of the current status, future prospects, and successes and challenges of these efforts, making this issue a resource for those wishing to track, participate in, or conform to any of these standardization activities, as well as those wishing to initiate new activities. In this foreword, we provide a brief background to the practice and methodology being adopted in the development of OMICS standards, review the content of this special issue, and attempt to highlight the growing interactions and synergies between groups.

INTRODUCTION

THE INTRODUCTION of OMICS technologies into the life sciences has heralded tremendous opportunities but also significant challenges. Among these recognized challenges is the need to manage the vast quantities of raw data generated by these high-throughput methods effectively, so that the data can be properly analyzed and re-analyzed, scrutinized for the sake of peer-review publication, compared, and disseminated through public databases for the benefit of the wider scientific community.

THE NEED FOR OMICS DATA STANDARDS

The topic of OMICS data standards has come to the fore, especially because the science of OMICS is distinguished by a concern with the formidable task of characterizing not just one, a handful, or some of a particular type of molecule but all molecules within a biological sample. Molecular data sets of this scale are unprecedented in the study of biological systems. This sheer volume of data is further compounded by the


TABLE 1. AN EVER INCREASING NUMBER OF STANDARDIZATION ACTIVITIES

Community effort and website | Standardization activities | Citation in this issue

Genomic research
The Genomic Standards Consortium (GSC): www.genomics.ceh.ac.uk/genomecatalogue/ | Content (Minimal Information about a Genome Sequence: MIGS), syntax, and semantics | (Field et al., 2006; Morrison et al., 2006a)
International Nucleotide Sequence Database Collaboration (INSDC): www.insdc.org | Content (INSDC Third-Party Annotation Submission Guidelines) | (Cochrane et al., 2006)
Genome Reviews: www.ebi.ac.uk/GenomeReviews/ | Content (review of standardization within the Genome Reviews database) and syntax | (Sterk et al., 2006)
GSC: Chair of Organelles Working Group: www.genomics.ceh.ac.uk/genomecatalogue | Content (call for standardization of descriptions of organelles) | (Boore, 2006)

Post-genomic standardization
MGED Society: www.mged.org | Content (Minimal Information about a Microarray Experiment: MIAME), syntax, and semantics | (Ball and Brazma, 2006)
HUPO–Proteomics Standards Initiative (PSI): http://psidev.sourceforge.net | Content (Minimal Information about a Proteomics Experiment guidelines: MIAPE), syntax, and semantics | (Taylor et al., 2006)
Experimental Standards for Proteomics | Content (call for development of standard experimental mixtures of proteins) | (Hogan et al., 2006)
Metabolomics Society–MSI (Metabolomics Standards Initiative): www.metabolomicssociety.org | Content, syntax, and semantics | (Fiehn et al., 2006)

Integration activities
Reporting Structures for Biological Investigations Working Group (RSBI): www.mged.org/Workgroups/rsbi/rsbi.html | Contributions to content and semantics | (Sansone et al., 2006)
“Env” Community led by the Environmental Genomics Working Group (EGWG): http://envgen.nox.ac.uk/miame/miame_env.html | Content (MIAME/Env checklist), syntax, and semantics | (Morrison et al., 2006b)
The Functional Genomics Experiment Object Model (FuGE): http://fuge.sourceforge.net/ | Syntax | (Jones et al., 2006)
National Center for BioMedical Ontology (cBIO): www.bioontology.org/ | Semantics | (Rubin et al., 2006)
Functional Genomics Investigation Ontology (FuGO): http://fugo.sourceforge.net/ | Semantics | (Whetzel et al., 2006)

Other initiatives
MISFISHIE Working Group: http://mged.sourceforge.net/misfishie/ | Content (MI Specification for In Situ Hybridization and Immunohistochemistry Experiments: MISFISHIE), syntax, and semantics | (Deutsch et al., 2006)
Flow Cytometry experiment community: www.flowcyt.org | Content (MI about a Flow Cytometry Experiment: MI-FACE), syntax, and semantics | (Spidlen et al., 2006)
Phylogenetics Community | Content (MI About a Phylogenetic Analysis: MIAPA) | (Leebens-Mack et al., 2006)
Generation Challenges Project (GCP): www.generationcp.org/bioinformatics.php | Syntax and semantics | (Bruskiewich et al., 2006)
Taxonomic Names/Concepts sub-group of the International Working Group on Taxonomic Databases: http://tdwg.napier.ac.uk/TCS_1.01/v101.xsd | Content and syntax (Taxonomic Concept Schema) | (Kennedy et al., 2006)
Raman Analysis (RA) | Content (Discussion of the application of RA in microbiology) | (Huang and Spiers, 2006)

This table summarizes the community-based contributions in this issue. For each, we list the formal (or informal) name of the community, the area of standardization covered (i.e., content, syntax, and semantics), and its citation in this issue. When a community has defined an “MI” checklist, its full name is given. Readers can obtain full information about each initiative and its output from the listed websites.

source, providing guidelines for researchers reporting their experiments, as well as journal editors, funding bodies, scientific organizations, technology vendors, and regulatory bodies.

MIAME, however, like all MI checklists, is a data content standard, not a format standard. It is not enough to specify that certain minimum information should be provided. It is essential that a standard transmission format exists for the data. The next step is the conceptualization and implementation of the minimal requirements, for example through the creation of an Object Model (OM), which is then translated into a communication format that facilitates the exchange of data.

While the data format provides the “syntax” of how to report the data, the “semantics,” or meanings, of the resulting annotations must also receive attention. This is best done through sets of controlled vocabularies or ontologies. A controlled vocabulary is a way to insert an interpretive layer of semantics between terms used by different experimentalists to describe a sample treatment or an instrument’s parameter, to better represent the original intention behind each term’s use.
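The distinction drawn above between content, syntax, and semantics can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the checklist fields, the vocabulary entries, and the use of JSON as the exchange format are hypothetical and are not taken from MIAME or any other checklist described in this issue.

```python
import json

# Hypothetical "MI"-style checklist (content). These field names are
# illustrative only; they are not taken from any published checklist.
REQUIRED_FIELDS = {"organism", "sample_treatment", "instrument"}

# Hypothetical controlled vocabulary (semantics): free-text terms that
# different experimentalists might use, mapped onto one preferred term.
TREATMENT_VOCABULARY = {
    "heat shock": "heat_shock",
    "heat-shock": "heat_shock",
    "hs": "heat_shock",
    "untreated": "control",
    "control": "control",
}


def missing_fields(record):
    """Content check: list required checklist fields that are absent or empty."""
    return sorted(f for f in REQUIRED_FIELDS if not record.get(f))


def normalize_treatment(term):
    """Semantic check: map a free-text term to its preferred vocabulary term.

    Returns None when the term is not in the vocabulary.
    """
    return TREATMENT_VOCABULARY.get(term.strip().lower())


def to_exchange_format(record):
    """Syntax: serialize the record in one agreed format (JSON stands in here)."""
    return json.dumps(record, sort_keys=True)


record = {"organism": "Haemophilus influenzae", "sample_treatment": "Heat-Shock"}
print(missing_fields(record))                            # ['instrument']
print(normalize_treatment(record["sample_treatment"]))   # heat_shock
print(to_exchange_format(record))
```

In this sketch the three concerns are kept in separate functions on purpose: a record can satisfy the checklist yet use an unrecognized term, or use approved terms yet omit a required field, so each kind of standard is checked independently.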
An ontology is an explicit formal representation of the knowledge in a subject area, which includes controlled vocabularies for referring to the concepts and logical statements that describe what the concepts are and how they can or cannot be related to each other. The incorporation of ontologies into the annotations of data and metadata can provide semantics for features relevant to the interpretation, analysis, and integration of the experiment.

Standards for data content (“MI” checklists), syntax (file formats), and semantics (ontology) make information more accessible (facilitating comparison, reposition and exchange of data) and enable the extraction of maximum value from data sets (enabling an assessment of the quality and relevance of a piece of work) (Quackenbush, 2004). The reuse of the same semantics and syntax benefits the entire scientific community by simplifying the job of data integration, and also eases the task of software developers, vendors, and equipment manufacturers by reducing the time and cost of implementing standards-compliant products.

THE PRACTICE OF GAINING ACCEPTANCE

To be of use, standards must gain widespread or absolute acceptance within a community. This means they must be relevant and widely used. Managing this process of consensus-building from start to finish takes time, resources, and expertise. To be successful, initiatives must overcome not only technological barriers, but also any sociological barriers that may stand in the way of building a consensus view of the information essential for describing a particular type of data, as well as the resistance of data generators to providing this information. Such issues should never be underestimated, and underscore the need to attain significant buy-in, an important theme of the contributions in this issue.

FIG. 1. The emergence of multiple Minimal Information (MI) checklists. [Figure label: Reporting Structure the Biological Investigations (RSBI): general proposed structure to use existing checklists in a modular way.] This figure illustrates the MI checklists in this issue. The MI trend is likely to progress and more checklists are yet to come. It is essential, however, that these are not created in isolation. Their formulation should attempt to anticipate the needs of functional genomics and systems biology and their design should allow the different checklists to function together as interchangeable modules. The RSBI framework proposes a way to facilitate this coordination by restructuring the formulation of these checklists (Sansone et al., this issue). Readers should note that this is not an exhaustive list of activities and communities working to define “MI” checklists are continually emerging (e.g., see the checklists for the “Minimal Information about an RNAi experiment” [MIARE] at www.rnaiglobal.org/gti.html and the “Minimal Information about a Cellular Assay” [MIACA] at http://miaca.sourceforge.net/).

Gaining community buy-in can be a long process, and the development of data standards is an iterative process. Ideally, the creation and implementation of data standards is a highly interdisciplinary activity. The key stakeholders who must be included from the onset are the end users, and when standardization is technology-driven, as in the case of post-genomic technologies such as transcriptomics, this includes vendors. The end-users group is composed primarily of researchers but also includes members of regulatory bodies and, when data are submitted as part of a publication, journal editors. Because the practical aspects of standards development are highly technical, practitioners of OMICS standards development projects are commonly members of computer science departments, among which software engineers abound. Successful projects therefore also need to strike the right balance between leveraging the expertise of developers and the expert knowledge of researchers generating and analyzing the resulting data.

REAL-WORLD EXAMPLES: THE CONTENTS OF THIS SPECIAL ISSUE

The entire biological community is now benefiting from the successes of standardization activities, which are calling for new levels of attention to be paid to proper data management. The contents of this issue are organized into a call for action and contributions, which fall into the general categories of genomics, post-genomics, integrative activities, and other initiatives. The technologies used in OMICS are at different levels of maturity, and likewise, the reporting practices for publishing and disseminating data generated using these methods are at different stages of development. Some, like the MIAME specification developed by the Microarray Gene Expression Data (MGED) Society, are in wide use (Ball and Brazma, this issue), and others, like the issue of standardizing the use of Raman analysis (RA) in environmental microbiology, are being discussed for the first time (Huang and Spiers, this issue). All of these invited pieces provide details that are not already found on project websites or within core documents, and are written to encourage transparency, participation, and the adoption of the resulting standards.

It should be noted that this issue is not an exhaustive resource of standardization activities (Fig. 1). For example, as high-throughput technologies are widely used in industry and are being considered by regulatory agencies to develop policy, a range of methodologies has come under intense scrutiny. Agreement on standardization of data will do little good if experimental protocols prove inconsistent. There is growing appreciation of the importance of controlling for unwanted sources of variation in complex OMICS studies, and the production and analysis of standard materials is now the focus of many initiatives, some of which have been reviewed elsewhere (Sansone et al., 2004). This issue contains only one contribution on the need for complex experimental standards, namely for proteomics (Hogan et al., this issue), to complement the ongoing effort in the transcriptomics domain (Baker et al., 2005; ERCC, 2005) and elsewhere.

A COMMUNITY’S CALL FOR ACTION

As a further introduction to this issue, we have invited Cath Brooksbank and John Quackenbush to formalize a call to action for further recognition of the importance of global OMICS standardization activities: past, present, and future. In their commentary, these authors eloquently describe the history of activities in this area and point out that Herculean efforts are often accomplished “on the side” and without formal funding, simply because the lack of standardization is an unacceptable state of affairs for OMICS researchers and is proving repeatedly to be a significant bottleneck in the collection, sharing, and integration of data (Brooksbank and Quackenbush, this issue). As they put it, a “quiet revolution” has been brewing since the early 1990s, which is now full-blown and deserves resources that match its contributions. They note the successes of a variety of communities, most notably the Gene Ontology (GO) Consortium (Ashburner et al., 2000) and the MGED Society (Ball and Brazma, this issue), and point out the promise of future standardization efforts to justify an open call to the community for further support and acceptance of these activities. These authors specifically call for journals to uphold the highest expectations for the reporting of data in the primary literature and for funding bodies to provide critical resources for these projects. Only through a combination of grass-roots development efforts, community adoption, and formal support from journals and funding bodies will it be possible to achieve the highest standards of reporting and compliance in the shortest time span for the benefit of the entire scientific community.

GENOMIC RESEARCH

The five invited papers in this category describe the acquisition and handling of information describing complete genome sequences. The genomic revolution started in earnest with the sequencing of the first bacterial genome (Fleischmann et al., 1995). We now have over 300 publicly available genome sequences from bacteria in our complete genome collection as well as a growing number of larger genomes, including the human genome. This wealth of information has drawn attention to the benefits of describing this collection in more detail for the sake of comparative analyses and the difficulties of collecting such metadata (Martiny and Field, 2006).

The first contribution in this category is a meeting report from the Genomic Standards Consortium (GSC) describing their second exploratory workshop at the European Bioinformatics Institute (EBI) (Field et al., this issue). This international group is organizing itself to work towards developing the Minimal Information about a Genome Sequence (MIGS) specification. Participants at this workshop included members of the International Nucleotide Sequence Database Collaboration (INSDC) and the Genome Reviews database, who have contributed pieces describing evidence standards in experimental and inferential INSDC Third Party Annotation data (Cochrane et al., this issue) and the standardization of comparative genomic data and information within the Genome Reviews database (Sterk et al., this issue). Jeff Boore, chair of the organelles working group within the GSC, also elaborates on issues surrounding the standardization of genomic databases to accurately describe and allow data mining of organellar genomes (Boore, this issue). Finally, members of the GSC also elaborate on the concept of sample in a piece that explores how the GSC can best work towards its goals within the existing framework of the INSDC and the wider functional genomic community (Morrison et al., this issue). This is an obvious example where interest in describing one aspect of an OMICS experiment (sample) can have an impact across a range of standardization activities and underscores the need for tight linkages between initiatives (Fig. 1).

POST-GENOMIC STANDARDIZATION

The post-genomic technologies of transcriptomics, proteomics, and metabolomics have emerged from the successes of the initial genomic revolution to provide primary OMICS methods of characterizing sets of molecules produced by genomes. This issue contains updates from the authorities devising standards for all three of these “core” OMICS technologies. Catherine Ball and Alvis Brazma provide a historical overview of the emergence and activities of the MGED Society and describe how the standardization activities arose from the needs of the research community and how they have evolved over time (Ball and Brazma, this issue). Chris Taylor leads a paper on the development of proteomics standards by members of the Proteomics Standards Initiative (PSI) within the HUman Proteome Organisation (HUPO) (Taylor et al., this issue). As already mentioned, this issue also includes one example of a call for standardization of experimental standards, in this case for the development of reference datasets representing complex mixtures of proteins for proteomics research (Hogan et al., this issue). Finally, metabolomic and metabonomic data are also growing in importance within the OMICS context, as such technologies allow the metabolites of cells to be characterized and studied in response to an unlimited number of parameters. In this issue, leaders of the Metabolomics Standards Initiative (MSI), working under the Metabolomics Society umbrella, make an open call for participation in the newly established efforts to standardize the reporting of this type of data (Fiehn et al., this issue).

INTEGRATION ACTIVITIES

Ironically, the emergence of multiple standards has made the need for harmonization between groups developing standards a critical issue, namely to avoid duplication of effort between standardization activities and accelerate the way in which new standards are built (Sansone et al., 2004). For example, the concept of “sample,” as described in this issue (Morrison et al., this issue), is one of many cross-cutting concepts that should be shared among standards. This issue contains descriptions of five projects providing integrative roles. The first is the Reporting Structures for Biological Investigations (RSBI), a group working under the umbrella of MGED, but also working towards greater harmony between a wide range of standardization activities in different domains of application, namely nutrigenomics, ecotoxicogenomics, and environmental genomics (Sansone et al., this issue). In this issue, the “Env” community, which contributes the environmental genomic component of the RSBI working group, makes its first open call for participation (Morrison et al., this issue). The Env community, organized around the Environmental Genomics Working Group (EGWG), is interested in the annotation and exchange of data from environmental OMICS studies. Such metadata includes details on the location, phenotype, and environmental conditions of the biological sample used and has been formalized as a community-level extension of MIAME, called MIAME/Env (Morrison et al., this issue).

As the number of checklists from different initiatives grows, it will be essential to monitor the overlap of the information to be collected and to capitalize on synergies that make the generation of new checklists and the future integration of data simpler. The RSBI working group is therefore an important source of case


ings are available and helping projects to build momentum and produce outputs in a timely fashion. The practical keys to success are often stable releases of standards, which are required to allow working implementations to be produced. Other themes include the importance of use cases, taking the time to properly define scope, and attention to the significant technological details each project must surmount.

What we have witnessed over the past decade is the rise of a family of high-throughput technologies. While each method holds great promise, it represents but one way to investigate biological systems, and true knowledge is far more likely to come from the interpretation of data won through the application of many technologies. It is clear that data standards will have a special role to play in realizing this goal, and the broader OMICS community has shown that it is ready to rise to the occasion. We hope that from the individual pockets of the community leading data standardization activities will emerge an inclusive, joined-up OMICS standardization community. Already, it is common for key individuals to be members of two or more standardization efforts, and this trend is increasing. A high-profile, vibrant, interdisciplinary community will have the collective expertise and connections to help push forward the long-term vision of OMICS data integration and interpretation.

ACKNOWLEDGMENTS

It has been our pleasure to put together this special issue. We would like to thank Eugene Kolker, Editor-in-Chief, for inviting us to do so and thereby providing these communities with a unique forum in which to report ongoing activities. This invitation came about as a result of Eugene’s attendance at the GSC workshop, which is included in this issue (Field et al., 2006, this issue), and we would like to thank NERC for funding. We would also like to thank all of the contributing authors.

REFERENCES

ASHBURNER, M., BALL, C.A., BLAKE, J.A., et al. (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25–29.
ASHBURNER, M., BALL, C.A., BLAKE, J.A., et al. (2006). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 34, D173–D180.
BAKER, S.C., BAUER, S.R., BEYER, R.P., et al. (2005). The External RNA Controls Consortium: a progress report. Nat Methods 2, 731–734.
BALL, C.A., and BRAZMA, A. (2006). MGED standards: work in progress. OMICS (this issue).
BOORE, J.L. (2006). Requirements and standards for organelle genome databases. OMICS (this issue).
BRAZMA, A., HINGAMP, P., QUACKENBUSH, J., et al. (2001). Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29, 365–371.
BROOKSBANK, C., and QUACKENBUSH, J. (2006). Data standards: a call to action. OMICS (this issue).
BRUSKIEWICH, R., DAVENPORT, G., HAZEKAMP, T., et al. (2006). The Generation Challenge Programme (GCP): standards for crop data. OMICS (this issue).
COCHRANE, G., BATES, K., APWEILER, R., et al. (2006). Evidence standards in experimental and inferential INSDC Third Party Annotation data. OMICS (this issue).
DEUTSCH, E.W., BALL, C.A., BOYA, S., et al. (2006). Minimum information specification for in situ hybridization and immunohistochemistry experiments (MISFISHIE). OMICS (this issue).
EILBECK, K., LEWIS, S.E., MUNGALL, C.J., et al. (2005). The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol 6, R44.
ERCC. (2005). Proposed methods for testing and selecting the ERCC external RNA controls. BMC Genomics 6, 150.
FIEHN, O., KRISTAL, B., OMMEN, B.V., et al. (2006). Establishing reporting standards for metabolomic and metabonomic studies: a call for participation. OMICS (this issue).
FIELD, D., MORRISON, N., SELENGUT, J., et al. (2006). eGenomics: cataloging our complete genome collection II. OMICS (this issue).
FLEISCHMANN, R.D., ADAMS, M.D., WHITE, O., et al. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–512.


GO (Gene Ontology) CONSORTIUM. (2006). The Gene Ontology (GO) project in 2006. Nucleic Acids Res 34, D322–D326.
HOGAN, J.M., HIGDON, R., and KOLKER, E. (2006). Experimental standards for high-throughput proteomics. OMICS (this issue).
HUANG, W.E., and SPIERS, A.J. (2006). Consideration of future requirements for Raman microbiology as an examplar for the ab initio development of informatics frameworks for emergent OMICS technologies. OMICS (this issue).
JONES, A.R., PIZARRO, A., SPELLMAN, P., et al. (2006). FuGE: Functional Genomics Experiment object model. OMICS (this issue).
KENNEDY, J., HYAM, R., KUKLA, R., et al. (2006). Standard data model representation for taxonomic information. OMICS (this issue).
LEEBENS-MACK, J., VISION, T., BRENNER, E., et al. (2006). Taking the first steps towards a standard for reporting on phylogenies: minimal information about a phylogenetic analysis (MIAPA). OMICS (this issue).
MARTINY, J.B.H., and FIELD, D. (2006). Ecological perspectives on our complete genome collection. Ecol Lett 8, 1334–1345.
MORRISON, N., COCHRANE, G., FARUQUE, N., et al. (2006a). Concept of sample in Omics technology. OMICS (this issue).
MORRISON, N., WOOD, A.J., HANCOCK, D., et al. (2006b). Annotation of environmental OMICS data: application to the transcriptomics domain. OMICS (this issue).
NAS. (2003). NAS Committee on Responsibilities of Authorship in the Biological Sciences. Sharing publication-related data and materials. Available at: www.nap.edu/books/0309088593/html/.
NIH. (2006). NIH roadmap for bioinformatics and computational biology. Available at: http://nihroadmap.nih.gov/bioinformatics/index.asp.
OECD. (2003). OECD Group on issues of access to publicly funded research data. Promoting access to public research data for scientific, economic, and social development. Available at: http://dataaccess.ucsd.edu/Final_Report_2003.pdf.
QUACKENBUSH, J. (2004). Data standards for “omic” science. Nat Biotechnol 22, 613–614.
RUBIN, D., LEWIS, S.E., MUNGALL, C.J., et al. (2006). National Center for Biomedical Ontology (cBiO): Advancing biomedicine through structured organization of scientific knowledge. OMICS (this issue).
SANSONE, S., MORRISON, N., ROCCA-SERRA, P., et al. (2004). Standardization initiatives in the (eco)toxicogenomics domain: a review. Comp Funct Genom 5, 633–641.
SANSONE, S.-A., ROCCA-SERRA, P., TONG, W., et al. (2006). A strategy capitalizing on synergies: The Reporting Structure for Biological Investigation (RSBI) working group. OMICS (this issue).
SPIDLEN, J., GENTLEMAN, R.C., HAALAND, P.D., et al. (2006). Data standards for flow cytometry. OMICS (this issue).
STERK, P., KERSEY, P.J., and APWEILER, R. (2006). Genome reviews: standardizing content and representation of information about complete genomes. OMICS (this issue).
TAYLOR, C.E., HERMJAKOB, H., JULIAN, JR., R.K., et al. (2006). The Work of the Human Proteome Organisation’s Proteomics Standards Initiative (HUPO PSI). OMICS (this issue).
TIWARI, B., FIELD, D., and SNAPE, J. (2006). Public data repositories need serious funding. Nature 439, 912.
WELLCOME. (2003). Wellcome Trust. Sharing data from large-scale biological research projects: a system of tripartite responsibility. Available at: www.genome.gov/Pages/Research/WellcomeReport0303.pdf.
WHETZEL, P.L., RYAN, R.R., BRINKMAN, H.C., et al. (2006). Development of FuGO: an ontology for functional genomics investigations. OMICS (this issue).

Address reprint requests to:
Dr. Dawn Field
Molecular Evolution & Bioinformatics
Mansfield Road
Oxford Centre for Ecology and Hydrology
Oxford, OX1 3SR, UK

E-mail: dfield@ceh.ac.uk
