
Annual Day of the Bioinformatics Platform

Thursday 31 March 2011

Programme and Abstracts

9:00 - 9:20 Welcome

9:20 - 9:30 Introduction to the day (Christine Gaspin)

09:30 - 12:05 Session "Networks and Interactions" (chair: Yves Quentin)

09:30 A mathematical and algorithmic exploration of the molecular landscape and evolution of symbiosis
Marie France Sagot (INRIA Grenoble Rhône-Alpes and LBBE, Univ. Claude Bernard, Lyon 1)

10:30 TrypanoCyc: a community effort towards the development of a metabolic pathway database for T. brucei
Ludovic Cottret, Flora Logan-Klumpler, Florence Vinson, Frederic Bringaud, Michael Boshart, Peter Bütikofer, Matt Berriman, Mark Carrington, Harry De Koning, Michael Ferguson, Michael Ginger, Pascal Maeser, Paul Michels, Derek Nolan, Fred Opperdoes, Marc Ouellette, Margaret Phillips, David Roos, Terry Smith, Aloysius Tielens, Martin C. Taylor, Jaap Van Hellemond, Michael Barrett and Fabien Jourdan

10:55 Differential expression analysis on Affymetrix exon arrays using R Bioconductor, Cytoscape and network biology tools
Matthias Macé, Yannick Allanore and Maria Martinez (INSERM U1043, CHU Purpan, Toulouse, and Service de Rhumatologie A & INSERM U1016, Hôpital Cochin, Paris)

11:20 - 11:40 Break

11:40 Gene regulatory network reconstruction using Bayesian Networks, the Dantzig selector and the Lasso: a meta-analysis
David Allouche, Christine Cierco-Ayrolles, Simon de Givry, Brigitte Mangin, Nidal Ramadan, Thomas Schiex, Jimmy Vandel and Matthieu Vignes (BIA, Toulouse)

12:05 - 12:55 Session "Genomics 1" (chair: Yves Quentin)

12:05 A genome-wide association study of Parkinson's disease
Mohamad Saad and Maria Martinez (INSERM U563, CHU Purpan, Toulouse)

12:30 Prioritizing candidate genes in prokaryotes by multi-genome data fusion for the study of ABC transporters
Roland Barriot, Yves Quentin, Gwenaëlle Fichant (IBCG-LMGM, Université Paul Sabatier, Toulouse)

12:55 - 14:15 Buffet lunch served in the Génome hall

14:15 - 15:55 Session "Genomics 2" (chair: Fabien Jourdan)

14:15 Annotation and analysis of the peroxidase multigene family
Catherine Mathé, Marie Brette, Bruno Savelli and Christophe Dunand (UMR 5546 CNRS/Université P. Sabatier, Toulouse)

14:40 GeneHuggers
Sébastien Briois & Jason Iacovoni (LBCMCP, CNRS UMR 5088, UPS, Toulouse; PF Bioinformatique I2MC, INSERM)

15:05 Using Ensembl/BioMart/DAS environments to exploit NGS results
Patrice Déhais (SIGENAE, LGC, INRA, Toulouse)

15:30 RNAspace: a generator of web sites to support prediction, annotation and analysis of ncRNA
Marie-Josée Cros, Antoine de Monte, Jérôme Mariette, Philippe Bardou, Daniel Gautheret, Hélène Touzet and Christine Gaspin

15:55 - 16:15 Break

16:15 - 17:30 Session "Sequences and High Throughput" (chair: Céline Noirot)

16:15 NG6: Next Generation Sequencing Information System
Jérôme Mariette, Nicolas Allias, Céline Noirot, Gérald Salin, Sylvain Thomas, Christophe Klopp (PF GénoToul Bioinfo, INRA, Toulouse)

16:40 A Comparative Study of Statistical Methods for Detecting Association with Rare Variants in Exome-Resequencing Data
Nora Bohossian, Mohamad Saad, Aude Saint Pierre, Matthias Macé, Maria Martinez (INSERM U563 - Bât B, CHU Purpan, Toulouse)

17:05 Choreography of genes in the yeast nucleus: towards a high-throughput analysis
Olivier Gadal and Alain Kamgoue (LBME, Université Paul Sabatier, Toulouse)

17:30 - 17:45 Closing of the day (Nic. Lindley)

TrypanoCyc: a community effort towards the development of a metabolic pathway database for 

Trypanosoma brucei 

Ludovic Cottret 3, Flora Logan-Klumpler 1,2, Florence Vinson 3, Frederic Bringaud 4, Michael Boshart 5, Peter Bütikofer 6, Matt Berriman 1, Mark Carrington 2, Harry De Koning 7, Michael Ferguson 8, Michael Ginger 9, Pascal Maeser 10, Paul Michels 11,12, Derek Nolan 13, Fred Opperdoes 12, Marc Ouellette 14, Margaret Phillips 15, David Roos 16, Terry Smith 17, Aloysius Tielens 18, Martin C. Taylor 19, Jaap Van Hellemond 18, Michael Barrett 7 and Fabien Jourdan 3

1. The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. 

2. Department of Biochemistry, Tennis Court Road, Cambridge CB2 1QW, UK 

3. UMR 1331 INRA/INP/UPS, TOXALIM (Research Centre in Food Toxicology), F-31000 Toulouse, France 

4. Centre de Résonance Magnétique des Systèmes Biologiques (RMSB), UMR 5536 CNRS, Université Victor Segalen Bordeaux 2, 

Bordeaux, France 

5. University of Munich (LMU), Department Biology I, Genetics, Großhaderner Str. 2-4, 82152 Martinsried, Germany 

6. Institute of Biochemistry & Molecular Medicine, University of Bern, Switzerland 

7. Faculty of Biomedical and Life Science and Wellcome Trust Centre for Molecular Parasitology, Glasgow Biomedical Research Centre, 

University of Glasgow, Glasgow, UK 

8. Division of Biological Chemistry and Molecular Microbiology, the School of Life Sciences, University of Dundee, Dundee DD1 5EH 

9. School of Health and Medicine, Division of Biomedical and Life Sciences, Lancaster University, Lancaster LA1 4YQ, UK 

10. Institute of Cell Biology, University of Bern, Switzerland. 

11. Research Unit for Tropical Diseases, de Duve Institute, TROP 74.39, Avenue Hippocrate 74, B-1200 Brussels, Belgium 

12. Laboratory of Biochemistry, Université catholique de Louvain, Brussels, Belgium 

13. School of Biochemistry and Immunology, Trinity College Dublin, Ireland 

14. Centre de Recherche en Infectiologie du CHUL, Université Laval, 2705 Boul, Laurier, Québec, Québec, G1V 4G2, Québec, Canada 

15. Departments of Pharmacology, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9041 

16. Department of Biology, University of Pennsylvania, Philadelphia, PA 19104 

17. Centre for Biomolecular Sciences, St Andrews University, St Andrews, KY16 9ST 

18. Department of Medical Microbiology and Infectious Diseases, Erasmus MC University Medical Center, Gravendijkwal 230, 3015 CE 

Rotterdam, The Netherlands 

19. Pathogen Molecular Biology Unit, Department of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, 

Keppel Street, London WC1E 7HT, UK. 

Keywords: metabolic networks, Trypanosoma brucei, network annotation 

Linking biochemical data to the reference genome of Trypanosoma brucei, the aetiological agent of Human African Trypanosomiasis, is important for comparative genomic and metabolomic studies and for investigating T. brucei biology and the disease it causes. TrypanoCyc is the metabolic pathway database for T. brucei. It will be an invaluable resource for detailed analyses of the metabolic network of this organism, as well as for cross-species comparisons with other kinetoplastids.

The TrypanoCyc database was initially built from the genome sequence of Trypanosoma brucei, using a collaborative web platform (TrypAnnot) and an annotation published by GeneDB at the Wellcome Trust Sanger Institute. The Pathway Tools software that generated the initial automatic genome-based reconstruction reports a pathway as present even if only a few of the enzymes associated with the classical pathway are found. Furthermore, trypanosome-specific pathways can only be included if they have been deposited in the MetaCyc repository. Post-construction manual curation is therefore essential to obtain an accurate depiction of the trypanosome's metabolome. This curation has been a collaborative effort and has involved removing errors, correcting automated predictions and adding information from the literature. Initially, pathways were annotated according to their presence, sub-cellular localisation and stage-specific expression. Once this was completed, a second round of annotation focused on individual enzymes. Ongoing curation will be based on public sources, literature searches and the results of experimental and bioinformatics studies. These metadata will allow users to generate tailor-made metabolic networks (e.g. the metabolic network taking place in the mitochondrion of procyclic forms).
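
To make the idea of a tailor-made network concrete, here is a minimal Python sketch of filtering annotated reactions by compartment and life-cycle stage. The `Reaction` schema, the field names and the toy entries are assumptions made for illustration; they do not describe the actual TrypanoCyc data model or its interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    """A curated reaction with illustrative annotation fields (hypothetical schema)."""
    name: str
    compartment: str
    stages: frozenset


def tailor_network(reactions, compartment, stage):
    """Keep the reactions annotated with the given compartment and life-cycle stage."""
    return [r for r in reactions
            if r.compartment == compartment and stage in r.stages]


# Toy annotations, invented for this example only
reactions = [
    Reaction("reaction_A", "mitochondrion", frozenset({"procyclic"})),
    Reaction("reaction_B", "glycosome", frozenset({"procyclic", "bloodstream"})),
    Reaction("reaction_C", "mitochondrion", frozenset({"bloodstream"})),
]

subnetwork = tailor_network(reactions, "mitochondrion", "procyclic")
print([r.name for r in subnetwork])
```

With pathway-level and enzyme-level annotations stored this way, any combination of localisation and stage yields its own sub-network.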

All genes in TrypanoCyc are linked to the corresponding entry in GeneDB (Wellcome Trust Sanger 

Institute) and TriTrypDB (EuPathDB). 

The current stage of the annotation process is available on the internet at www.metexplore.fr/trypnets.

Differential expression analysis on Affymetrix exon arrays using R Bioconductor, Cytoscape and network biology tools

Matthias Macé 1, Yannick Allanore 2 and Maria Martinez 1

1 INSERM U1043, CHU Purpan, Toulouse

2 Service de Rhumatologie A & INSERM U1016, Hôpital Cochin, Paris

Keywords: transcriptomics, Affymetrix exon array, R Bioconductor, Cytoscape, network biology

Microarray expression assays deliver a huge amount of information at the whole-genome scale. This is a challenge for the analyst from both the statistical and the computational points of view. These assays also provide insights into network biology for the study of pathological and physiological systems. A collection of open-source tools is available, characterised by their flexibility and evolvability compared with commercial packages. Here, we present an analysis pipeline that takes advantage of all the data available from Affymetrix exon arrays.

The general question we aim to answer is whether differences exist between two systems. Such differences can be assessed at the gene level as well as at the exon level. This information can then be combined into interaction networks in order to decipher biological pathways.

Affymetrix exon arrays are characterized by a high probe density, with multiple probes targeting known and predicted exons. This allows better accuracy in expression inference and makes it possible to focus on splicing events.

The different steps, performed in the R environment, include data normalization and quality control, differential expression at the gene level, "differential regulation" (exon level) and network biology/annotation analyses. Normalization is performed with the RMA algorithm, and quality control (mainly outlier removal) by PCA and clustering. The subsequent analyses were performed using R Bioconductor functions and custom scripts. For gene-level summarization and comparisons, we used linear modelling (limma package). Exon-level comparisons were performed by computing the Splicing Index and MiDAS. Probes were mapped to genes and exons using exonmap and a local Ensembl installation. The resulting gene lists were then used for network biology analyses. First, interactions (edges) between the genes (nodes) were collected from a public interaction database (STRING) that aggregates data mining, literature mining and predictions. These graphs are visualized in Cytoscape, whose plug-in architecture also allows versatile analyses. The graphs are then compared between conditions (union/intersection) or decomposed into subnetworks (MCL clustering) before annotation enrichment analysis (BiNGO plug-in).
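
As an illustration of the exon-level step, the following Python sketch computes a splicing index in its usual form: each exon signal is normalised by the signal of its gene, and the normalised values are compared between two conditions. It is not the authors' R code. The function name, the assumption that inputs are log2 intensities and the toy values are all illustrative.

```python
from statistics import mean


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """Splicing index from log2 intensities.

    Each exon value is normalised by subtracting the log2 signal of its gene
    in the same sample; the index is the difference between the mean
    normalised exon signal in condition A and in condition B.
    """
    norm_a = [e - g for e, g in zip(exon_a, gene_a)]
    norm_b = [e - g for e, g in zip(exon_b, gene_b)]
    return mean(norm_a) - mean(norm_b)


# Toy log2 intensities for one exon and its gene, three samples per condition
exon_a, gene_a = [6.0, 6.2, 5.8], [9.0, 9.1, 8.9]
exon_b, gene_b = [8.0, 8.1, 7.9], [9.0, 9.0, 9.0]

si = splicing_index(exon_a, gene_a, exon_b, gene_b)
print(round(si, 2))
```

In this toy case the gene signal is similar in both conditions while the exon signal drops in condition A, so the index is clearly negative, which is the pattern expected when an exon is skipped more often in A.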

We previously applied this pipeline to a real dataset in human pathology obtained from patient tissues (corneas from keratoconus patients) [1]. Here, we present the results of another study on progenitor cells extracted from Systemic Sclerosis patients and cultured under two different conditions.

These analyses open the way to going further in deciphering interactions in complex systems. They can be used to generate hypotheses, which can then be validated by functional genomics (qPCR/proteome), possibly after e.g. transgenesis.

Publications 

[1] Macé M., et al. Comparative transcriptome and network biology analyses demonstrate antiproliferative and 

hyperapoptotic phenotypes in human keratoconus corneas. IOVS, accepted for publication.

Gene regulatory network reconstruction using Bayesian Networks, the 

Dantzig selector and the Lasso: a meta-analysis 

David Allouche 1, Christine Cierco-Ayrolles 1, Simon de Givry 1, Brigitte Mangin 1, Nidal Ramadan 1, Thomas Schiex 1, Jimmy Vandel 1 and Matthieu Vignes 1

1 BIA Unit, SaAB team, INRA Toulouse, chemin de Borderouge, 31326 Castanet Tolosan Cedex, France. Contact: {firstname.lastname}@toulouse.inra.fr.

Keywords: gene regulatory network inference, reverse engineering, Bayesian Network, penalized regression, genetical genomics

The goal was to reconstruct gene regulatory networks from genetic and genomic data simulated according to the DREAM5 Challenge 3A (Systems Genetics) protocol. Our team implemented two families of tools to do so: Bayesian Networks and multiple linear regressions. The former were learned with a specific extended Bayesian score, whilst two penalization techniques (the Dantzig selector and the Lasso) were considered for the latter.

These approaches were combined into a meta-analysis using Fisher's inverse chi-square meta-test. We present and discuss the results we obtained.
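
For readers unfamiliar with Fisher's inverse chi-square method, here is a minimal Python sketch. It assumes, purely for illustration, that each inference method yields one p-value per candidate regulatory edge and that these p-values are independent; the abstract does not specify how the authors set up the test. For k p-values, the statistic -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis, and for even degrees of freedom its tail probability has a closed form, so only the standard library is needed.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's inverse chi-square method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    is compared with a chi-square distribution with 2k degrees of freedom.
    """
    if not pvalues or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    # Chi-square survival function for 2k degrees of freedom:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical p-values for one edge from three inference methods
stat, combined = fisher_combined_pvalue([0.04, 0.20, 0.01])
print(stat, combined)
```

Edges can then be ranked by their combined p-value, so that an edge moderately supported by every method can outrank an edge strongly supported by only one.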

A genome-wide association study of Parkinson’s disease 

Mohamad Saad 1,2, Maria Martinez 1,2

1 INSERM U563, CHU Purpan, Toulouse, France

2 Université Paul Sabatier, Toulouse, France 

Keywords: Parkinson’s disease, Genome Wide Association Study 

Abstract 

We performed a three-stage genome-wide association study to identify common Parkinson's disease (PD) risk variants in the European population. The initial genome-wide scan was conducted in a French sample of 1,039 cases and 1,984 controls, using almost 500K SNPs (Illumina 610Quad chip). Two SNPs at SNCA were found to be associated with PD at the genome-wide significance level (P < 3 × 10^-8). An additional set of promising new association signals was identified and submitted for immediate replication in two independent case-control studies of subjects of European descent. We first carried out an in silico replication study using GWAS data from the WTCCC2 PD study sample (1,705 cases, 5,200 WTCCC controls).

Nominally replicated SNPs were further genotyped in a third sample of 1,527 cases and 1,864 controls from France and Australia. We found converging evidence of association with PD on 12q24 (rs4964469, combined P = 2.4 × 10^-7) and confirmed the association on 4p15/BST1 (rs4698412, combined P = 1.8 × 10^-6), previously reported in Japanese data. The 12q24 locus includes RFX4, an isoform of which, named RFX4_v3, encodes a brain-specific transcription factor that regulates many genes involved in brain morphogenesis and intracellular calcium homeostasis.

References 

1. Saad, M., S. Lesage, et al. (2011). "Genome-wide association study confirms BST1 and suggests a locus on 

12q24 as the risk loci for Parkinson's disease in the European population." Hum Mol Genet 20(3): 615-627.

Prioritizing candidate genes in prokaryotes by multi-genome data fusion for the study of ABC transporters

Roland Barriot, Yves Quentin, Gwenaëlle Fichant 1,2

1. Centre National de la Recherche Scientifique; LMGM; F-31000 Toulouse; France

2. Université de Toulouse; UPS; Laboratoire de Microbiologie et Génétique Moléculaires; F-31000 Toulouse; France

Keywords: heterogeneous data, prioritization, inference, orthology, ABC transporters, genomics

The current abundance of omics data should allow better identification of the partners of a system or of the different actors in a biological process. At the same time, this identification task becomes more complex as the data grow more heterogeneous (genome, transcriptome, interactome, ...) and more massive. It therefore appears crucial to develop methods able to make better use of all this information, by integrating and cross-checking it, in order to prioritize, objectively and exhaustively, the best candidates associated with a biological process.

We present the extension of a generic method which, by cross-checking heterogeneous data from a set of organisms, improves the quality of candidate gene prioritization by genomic data fusion. The general principle of the approach is as follows. The first step is to select a set of genes of interest, for example the genes encoding an ABC transporter. The second step is to select the candidate genes (for example the rest of the genome). Next, the proximity of each candidate gene to the genes of interest is evaluated in order to reorder the list of candidate genes from most similar to least similar (prioritization). This step is carried out in parallel on each available data source (transcriptomes, interactomes, ...) because it relies on similarity measures specific to each data type. For example, a correlation coefficient is used for expression data, whereas for an interaction network the distance in the network is used directly. At the end of this step, the candidate genes are sorted, potentially in different orders, into several lists (one per data source). The last step is to merge these lists to obtain the global prioritization of the candidate genes. This method has already been implemented [1], but it is currently limited to data available for a single genome. To exploit the data available for other genomes, we extended the approach by using homology relationships (orthology, paralogy) between genes of different genomes. In general, this favours a candidate when its orthologs in other genomes are also close to the orthologs of the genes of interest. For example, with expression data, a candidate is all the more interesting when its orthologs are co-expressed with the orthologs of the genes of interest. Using several data sources on a single genome provides a body of converging evidence and yields a better prioritization than those obtained from each data source taken separately. Adding data sources from other genomes accentuates this effect, either by reinforcing the relevance of certain candidates (for example, conservation of co-expression during evolution for expression data) or by enriching the data (annotation transfer, for example when a system has been studied in another organism).
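
The rank-then-fuse principle described above can be sketched in a few lines of Python. This is only an illustration: the similarity scores are invented, the source names are hypothetical, and a simple average of rank ratios stands in for whatever fusion statistic the actual method uses. A source derived from another genome (here `ortholog_expression`) is treated as one more ranked list, scored through the candidates' orthologs.

```python
def rank_candidates(scores):
    """Return {gene: rank ratio} for one data source (best gene gets 1/n)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}


def fuse_rankings(per_source_scores):
    """Fuse per-source rankings by averaging rank ratios.

    A gene missing from a source is simply not penalised by that source.
    Returns the candidate genes ordered from best to worst.
    """
    ratios = {}
    for scores in per_source_scores.values():
        for gene, ratio in rank_candidates(scores).items():
            ratios.setdefault(gene, []).append(ratio)
    fused = {gene: sum(r) / len(r) for gene, r in ratios.items()}
    return sorted(fused, key=lambda g: (fused[g], g))


# Invented similarity of each candidate to the genes of interest
per_source_scores = {
    "expression": {"geneA": 0.9, "geneB": 0.2, "geneC": 0.6},
    "interaction": {"geneA": 0.5, "geneB": 0.1, "geneC": 0.8},
    "ortholog_expression": {"geneA": 0.7, "geneC": 0.4},
}
print(fuse_rankings(per_source_scores))
```

Here `geneA` and `geneC` each lead one of the single-genome sources, and the ortholog-based source tips the fused ranking towards `geneA`, which mirrors the reinforcement effect described in the text.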

To validate the method, we chose to apply it to ABC transporters, for which the transported compounds are sometimes known in certain organisms. A dedicated database, ABCdb [2], is maintained within our team. When ABC systems are identified and reconstructed using the strategy developed in the laboratory, some partners are sometimes missing. In this context, our gene prioritization approach should make it possible to identify the partner(s) of a system on more generic criteria. Then, prioritization starting from all the partners should make it possible to identify genes associated with ABC systems that may reflect a functional relationship, and thus to propose a biological process associated with the system. We present results on the performance obtained for the reconstruction of ABC transporters, as well as encouraging results on the possibility of inferring an associated biological process or a specific substrate.

Publications 

[1] Tranchevent, L.-C., Barriot, R., Shi, Y., Van Loo, P., De Moor, B., Aerts, S., Moreau, Y. (2008) Endeavour 

update: a web resource for gene prioritization in multiple species, Nuc. Acids Res. WebServer Issue, Vol. 36, No. 

suppl_2, W377-384. 

[2] Fichant, G., Basse, M.-J., and Quentin, Y. (2006) ABCdb: an online resource for ABC transporter repertories 

from sequenced archaeal and bacterial genomes. FEMS Microbiol Lett . 256(2), 333-9.

Annotation and analysis of the peroxidase multigene family

Catherine Mathé 1, Marie Brette 1, Bruno Savelli 1 and Christophe Dunand 1

1 Laboratoire de Recherche en Sciences Végétales, UMR 5546, Castanet-Tolosan

Keywords: peroxidases, annotation, evolution, gene structure

Our team studies peroxidases, enzymes found in all kingdoms. These proteins catalyse reactions in which hydrogen peroxide is reduced to water and a substrate is oxidised. In plants, they play fundamental roles in various physiological processes such as the detoxification of excess reactive oxygen species, defence against pathogens and cell wall formation.

PeroxiBase (http://peroxibase.toulouse.inra.fr/) currently gathers more than 7500 sequences distributed across several subfamilies. The sequences entered in the database derive either from searches on genomes or EST libraries, or from available protein predictions, but always after expert review and often manual re-annotation. This rigorous approach guarantees the quality of the database, but it now needs to be further automated in order to keep up with the flow of new sequences.

To this end, and also to add information on gene structure, a procedure was recently set up to locate on genomes the gene structure corresponding to known peroxidase proteins. It is based on the Scipio program [2]. The results are filtered in order to update and, where necessary, correct the sequences, and also to identify potential new peroxidases. Scipio can also predict sequences in a new genome, provided that sequences from a sufficiently close organism are available. Thanks to this procedure, PeroxiBase now contains information on gene structure. These new data made it possible to install on the database a program dedicated to the study of intron conservation, Ciwog.

A strategy to identify efficiently and correctly all the peroxidases present in EST libraries is currently under development, notably integrating HMM profiles specific to the different peroxidase classes. A second, non-public database, PeroxiBase B, was created to store the new sequences produced by automatic annotation procedures while they await expert review. Routines to facilitate this review still need to be put in place.

In parallel with the exhaustive expert annotation of this protein superfamily, analyses are being carried out to understand its evolutionary history. In particular, class III peroxidases, which are specific to plants, show a high and variable number of isoforms (73 in Arabidopsis, 138 in rice, ...). This large evolutionary variation, specific to one protein class, raises questions about the mechanisms and dates of the associated gene duplication or loss events, and suggests a possible link with adaptation to particular conditions, which would also provide information on the biological function of peroxidases.

Publications 

[1] Guillou V., Plourde-Owobi L., Goma G., Parrou J.L., François. J. Role of glycogen and trehalose in the growth 

dynamic of the yeast Saccharomyces cerevisiae. FEMS Yeast Res. 4:773-787, 2004. 

[2] Keller O, Odronitz F, Stanke M, Kollmar M, Waack S. Scipio: using protein sequences to determine the 

precise exon/intron structures of genes and their orthologs in closely related species. BMC 

Bioinformatics. 2008 Jun 13;9:278.

GeneHuggers 

Sébastien Briois 1 and Jason Iacovoni 2

1 Laboratoire de Biologie Cellulaire et Moléculaire du Contrôle de la Prolifération, UMR5088 CNRS, Université Paul Sabatier, Toulouse

2 Plateforme Bio-informatique I2MC, INSERM, Toulouse

Keywords: genomics, bioinformatics, Qt/C++ framework, ChIP-chip/seq

Bioinformatics application development traditionally results in either a command-line or graphical user interface. 

When faced with developing a series of applications for analysis of high-throughput sequencing data, we found 

that many programs required both a GUI, so that they could be used by the biologist, and a command-line 

interface, so that they could be employed in batch scripts. GeneHuggers is a library built on top of the Qt 

framework that aims to greatly facilitate program development. As much as possible of the routine coding 

associated with passing parameters in and out of graphical widgets has been encapsulated by the library. This 

results in a single application that can function both through the command-line and with a GUI. Even though 

GeneHuggers is still under development as a framework, a series of applications are available and have been 

used to analyze genome-wide γH2AX ChIP-chip/seq data. These programs were the motivation behind the GUI 

components of GeneHuggers and exemplify the way it can be used by programmers to focus their time on coding 

the computational task and not the interface. 

Project description:

The project consists in creating a framework that, on the one hand, manages data from ChIP-chip/seq experiments and, on the other hand, makes it easy to create applications with both a graphical interface for biologists and a command-line interface for batch execution.

Publications 

[1] Iacovoni JS, Caron P, Lassadi I, Nicolas E, Massip L, Trouche D, Legube G. High-resolution profiling of 

gammaH2AX around DNA double strand breaks in the mammalian genome. EMBO J. 2010 Apr 21;29(8):1446- 

57. Epub 2010 Apr 1. 

[2] Massip L, Caron P, Iacovoni JS, Trouche D, Legube G. Deciphering the chromatin landscape induced around 

DNA double strand breaks. Cell Cycle. 2010 Aug 1;9(15):2963-72. 

[3] Iacovoni JS. GeneHuggers: database mining and application connectivity tools for subsequence analyses of 

the human genome. Bioinformatics. 2003 Nov 22;19(17):2316-8.

Using Ensembl/BioMart/DAS environments to exploit NGS results

Patrice Déhais (SIGENAE, LGC, INRA, Toulouse)

Keywords: NGS, database, genome browser.

Sigenae (http://www.sigenae.org) is a bioinformatics service team created in 2002 in the wake of the AGENAE programme (Analyse du GENome des Animaux d'Elevage, the analysis of farm animal genomes; http://www.agenae.fr). The assembly of expressed sequence tags (ESTs), the annotation of the resulting contigs and the provision of the data thus generated through web sites has been, and still is, the team's core business.

Very early on, an Ensembl/BioMart environment was installed locally and adapted to present EST assembly data, by reusing the structure of the Ensembl databases and loading our own data into it.

As sequencing techniques improved, we went from a few tens of thousands of sequences per batch in our early days, to a few hundred thousand with the latest Sanger-type sequencers, then to nearly a million with Roche 454 machines, and finally to several hundred million or even a billion with the Illumina HiSeq.

Analyses that used to be done in an exploratory way, "by eye", on small datasets now require (i) processing that produces a more or less statistical summary of the data, and (ii) flexible, suitable filtering tools to select a region of interest, which must then (iii) be visualized in its genomic context, with as many annotations as possible, up to date where possible.

The Ensembl/BioMart/DAS solution chosen by the team meets these requirements. The search for lineage-specific SNPs in quail will serve as a case study here and will illustrate how such an environment is set up.

RNAspace: a web site builder to support prediction, annotation and analysis of non-coding RNA

Marie-Josée CROS 1, Antoine de MONTE 2, Jérôme MARIETTE 3, Philippe BARDOU 4, Daniel GAUTHERET 5, Hélène TOUZET 2 and Christine GASPIN 1,3

1 INRA, Unité de Biométrie et Intelligence Artificielle, UR 875, F-31320 Castanet, France

2 LIFL, UMR CNRS 8022 Université Lille 1 and INRIA Lille Nord Europe, France

3 INRA, Plateforme bioinformatique, F-31320 Castanet, France

4 INRA, SIGENAE, UMR 444, F-31320 Castanet, France

5 IGM UMR 8621 CNRS-U Paris sud, France

contact@rnaspace.org

Keywords: non-protein-coding RNA, genome annotation, ncRNA gene finder

RNAspace is an environment for creating web sites dedicated to non-coding RNA (ncRNA) prediction, annotation and analysis. These web sites allow users to run a variety of tools in an integrated and flexible way. RNAspace focuses on the integration of complementary ncRNA gene finders. It also offers a set of tools for comparing, visualizing, editing and exporting ncRNA candidates. Predictions can be filtered according to a large set of characteristics.

A public web site, http://rnaspace.org, has been created that allows online annotation of a complete bacterial genome or a small eukaryotic chromosome.

Publications 

[1] Cros M.J., de Monte A., Mariette J., Bardou P., Gautheret D., Touzet H, Gaspin C. rnaspace.org: a rich web 

application for ncRNA identification. Poster in JOBIM, 2010. 

[2] Cros M.J., de Monte A., Mariette J., Bardou P., Gautheret D., Touzet H, Gaspin C. RNAspace: an integrated 

environment for the prediction, annotation and analysis of non-coding RNA. Submitted, 2011.

ng6: Next Generation Sequencing Information System

Jérôme Mariette 1, Nicolas Allias 2, Céline Noirot 1, Gérald Salin 2, Sylvain Thomas 1, Christophe Klopp 1

1 Plate-forme bio-informatique Genotoul, INRA, Biométrie et Intelligence Artificielle, BP 52627, 31326 Castanet-Tolosan Cedex, France.

2 Plateforme GET-PlaGe Genotoul, INRA, Laboratoire de Génétique Cellulaire, BP 52627, 31326 Castanet-Tolosan Cedex, France.

Keywords: Next Generation Sequencing, workflow, bioinformatics

NGS platforms are now well established in sequencing centres and in some laboratories. Upcoming small-scale platforms such as the 454 GS Junior from Roche and the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In this context, it is important to provide these teams with an easily manageable environment in which to store and process the data produced.

We present a global information system able to manage NGS data. It includes, on the one hand, a set of pipelines adapted to the input data format (fasta, fastq), the sequencer used (454, Illumina) and the kind of analysis to perform (gDNA, cDNA, RNAseq, 16S, and so on) and, on the other hand, a secure web site giving access to the results. Users can download raw data and browse several basic analyses such as read quality statistics [2], contamination search or read cleaning [3, 4]. The system has three levels: projects, runs and analyses. A project can include several runs, and a run can be used as input to several analyses.
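
The three-level organisation can be pictured with a small Python sketch. The class and field names below are assumptions chosen for illustration and do not reflect the actual ng6 schema or code base.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """One analysis performed on a run (hypothetical fields)."""
    name: str


@dataclass
class Run:
    """One sequencing run; it can be the input of several analyses."""
    name: str
    sequencer: str
    analyses: list = field(default_factory=list)


@dataclass
class Project:
    """A project groups several runs."""
    name: str
    runs: list = field(default_factory=list)

    def all_analyses(self):
        """List every analysis attached to the runs of this project."""
        return [a for run in self.runs for a in run.analyses]


# Toy project with two runs from different sequencers
run_454 = Run("run_1", "454", [Analysis("read quality statistics"),
                               Analysis("contamination search")])
run_illumina = Run("run_2", "Illumina", [Analysis("read cleaning")])
project = Project("demo_project", [run_454, run_illumina])

print([a.name for a in project.all_analyses()])
```

Keeping analyses attached to runs rather than to projects lets the web site present results at whichever of the three levels the user is browsing.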

ng6 was initiated through a collaboration between the bioinformatics and genomics platforms of Genotoul. The tool is based on the Ergatis [5] workflow management system, which was chosen for its ability to iterate over multiple files, allowing the computations to run on the local cluster. In addition, such a system makes it possible to add as many pipelines as one can design.

Publications 

[1] http://typo3.org/ 

[2] http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/ 

[3] https://mulcyber.toulouse.inra.fr/projects/pyrocleaner/ 

[4] -Alvarez V., Teal T. and Schmidt T. Systematic artifacts in metagenomes from complex microbial communities. The ISME Journal, 2009.

[5] Orvis J, Crabtree J, Galens K, Gussman A, Inman JM, Lee E, Nampally S, Riley D, Sundaram JP, Felix V, Whitty 

B, Mahurkar A, Wortman J, White O, Angiuoli SV. Ergatis: A web interface and scalable software system for 

bioinformatics workflows. Bioinformatics. 2010 Jun 15;26(12).

A Comparative Study of Statistical Methods for Detecting Association with 

Rare Variants in Exome-Resequencing Data 

Nora Bohossian 1,2, Mohamad Saad 1,2, Aude Saint Pierre 1,2, Matthias Macé 1, Maria Martinez 1,2

1 INSERM U563, CHU Purpan, Toulouse, France

2 Université Paul Sabatier, Toulouse, France 

Keywords: Next generation sequencing, Common disease, rare variants 

Abstract 

Genome-wide association studies for complex traits are based on the common disease-common variant (CDCV) and common disease-rare variant (CDRV) assumptions. Under the CDCV hypothesis, classical genome-wide association studies using single-marker tests are powerful for detecting common susceptibility variants, but they are less powerful under the CDRV hypothesis. Several methods have recently been proposed to detect association with multiple rare variants collectively [1-4] within a functional unit such as a gene.

In this study, we compared the relative performance of several of these methods on the GAW17 data. These are sequencing data for 697 subjects from the 1000 Genomes Project [5], comprising their genotypes in the exonic regions of 3205 genes only. In the GAW17 data, three quantitative traits and one binary trait were simulated, and the genotypes were held fixed across all simulation replicates. The functional variants influencing the traits include both rare and common alleles and cover a range of effect sizes, most having small effects but a few having large effects that should be reliably detectable in most replicates. Some genes contain a single functional variant and others contain many.

The association methods we compared are all based on collapsing (CA) multiple variants within a gene. They differ according to: (i) whether or not variants (SNPs) are filtered according to their Minor Allele Frequency (MAF); (ii) whether or not the collapsed SNPs are weighted by their allelic frequency variances; (iii) whether SNPs are collapsed into a single group/variable or into multiple ones.
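
Options (i) and (ii) can be illustrated with a minimal Python sketch that collapses the SNPs of one gene into a single burden score per subject. The function names, the weight 1/sqrt(p(1-p)) and the toy genotypes are assumptions made for this example; the abstract does not give the exact formulas the authors compared.

```python
import math


def minor_allele_frequency(genotypes):
    """MAF of one SNP from genotypes coded as minor-allele counts (0, 1 or 2)."""
    return sum(genotypes) / (2 * len(genotypes))


def collapse_gene(genotype_matrix, maf_threshold=None, weighted=False):
    """Collapse the SNPs of one gene into one burden score per subject.

    genotype_matrix: list of SNPs, each a list of per-subject allele counts.
    maf_threshold: if set, only SNPs with MAF <= threshold are kept (option i).
    weighted: if True, each SNP is weighted by 1 / sqrt(p * (1 - p)), so that
        rarer variants count more (one possible reading of option ii).
    """
    n_subjects = len(genotype_matrix[0])
    burden = [0.0] * n_subjects
    for snp in genotype_matrix:
        p = minor_allele_frequency(snp)
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic SNP carries no information
        if maf_threshold is not None and p > maf_threshold:
            continue
        weight = 1.0 / math.sqrt(p * (1.0 - p)) if weighted else 1.0
        for i, count in enumerate(snp):
            burden[i] += weight * count
    return burden


# Toy gene with two rare SNPs and one common SNP, four subjects
gene = [
    [0, 1, 0, 0],  # MAF 0.125
    [0, 0, 1, 0],  # MAF 0.125
    [1, 1, 0, 2],  # MAF 0.5
]
rare_only = collapse_gene(gene, maf_threshold=0.2)
all_snps = collapse_gene(gene)
rare_weighted = collapse_gene(gene, maf_threshold=0.2, weighted=True)
print(rare_only, all_snps, rare_weighted)
```

The resulting per-subject score would then be tested for association with the trait, for instance by regression, in place of the individual SNP genotypes.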

In these data, we found that the collapsing methods, which include all SNPs, showed greater power, even for 

genes where all causative variants are rare (MAF

Choreography of genes in the yeast nucleus: towards a high-throughput analysis

Olivier GADAL 1,2 and Alain Kamgoue 1,2

1 Laboratoire de Biologie Moléculaire des Eucaryotes du CNRS

2 Université de Toulouse, F-31000 Toulouse, France

Keywords: confocal imaging, image analysis, high-throughput image analysis, bioinformatics

Although many eukaryotic genomes have now been sequenced, we still do not know how genetic information is organised within the volume of the nucleus. This level of organisation is often described as a black box, inaccessible to our experimental approaches. In particular, fluorescence microscopy approaches are limited by the optical resolution of the microscope (
