12.07.2015 Views

Developments in the detection of genetic variation (SNPs, CNVs and ...


Overview
• SNP typing: Current status
  • Examples in chicken
• SNP discovery using 2nd generation sequencing technology (Solexa)
• Copy Number Variation (CNV)
• Re-sequencing of genomes
Animal Breeding & Genomics Centre


SNP genotyping assays
[Figure: genotyping platforms by number of SNPs: 10^6 SNPs, 100-500K SNPs, 60K SNPs, 1536 SNPs, 1-384 SNPs. Labels: custom made, human]


The Infinium assay
• 12 samples / chip


High call rates
• Final chip contained 18,264 SNPs
• Genotyping calls for 17,790 SNPs (97.4 %)
[Figure: genotype cluster plot with clusters AA, AB, BB]


Many SNPs with multiple heterozygous clusters
• Example of a SNP with population-dependent clustering of alleles


Application: Population Genetics using pools
• Pools are a cost-effective way to identify which SNPs are segregating in a specific breed
• Highly reliable estimates of allele frequencies
• Estimate phylogeny
• Identify regions under selection (selective sweeps)


High correlation between individuals and pools
• R² = 0.97
• K-correction of MAF based on the signals in the heterozygote cluster: R² = 0.99
• MAF calculated based on pools from 30-60 individuals
• Four populations typed both as pools and individuals
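The slide does not give the formula behind the K-correction. One common form in the pooled-genotyping literature scales the B-allele signal by k, the mean A/B signal ratio seen in heterozygous individuals, before the frequency is computed. The sketch below assumes that form; the function names and the example signal values are hypothetical and not taken from the slides.

```python
from statistics import mean


def k_factor(het_signals):
    """Mean A/B signal ratio over heterozygous individuals.

    het_signals: list of (signal_a, signal_b) pairs, one per heterozygote.
    A heterozygote carries one copy of each allele, so a ratio that differs
    from 1 reflects unequal signal strength of the two alleles.
    """
    return mean(a / b for a, b in het_signals)


def pooled_allele_freq(signal_a, signal_b, k=1.0):
    """Estimated frequency of allele A in a DNA pool (k=1.0 means uncorrected)."""
    return signal_a / (signal_a + k * signal_b)


def minor_allele_freq(freq_a):
    """Frequency of the rarer of the two alleles."""
    return min(freq_a, 1.0 - freq_a)


# Hypothetical signals in which allele A is twice as bright as allele B.
k = k_factor([(2.0, 1.0), (4.0, 2.0)])
raw = pooled_allele_freq(2.0, 3.0)            # uncorrected estimate
corrected = pooled_allele_freq(2.0, 3.0, k)   # k-corrected estimate
```

With these toy numbers the uncorrected estimate overstates the frequency of the brighter allele, which is the bias the correction is meant to remove.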


High correlation between individuals and pools
• Gallus lafayettii: identification of the ancestral allele
• Data from individually genotyped animals compared to a pooled DNA sample of that breed


Identification of selective sweeps
[Figure: Chromosome 2. x-axis: Location (bp); y-axis: Likelihood ratio]


Overview
• SNP typing: Current status
• SNP discovery using 2nd generation sequencing technology (Solexa)
• Copy Number Variation (CNV)
• Re-sequencing of genomes


Ultrahigh throughput sequencing (Solexa, Illumina GA)
• > 1 billion bp in a single run at ~€10,000
• Read length 36 bp
• 3-8 million reads per lane
• 8 lanes per flow cell
• 1 sequence run = 3 days
[Figure: example reads TGCTACGAT … and TTTTTTTGT … with cycle numbers 1-9]
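The figures on this slide can be combined into a quick yield estimate. The sketch below only multiplies the numbers given above (read length, reads per lane, lanes per flow cell); the function names are illustrative.

```python
import math

READ_LENGTH_BP = 36        # from the slide
LANES_PER_FLOW_CELL = 8    # from the slide


def run_yield_bp(reads_per_lane):
    """Total bases produced by one run at a given number of reads per lane."""
    return READ_LENGTH_BP * reads_per_lane * LANES_PER_FLOW_CELL


def reads_per_lane_needed(target_bp):
    """Smallest number of reads per lane that reaches target_bp in one run."""
    return math.ceil(target_bp / (READ_LENGTH_BP * LANES_PER_FLOW_CELL))


low = run_yield_bp(3_000_000)    # 864,000,000 bp
high = run_yield_bp(8_000_000)   # 2,304,000,000 bp
threshold = reads_per_lane_needed(1_000_000_000)
```

At 3 million reads per lane a run stays just under 1 billion bp, so the "> 1 billion bp" figure corresponds to the middle and upper part of the quoted 3-8 million range.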


SNP detection: two strategies
• Reference genome available
  • Pigs
  • Chicken
• No reference genome available
  • Turkey
  • Duck
  • Great tit
  • Tilapia
  • Salmon (planned)
[Figure labels: Reference genome, SNP, SNP assay primers]


Strategy 1: With reference genome
• Reduced representation library (RRL)
  • Restriction digest of DNA
  • Separate on acrylamide gels
  • Isolate 150-200 bp fragments
[Figure labels: 6x 6x 6x 6x 6x: estimate of MAF within breeds; 30x: highly reliable SNP identification]
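The digest and size-selection steps can be mimicked in silico to see which part of a genome ends up in the library. The sketch below assumes the enzyme is HaeIII, which is named on a later slide and cuts GGCC between GG and CC; the function names and the toy sequence are hypothetical.

```python
def digest(sequence, site="GGCC", cut_offset=2):
    """Cut sequence at every occurrence of site and return the fragments.

    cut_offset is the position of the cut inside the recognition site
    (2 for a blunt cut between GG and CC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


def size_select(fragments, min_len=150, max_len=200):
    """Keep only fragments in the size window cut out of the gel."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


# Toy sequence with two HaeIII sites.
toy = "A" * 10 + "GGCC" + "T" * 170 + "GGCC" + "C" * 5
fragments = digest(toy)
library = size_select(fragments)
```

Only the fragment between the two sites falls inside the 150-200 bp window, so it is the only one that would be sequenced.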


Analysis overview
1. Raw GA sequences (Solexa)
2. Pre-maq perl script
3. Filtered GA sequences (Solexa)
4. Alignment to reference genome and SNP identification using MAQ
5. Post-maq perl script
6. High quality SNPs with MAF
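The slides do not show what the post-maq script does. As an illustration of the last step only, the sketch below estimates a minor allele frequency from the bases that aligned reads show at one candidate SNP position. The depth and minor-read thresholds are assumptions made for this example, not values from the talk.

```python
from collections import Counter


def maf_from_reads(bases, min_depth=10, min_minor_reads=2):
    """Estimate MAF at one position from the bases observed in aligned reads.

    Returns None when the position is too shallow, shows only one allele,
    or the second allele is seen in too few reads to be trusted.
    """
    if len(bases) < min_depth:
        return None
    counts = Counter(bases).most_common(2)
    if len(counts) < 2:
        return None
    major_count = counts[0][1]
    minor_count = counts[1][1]
    if minor_count < min_minor_reads:
        return None
    # Use only the two most frequent alleles in the denominator.
    return minor_count / (major_count + minor_count)


example = maf_from_reads("CCCCCCCCTTTT")  # 8 reads with C, 4 reads with T
```

Requiring several reads for the minor allele is one simple way to keep single sequencing errors from being reported as SNPs.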


Alignment and SNP detection using MAQ
[Figure labels: HaeIII, SNPs, HaeIII, 173 bp]


Alignment and SNP detection using MAQ
[Figure labels: HaeIII, HaeIII, C/T, G/C]


Strategy 1: With reference genome
• Pigs (finished)
  • 300 million 36 bp reads (after QC; raw data = 450 million)
  • 5x5 different RRLs representing + 10% of the genome
  • 350,000 high quality SNPs with MAF
  • 60K Illumina iSelect BeadChip ordered
• Chicken
  • 115 million reads (partly analyzed)
  • 4x1 RRLs (2 Cobb, 2 HG)
  • 6 % of genome detected
  • 384,000 SNPs with MAF
  • Also cover missing chromosomes


Porcine Infinium BeadChip SNPs
• A total of 60,212 SNPs were selected
  • 41,667 with coordinates on build 7 (average spacing of 44 Kb along the genome)
  • 18,545 SNPs covering the remaining 30% of the genome
  • 4,497 of these 18,545 SNPs have a location predicted by human-pig comparative mapping
• 57,996 SNPs had minor allele frequency calculated
  • Average MAF = 0.4
• Test chip analysed end of October
  • 3000 SNPs replaced
• Final chip ready by beginning of December


The porcine HapMap project


Porcine HapMap panel typed by Illumina

Breed            Discovery panel   Trios   Total #
Landrace         29                17      76
Large White      36                25      135
Duroc            34                15      79
Pietrain         23                15      95
Wild Boar        20                -       20
Hampshire        -                 15      69
Berkshire        -                 15      58
Meishan          -                 -       30
Wild Boar        17                -       3
Other Sus sp.    -                 -       6

Other Sus sp.: Sus celebensis, Sus verrucosus, Sus barbatus
Tobasco: 1 (animal being sequenced)


Porcine HapMap panel typed by WU
• Wild boar samples from all over the world
• Additional breeds from Europe
• Additional breeds from China
• Genotyping at Illumina: December
• Genotyping at WU: February
• Total number: ~ 1100


Strategy 2: Without reference genome
• Reduced representation library (RRL)
  • Separate on agarose gel
  • Isolate 2000-4000 bp fraction
  • Random shearing of fragments
  • Isolate 200-250 bp fragments
• Assembly of sequence contigs that can function as a reference (using a number of assemblers, e.g. SSAKE, SHARCGS, VELVET)
[Figure label: SNP assay primers]


Analysis overview strategy 2
1. Raw GA sequences (Solexa)
2. Pre-maq perl script
3. Filtered GA sequences (Solexa)
4. De novo assembly of contigs (e.g. SSAKE, SHARCGS, VELVET)
5. Alignment to reference genome and SNP identification using MAQ
6. Post-maq perl script
7. High quality SNPs with MAF


Strategy 2: Without reference genome
• Turkey
  • ~35 million 35 bp reads (after QC)
• Great tit
  • 2 RRLs done: 50 million 36 bp reads (10 % of the genome)
  • 1 RRL not yet analyzed (paired end)
• Tilapia
  • 30 million reads
• Duck
  • 30 million reads, paired end (not yet analyzed)


Size distribution of contigs
[Figure: SSAKE contig assembly, Tilapia vs Turkey vs Gt3000. x-axis: contig size (bins 49_75, 76_100, 101_150, 151_200, 201_300, 301_400, 401_500, 501_600, >601); y-axis: nr of contigs (0 to 340,000)]
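A distribution like the one in this figure can be tabulated from a list of contig lengths. The sketch below uses the bin labels from the figure; treating every contig longer than 600 bp as belonging to the last bin is an assumption, since the slide labels it ">601". The function names are hypothetical.

```python
from collections import Counter

# Bin boundaries as labelled on the figure's x-axis.
BINS = [(49, 75), (76, 100), (101, 150), (151, 200),
        (201, 300), (301, 400), (401, 500), (501, 600)]
LAST_LABEL = ">601"  # assumed to hold every contig longer than 600 bp


def bin_label(length):
    """Return the size-bin label for one contig length, or None if too short."""
    for low, high in BINS:
        if low <= length <= high:
            return f"{low}_{high}"
    if length > 600:
        return LAST_LABEL
    return None


def size_distribution(contig_lengths):
    """Count contigs per size bin, skipping contigs below the smallest bin."""
    counts = Counter()
    for length in contig_lengths:
        label = bin_label(length)
        if label is not None:
            counts[label] += 1
    return counts


counts = size_distribution([50, 75, 76, 650, 30])
```

Running this once per species on the assembler output would give the three series plotted in the figure.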


SNPs identified
• Turkey
  • 14,000 SNPs
  • 50 % have sufficient flanking sequence for assay design
  • 50 % have (unique) homology with chicken
  • 384 SNP assay ordered
• Great tit (one RRL finished)
  • 25,225 SNPs
  • Will be compared to zebra finch genome


Overview
• SNP typing: Current status
• SNP discovery using 2nd generation sequencing technology (Solexa)
• Copy Number Variation (CNV)
• Re-sequencing of genomes


Copy number variation (CNV) in chicken
• Agilent 244K chicken array
• Analysed 80 individuals from different breeds
• Included a few small pedigrees
• Used sequenced bird (UCD001) as the reference


Example chicken chromosome 190 kb30 kb


Example 2: Late feathering locus
• 180 Kb duplication on Z-chromosome
• Partial duplication of the PRLR and SPEF2 genes
• Affecting feather development
• Used for sexing 1 day old chicks
Elferink MG, Vallée AA, Jungerius AP, Crooijmans RP, Groenen MA. (2008) Partial duplication of the PRLR and SPEF2 genes at the late feathering locus in chicken. BMC Genomics. 9:391


Example 2: Late feathering locus
[Figure: pedigree with ZZ father, ZW mother and ZW daughter. Genotype labels: k+ / K (3 copies); k+ / - (1 copy); K / - (2 copies)]
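The copy numbers in this figure follow from adding up the copies carried by each allele. The sketch below assumes that k+ carries one copy of the region, K carries two (the duplication), and "-" stands for the W chromosome with none; the names are illustrative.

```python
# Copies of the duplicated region carried by each allele (assumed mapping).
COPIES_PER_ALLELE = {"k+": 1, "K": 2, "-": 0}


def copy_number(genotype):
    """Total copies for a genotype written as 'allele / allele'."""
    alleles = [allele.strip() for allele in genotype.split("/")]
    return sum(COPIES_PER_ALLELE[allele] for allele in alleles)


for genotype in ("k+ / K", "k+ / -", "K / -"):
    print(genotype, copy_number(genotype))
```

The three genotypes on the slide come out as 3, 1 and 2 copies, matching the figure labels.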


Overview
• SNP typing: Current status
• SNP discovery using 2nd generation sequencing technology (Solexa)
• Copy Number Variation (CNV)
• Re-sequencing of genomes


Future applications in functional genomics
• Increased ultra high throughput sequencing
• Shift from microarrays to direct sequencing of cDNA
• Re-sequencing the complete genome of individuals
  • Identify all variations in an individual's genome
• Sequencing to identify regulatory elements
  • E.g. miRNAs, transcription factor binding sites etc.


3rd generation sequencer with capacity to produce up to 100 billion bp of sequence
• Sequencing single molecules
• 28 Tb of data storage (sufficient for only 2 runs!)


Acknowledgements
• Chicken genome sequencing consortium
• Chicken SNP consortium
• Swine SNP consortium
• Washington University, St. Louis
• USDA, High-Density SNP Discovery, Validation and Characterization in Swine
• USDA, Genome-wide marker-assisted selection over multiple generations in commercial poultry
• EU-Sabre
• EU-Eadgene
Thank You
