11.07.2015 Views

Computer Exercise 3: Introduction to Bioinformatics 1



C. Baer, PCB 4674 Fall 2010

Q11. WHAT IS THE gi NUMBER, THE ACCESSION NUMBER, AND THE MAX SCORE OF THE SEQUENCE WITH THE HIGHEST SCORE?

Q12. HOW LONG IS THE SEQUENCE?

Repeat this exercise for the human cDNA and find the human genomic sequence that provides the best match to the cDNA. You can BLAST the human cDNA accession number given to you on page 1 of this lab, but remember to limit the search to human. Also, if two or more alignments have the same max score, just choose the top one.

Q13-14. REPEAT QUESTIONS 11-12 FOR THE HUMAN GENOMIC SEQUENCE.

V. Finding Introns with BLAST

Next we will use BLAST to help find the introns in the gene. Recall that the cDNA sequence was obtained by reverse transcription of an mRNA. Thus, the sequence had the introns spliced out.

1. Return to the BLAST home page and look for the "Specialized BLAST" heading. Click on "align two sequences using BLAST". The default "Program" will be "blastn," which aligns nucleotide sequences. Enter the accession number for the mouse cDNA given to you on page 1 of this lab into the "Sequence 1" window. Enter the accession number or gi number of the mouse wgs genomic sequence you just found in question 11 into the "Sequence 2" window and click the "align" button. The reason you are using BLAST to align these sequences rather than ClustalW is that ClustalW is not good at handling long regions of unmatched sequence, as occur when you attempt to align cDNA sequence with genomic sequence.

2. When the results appear, scroll down through the output. Note that the query sequence aligns to the target sequence in pieces. The pieces of the query sequence correspond to the exons of the gene; the stretches of the target sequence between them, which have no match in the query, are the introns.
Note that the longest matching sequence is listed first, so you will have to piece together the sequence of the gene. Also note that there may be small amounts of overlapping sequence, and that there are a few small pieces of the query sequence that appear to match other pieces of the mouse genomic DNA contig ("contig" stands for "contiguous sequence": a stretch of sequence assembled by piecing together overlapping short sequence reads (~1 kb), ultimately into whole-chromosome sequence).

Q15. HOW MANY EXONS ARE IN THE MOUSE ADH-1 GENE?

Q16. WHAT ARE THE LENGTHS OF THE EXONS? (e.g., "Exon 1 = x1, Exon 2 = x2, etc.")

Q17. WHAT ARE THE LENGTHS OF THE INTRONS (ROUGHLY, WITHIN A FEW BP)?

VI. Comparative Genomics

The last exercise is to compare the human ADH-IB gene with the mouse ADH1 gene. Return to the "BLAST 2 sequences" window. In the first window (sequence 1), enter the sequence accession number for the human cDNA given to you on page 1 of this lab. In the second window, enter the accession number or gi number of the mouse genomic DNA sequence you determined in question 11. Click "Align".

Q18. HOW MANY REGIONS OF ALIGNMENT ARE THERE?
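
The bookkeeping behind questions 15-17 can be sketched in a few lines of Python. This is only an illustration, not part of the lab: the `Hsp` container, the `exon_intron_lengths` function, and every coordinate below are made up, and the sketch assumes that all alignment blocks lie on the same strand, that their coordinates are 1-based and inclusive, and that the blocks do not overlap. Because BLAST lists the longest match first, the blocks are first re-sorted by their position on the cDNA (the query). Each block's length on the query then approximates an exon length, and the gap on the genomic sequence (the subject) between consecutive blocks approximates an intron length.

```python
from typing import List, NamedTuple, Tuple


class Hsp(NamedTuple):
    """One aligned block from the BLAST output (hypothetical container)."""
    q_start: int  # first aligned base on the query (cDNA)
    q_end: int    # last aligned base on the query (cDNA)
    s_start: int  # first aligned base on the subject (genomic DNA)
    s_end: int    # last aligned base on the subject (genomic DNA)


def exon_intron_lengths(hsps: List[Hsp]) -> Tuple[List[int], List[int]]:
    """Return (exon lengths, intron lengths) in cDNA order.

    Assumes same-strand, non-overlapping blocks with 1-based inclusive
    coordinates; small overlaps would shift the results by a few bp.
    """
    # BLAST reports the best-scoring block first, so restore gene order.
    ordered = sorted(hsps, key=lambda h: h.q_start)
    # Exon length = number of query bases covered by the block.
    exons = [h.q_end - h.q_start + 1 for h in ordered]
    # Intron length = unmatched genomic bases between neighbouring blocks.
    introns = [
        nxt.s_start - prev.s_end - 1
        for prev, nxt in zip(ordered, ordered[1:])
    ]
    return exons, introns


if __name__ == "__main__":
    # Made-up coordinates, listed longest block first as BLAST would.
    example = [
        Hsp(301, 520, 5001, 5220),
        Hsp(121, 300, 2501, 2680),
        Hsp(1, 120, 1001, 1120),
    ]
    exon_lengths, intron_lengths = exon_intron_lengths(example)
    print("Exons:", exon_lengths)      # Exons: [120, 180, 220]
    print("Introns:", intron_lengths)  # Introns: [1380, 2320]
```

When blocks overlap slightly on the query, as the lab notes they may, the exon lengths computed this way are a few bases too long, which is why question 17 asks only for rough intron lengths.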
