13.08.2013 Views

BIOBAR : USER MANUAL


INSTALLING:

AUTOMATIC INSTALLATION
o Make sure Biobar.dotm is open and double-click the “Click to Install Biobar” button in the document

MANUAL INSTALLATION
o The Biobar.dotm file needs to be placed in the STARTUP folder for Microsoft Word so that the toolbar is available for every document.
    XP - C:\Documents and Settings\\Application Data\Microsoft\Word\Startup
    Vista - C:\Users\\AppData\Roaming\Microsoft\Word\STARTUP

SELECTION:

Select the sequence(s) that you wish to perform an operation on
Separate sequences with two Enters (that is, leave a blank line between them)
o NOTE: if you are entering sequences in FASTA format (with a “>” denoting the header), the “>” will be used to separate sequences, NOT two Enters

OPTIONS:

Basic Options
o Default Output
    Raw - your results will be displayed in the raw format
    FASTA - your results will be displayed in the FASTA format
        if a header was not provided, a numbered default one will be created
    GenBank - your results will be displayed in the GenBank format
o Default Output Location
    Append to Document - your results will be printed beneath the selection
    Create New Document - your results will be printed in a new document
    Save to Clipboard - your results will be saved to the Clipboard; use Paste (Ctrl + V) to retrieve them
    Replace Selection - your results will overwrite the selection

Advanced Options (Pop-up)
o % ATCG for DNA
    This represents the percentage of A’s, T’s, C’s and G’s that must be in a sequence for it to be considered DNA
    NOTE: the percentage must be entered as an integer, so enter 70 for 70%, NOT 0.7
o Allow IUB Code
    If checked, the following IUB characters are allowed in DNA sequences:
        A, T, C, G, R, Y, M, K, S, W, H, B, V, D, N
        Characters other than the above are filtered out
    If not checked, the following characters are allowed in DNA sequences:
        A, T, C, G
        Whether a sequence is DNA or Protein is determined by % ATCG
        Characters other than A, T, C or G are filtered out
        o If “ATGAAMGA” were selected, “ATGAAGA” would be processed
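The way the % ATCG threshold and the Allow IUB Code filter work together can be sketched in Python. This is an illustration only, not Biobar's own code: the names `is_dna` and `filter_dna` are hypothetical, and treating the threshold as inclusive (a sequence at exactly 70% counts as DNA) is an assumption.

```python
IUB_CODES = set("ATCGRYMKSWHBVDN")


def is_dna(seq, min_percent_atcg=70):
    """Return True if enough of the letters are A, T, C or G."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        return False
    atcg = sum(c in "ATCG" for c in letters)
    # Assumption: the comparison is inclusive.
    return 100 * atcg / len(letters) >= min_percent_atcg


def filter_dna(seq, allow_iub=False):
    """Drop every character that is not allowed in a DNA sequence."""
    allowed = IUB_CODES if allow_iub else set("ATCG")
    return "".join(c for c in seq.upper() if c in allowed)
```

With Allow IUB Code unchecked, `filter_dna("ATGAAMGA")` drops the M, which matches the example above.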

o Pseudocounts
    Laplace: pseudocounts for the PSFM are calculated using Laplace’s method
    10^-50: pseudocounts intended only to avoid 0 values in log calculations
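As a rough illustration of the two pseudocount choices, the sketch below builds a frequency matrix from a motif. It assumes that "Laplace's method" means add-one smoothing, so each count gets +1 and each column total gets +4; the manual does not spell out the formula, and the function name `psfm` is hypothetical.

```python
def psfm(motif, pseudocount=1.0):
    """Per-position base frequencies for equal-length DNA sequences.

    pseudocount=1.0 is add-one (Laplace) smoothing; a tiny value such as
    1e-50 only keeps frequencies from being exactly zero.
    """
    n = len(motif)
    total = n + 4 * pseudocount
    matrix = []
    for pos in range(len(motif[0])):
        column = [seq[pos] for seq in motif]
        matrix.append({b: (column.count(b) + pseudocount) / total
                       for b in "ACGT"})
    return matrix
```

With either setting no frequency is exactly zero, so taking a logarithm is always defined.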

Translation Options
o Offset
    Refers to the frame of translation (0, +1, +2)
o Genetic Code (GCode)
    Denotes the genetic code used in translation and reverse translation
        Standard
        Vertebrate Mitochondrial
        Yeast Mitochondrial
        Mold, Protozoan and Coelenterate Mitochondrial/Mycoplasma and Spiroplasma Mitochondrial
        Invertebrate Mitochondrial
        Ciliate, Dasycladacean and Hexamita Nuclear
        Echinoderm and Flatworm Mitochondrial
        Euplotid Mitochondrial
        Bacterial, Archaeal and Plant Plasmid
    NOTE: this option is overridden if a Codon Usage Table is entered

o Codon Usage Table
    If not checked, the default Genetic Code selected will be used in reverse translation. The following options are allowed:
        Uniform - for each amino acid in the sequence(s), a random codon will be chosen from those associated with that amino acid
        IUB - for each amino acid in the sequence, its IUB code will be used
    If checked, the user will be prompted to enter a Codon Usage Table after the Reverse Translation button has been pressed, and that table will be used in reverse translation. The following additional options are allowed:
        Best - for each amino acid in the sequence(s), the codon with the highest frequency associated with that amino acid (determined by Codon Usage Table) will be selected
        Random Best - for each amino acid in the sequence(s), a codon associated with that amino acid will be selected at random, with probability proportional to its frequency (determined by Codon Usage Table)
    NOTE: Please use the codon usage tables found on http://www.kazusa.or.jp/codon/ using the option for “a style like CodonFrequency output in GCG”
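The difference between Uniform, Best and Random Best can be shown with a small Python sketch. The `USAGE` table below holds made-up frequencies for three amino acids purely for illustration; it is not taken from any real codon usage table, and the function name and method labels are hypothetical.

```python
import random

# Hypothetical frequencies, for illustration only.
USAGE = {
    "I": {"ATT": 0.5, "ATC": 0.3, "ATA": 0.2},
    "V": {"GTT": 0.1, "GTC": 0.2, "GTA": 0.3, "GTG": 0.4},
    "D": {"GAT": 0.6, "GAC": 0.4},
}


def reverse_translate(protein, method="best", rng=random):
    """Pick one codon per amino acid using the chosen method."""
    out = []
    for aa in protein:
        codons = USAGE[aa]
        names = sorted(codons)
        if method == "uniform":
            # Every codon for this amino acid is equally likely.
            out.append(rng.choice(names))
        elif method == "best":
            # Always the most frequent codon.
            out.append(max(names, key=codons.get))
        elif method == "random_best":
            # Random choice weighted by codon frequency.
            weights = [codons[c] for c in names]
            out.append(rng.choices(names, weights=weights)[0])
        else:
            raise ValueError(f"unknown method: {method}")
    return "".join(out)
```

With this toy table, `reverse_translate("IVD", "best")` always gives `ATTGTGGAT`, while the other two methods can give a different DNA sequence on each run.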

DNA Statistics Options
o Window-Based % GC
    Length - this value will serve as the sliding window length when calculating % GC
    Step-size - this value will serve as the number of base-pairs the sliding window shifts each time (useful for long sequences)
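A minimal Python sketch of a sliding-window % GC calculation, using the Length and Step-size values described above. Reporting 1-based start positions and skipping a final partial window are assumptions; the function name `window_gc` is hypothetical.

```python
def window_gc(seq, length, step=1):
    """Return (start position, % GC) for each full window."""
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - length + 1, step):
        window = seq[start:start + length]
        gc = 100 * sum(c in "GC" for c in window) / length
        rows.append((start + 1, gc))  # 1-based position (assumption)
    return rows
```

For example, `window_gc("ATGCGC", 4, 2)` scores the windows ATGC and GCGC.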

o N-Gram
    The user-entered value will serve as n (the length of each “gram”)
    Only display N-Grams found in the sequence
        With this option selected, only N-Grams with a nonzero count will be displayed in the table.
        o If ATAG were selected, with n = 2, only AT, TA, AG will be displayed in the table rather than AA, AC, AG, AT,…,TT
    Include the reverse complement
        With this option selected, the counts of N-Grams of the reverse complement will be added to the counts of the original sequence
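The two N-Gram options can be sketched in Python as below. The function and parameter names are hypothetical, and the sketch assumes the input has already been reduced to A, T, C and G.

```python
from collections import Counter
from itertools import product


def revcomp(seq):
    """Reverse complement of an A/T/C/G sequence."""
    return seq.translate(str.maketrans("ATCG", "TAGC"))[::-1]


def ngram_counts(seq, n, only_found=False, include_revcomp=False):
    """Count overlapping words of length n."""
    seq = seq.upper()
    strands = [seq] + ([revcomp(seq)] if include_revcomp else [])
    counts = Counter()
    for strand in strands:
        for i in range(len(strand) - n + 1):
            counts[strand[i:i + n]] += 1
    if only_found:
        return dict(counts)
    # Otherwise list every possible word, including those with count 0.
    words = ("".join(p) for p in product("ACGT", repeat=n))
    return {w: counts[w] for w in words}
```

For ATAG with n = 2, the `only_found` form returns just AT, TA and AG, while the full form returns all 16 two-letter words.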

o Molecular Weight
    Consider DNA sequences double stranded
    Include a 5’ triphosphate on RNA sequences
    Include a 5’ monophosphate on DNA sequences
o Codon Usage Table
    Offset
        Refers to the frame (0, +1, +2)
    Table Display
        White-space - the table will be displayed in the same form as the codon usage tables from http://www.kazusa.or.jp/codon/
        Table - the table will be displayed in a Microsoft Word table

Substring Search Options
o ORF
    Minimum codon length - defines the minimum length of an open reading frame to be reported
o Optimize based on:
    Length - when selected, longer ORFs will be ranked higher
    CAI and Length - when selected, CAI scores are also taken into consideration in ranking ORFs
o For Substring and Substring with Gap
    Mismatch Threshold - only sequences with fewer mismatches than the value entered will be displayed

Site Search
o Ri and Iseq Scoring Options
    % GC of Genome - this value will be used to calculate the composition of the genome sequence that the motif is scored upon
    Motif - enter a motif of sequences, all the same length, in either Raw or FASTA format
o Dyad Pattern Search
    % GC of Genome - this value will be used to calculate the composition of the genome sequence that the motif is scored upon
    Motif - enter a motif of sequences, all the same length, in either Raw or FASTA format
    Threshold - each dyad found must have a score higher than a user-defined constant times the information content of the motif to be reported
    Gap - defines the range of base-pairs as a spacer between two dyads
    For second dyad:
        Mirror Motif - when selected, the other half of the dyad will be the reverse complement of the motif entered
        Duplicate Motif - when selected, the other half of the dyad will be the same as the motif entered
o Consensus Logo
    % GC of Genome - this value will be used to calculate the composition of the genome sequence that the motif is scored upon
    Use IUB for equally probable bases - if this option is checked, IUB characters will be used for two bases that occur at equal frequencies in a position in the motif. For example, if “A” and “G” are equally likely, “R” will be used for that position
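The "Use IUB for equally probable bases" option can be illustrated with a short Python sketch that builds a consensus string from a motif. It is a guess at the behaviour, not Biobar's code: the name `consensus` is hypothetical, and the sketch assumes the IUB letter is used only when exactly two bases tie for the highest count.

```python
from collections import Counter

# Standard IUB letters for pairs of bases.
IUB_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
}


def consensus(motif, use_iub=True):
    """One consensus character per column of an A/C/G/T motif."""
    out = []
    for column in zip(*motif):
        counts = Counter(column)
        top = max(counts.values())
        best = sorted(b for b, c in counts.items() if c == top)
        if use_iub and len(best) == 2:
            out.append(IUB_PAIRS[frozenset(best)])
        else:
            out.append(best[0])
    return "".join(out)
```

With the motif AC, GC the first position is an A/G tie, so the consensus is RC when the option is on.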


Motif Discovery
o Gibbs Sampling
    % GC of Genome - this value will be used to calculate the composition of the genome sequence that the motif is scored upon
    Window Length - this value determines the length of the window sampled by Gibbs Sampling
    Number of Iterations - the number of times the program should cycle before reporting results (a higher number of iterations gives slower but more accurate results)
o Greedy Search
    % GC of Genome - this value will be used to calculate the composition of the genome sequence that the motif is scored upon
    Window Length - this value determines the length of the window sampled by Greedy Search
    Number of Iterations - the number of times the program should cycle before reporting results (a higher number of iterations gives slower but more accurate results)
o Dyad Motif Search
    Dyad Length - if values 5 ± 1 were entered, dyads of length 4, 5, and 6 will be considered
    Spacer Length - if values 5 ± 1 were entered, spacers of length 4, 5, and 6 will be allowed
    Mismatch Threshold - determines the maximum number of mismatches that should be allowed between the two dyads
    Palindrome - the dyads must be reverse complements of each other
    Direct Repeat - the dyads must be identical
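The Dyad Motif Search options can be pictured with a simple Python scan for one dyad length and one spacer length; calling it once per allowed length would cover a range such as 5 ± 1. This is a sketch under assumptions, not the tool's algorithm, and the function names are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an A/T/C/G sequence."""
    return seq.translate(str.maketrans("ATCG", "TAGC"))[::-1]


def find_dyads(seq, dyad_len, spacer_len, max_mismatches=0, palindrome=True):
    """Return (start, left half, right half, mismatches) for each hit."""
    hits = []
    span = 2 * dyad_len + spacer_len
    for i in range(len(seq) - span + 1):
        left = seq[i:i + dyad_len]
        j = i + dyad_len + spacer_len
        right = seq[j:j + dyad_len]
        # Palindrome: right half should be the reverse complement of the left.
        # Direct Repeat: right half should equal the left.
        target = revcomp(left) if palindrome else left
        mismatches = sum(a != b for a, b in zip(target, right))
        if mismatches <= max_mismatches:
            hits.append((i, left, right, mismatches))
    return hits
```

For example, GACA followed by a 3-base spacer and TGTC is reported as a palindrome, while GACA, spacer, GACA is reported only in Direct Repeat mode.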

Pair-wise Alignment
o Scoring Options
    The first selection allows the user to define constant values to be used to score matches and mismatches
    The second selection allows the user to define more specific values if they would like to weigh certain mismatches/matches as less or more important than others
    NOTE: if protein sequences are selected, BLOSUM62 will be used to score the matches and mismatches between the sequences
o GEP - this value defines the gap extension penalty
o GOP - this value defines the gap opening penalty
o Limit results to X alignments - in the case of split paths, the program will report no more than X alignments


FUNCTIONALITY:

Basic Manipulation
o Convert Sequence To:
    Raw - The sequence(s) selected will be printed in the raw format
    FASTA - The sequence(s) selected will be printed in FASTA format
        A default header (> # Default) will be created if one is not provided (> # [user input])
    GenBank - The sequence(s) selected will be printed in GenBank format
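As an illustration of how a selection might be split into sequences and written out as FASTA with default headers, here is a short Python sketch. The function names are hypothetical, and the exact text of the default header ("1 Default", "2 Default", and so on) is an assumption based on the "> # Default" pattern above.

```python
def split_sequences(selection):
    """Split a selection into (header, sequence) pairs.

    If a ">" is present, headers separate the sequences; otherwise a blank
    line (two Enters) does.
    """
    text = selection.replace("\r\n", "\n").strip()
    records = []
    if ">" in text:
        for chunk in text.split(">")[1:]:
            header, _, body = chunk.partition("\n")
            records.append((header.strip(), "".join(body.split())))
    else:
        for chunk in text.split("\n\n"):
            if chunk.strip():
                records.append(("", "".join(chunk.split())))
    return records


def to_fasta(records):
    """Write records as FASTA, numbering any missing headers."""
    lines = []
    for number, (header, seq) in enumerate(records, start=1):
        lines.append("> " + (header or f"{number} Default"))
        lines.append(seq)
    return "\n".join(lines)
```

Line breaks inside a single sequence are joined, so a sequence wrapped over several lines is still treated as one record.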

o Reverse
    DNA, RNA and protein sequences can be reversed
        ATAGTAGAT -> TAGATGATA
o Complement
    Only DNA and RNA sequences may be complemented. An error message will be displayed if a protein sequence is selected. If DNA, RNA and protein sequences are selected, an error message will be displayed and only the DNA and RNA sequence(s) will be complemented. IUB characters will be ignored.
        ATAGTAGAT -> TATCATCTA
o Reverse Complement
    Only DNA and RNA sequences may be reverse complemented. An error message will be displayed if a protein sequence is selected. If DNA, RNA and protein sequences are selected, an error message will be displayed and only the DNA and RNA sequence(s) will be reverse complemented. IUB characters will be ignored.
        ATAGTAGAT -> ATCTACTAT
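The three operations above are easy to reproduce in Python. The sketch below handles DNA written with A, T, C and G only; RNA and IUB characters are left out for brevity, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse(seq):
    return seq[::-1]


def complement(seq):
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    return complement(seq)[::-1]
```

Applied to ATAGTAGAT, these give TAGATGATA, TATCATCTA and ATCTACTAT, the same results as the examples above.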

Translation
o Forward (Translation)
    Only DNA sequences may be translated. An error message will be displayed if a protein sequence is selected. If DNA, RNA and protein sequences are selected, an error message will be displayed and only the DNA sequence(s) will be translated.
    If IUB is selected:
        Because an IUB character can stand for more than one base, a list of all codons corresponding to that IUB code will be generated and a random one will be chosen to be used in translation
        o Example:
            GCode = Standard:
            TGB -> [TGC, TGT, TGG] -> C or W
        If an incorrect character (not A, T, C, G, R, Y, M, K, S, W, H, B, V, D, N) is found, an error message will be displayed and translation will not occur
    Translation will be performed according to the default Genetic Code selected and according to the offset specified in Translation Options
        NOTE: bases that do not form a complete codon will be removed
        Examples (bases outside a complete codon are ignored):
            Offset = 0, GCode = Standard: ATAGTAGAT -> IVD
            Offset = 1, GCode = Standard: ATAGTAGAT -> ** (the leading A and the trailing AT are ignored)
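Forward translation with an offset and IUB characters can be sketched in Python as follows. Only the Standard genetic code is included, the function names are hypothetical, and an invalid character simply raises a `KeyError` here instead of showing an error message.

```python
import random
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Bases that each IUB character can stand for.
IUB = {"A": "A", "C": "C", "G": "G", "T": "T",
       "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
       "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT"}


def expand_codon(codon):
    """All plain codons that an IUB codon can stand for."""
    return ["".join(p) for p in product(*(IUB[b] for b in codon))]


def translate(dna, offset=0, rng=random):
    """Translate complete codons starting at the given offset."""
    dna = dna.upper()
    protein = []
    for i in range(offset, len(dna) - 2, 3):
        # For an ambiguous codon, one possible codon is picked at random.
        codon = rng.choice(expand_codon(dna[i:i + 3]))
        protein.append(STANDARD[codon])
    return "".join(protein)
```

`translate("ATAGTAGAT")` gives IVD and `translate("ATAGTAGAT", offset=1)` gives `**`, matching the examples above; `translate("TGB")` gives C or W depending on the random choice.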

o Reverse (Translation)
    Only protein sequences may be reverse translated. An error message will be displayed if a DNA or RNA sequence is selected. If DNA, RNA and protein sequences are selected, an error message will be displayed and only the protein sequence(s) will be reverse translated.
    Reverse Translation will be performed according to the method specified in the Reverse Translation/Translation Options
        If the user provides a Codon Usage Table, the reverse translation will occur according to the Codon Usage Table rather than the default Genetic Code
        Examples:
        o GCode = Standard: IVD -> ATAGTAGAT

o (Translation) Map
    Only DNA sequences can be used to create a translation map. An error message will be displayed if a protein or RNA sequence is selected. If DNA, RNA and protein sequences are selected, an error message will be displayed and translation maps will only be displayed for the DNA sequence(s). Each sequence is translated in each of the frames and all of the resultant sequences are displayed together.
    If IUB is selected:
        Because an IUB character can stand for more than one base, a list of all codons corresponding to that IUB code will be generated and a random one will be chosen to be used in translation
        o Example:
            GCode = Standard:
            TGB -> [TGC, TGT, TGG] -> C or W
        If an incorrect character (not A, T, C, G, R, Y, M, K, S, W, H, B, V, D, N) is found, an error message will be displayed and translation will not occur

DNA Sequence Statistics
o Global %GC
    Only DNA sequences can be used. An error message will be displayed if a protein or RNA sequence is selected. If both DNA and protein sequences are selected, an error message will be displayed and %GC will only be calculated for the DNA sequence(s).
    Calculates the %GC content for the entire sequence and displays it in a table.
o Window %GC
    Only DNA sequences can be used. An error message will be displayed if a protein or RNA sequence is selected. If both DNA and protein sequences are selected, an error message will be displayed and %GC will only be calculated for the DNA sequence(s).
    Calculates the %GC content for windows of the length defined by the user in the Sequence Statistics Options and displays them in a table, indexed by position.
o Nucleotide Frequencies
    Only DNA sequences can be used. An error message will be displayed if a protein or RNA sequence is selected. If both DNA and protein sequences are selected, an error message will be displayed and nucleotide frequencies will only be calculated for the DNA sequence(s).
    Calculates the frequency of each nucleotide for the entire sequence and displays it in a table.

o N-Gram
    Only DNA sequences can be used. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the N-Grams will only be calculated for the DNA sequence(s). IUB characters will be ignored.
    Takes the value of N from Sequence Statistics Options and generates a list of all possible “words” of length N (from the characters A, T, C, G) and their counts within the sequence.
    See Sequence Statistics Options for other options relating to N-Grams
o Codon Usage Table
    Only DNA sequences can be used. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the codon usage table will only be generated for the DNA sequence(s). IUB characters will be ignored.
    Generates a codon usage table (using the offset from Sequence Statistics Options) using the sequence(s) that are selected.
    See Sequence Statistics Options for options relating to table display
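Counting codons for a usage table can be sketched in Python as below. Reporting each codon as a count plus a frequency per thousand is an assumption about the table contents, skipping codons that contain IUB characters is one reading of "ignored", and the name `codon_usage` is hypothetical.

```python
from collections import Counter


def codon_usage(seq, offset=0):
    """Return {codon: (count, frequency per thousand)} for one frame."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    # Assumption: codons containing anything but A, T, C, G are skipped.
    counts = Counter(c for c in codons if set(c) <= set("ATCG"))
    total = sum(counts.values())
    return {codon: (n, 1000 * n / total) for codon, n in counts.items()}
```

Changing the offset shifts the reading frame, so the same sequence can give a completely different table.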

o Calculate Molecular Weight
    Open to DNA, RNA and protein sequences.
    See Sequence Statistics Options for options relating to Molecular Weight

Protein Sequence Statistics
o GRAVY
    GRAVY can only be calculated for protein sequences. An error message will be displayed if a DNA or RNA sequence is selected. If sequences of all types are selected, GRAVY will only be calculated for the protein sequence(s).
    GRAVY is the average of the hydropathicities of the amino acids.
o Isoelectric Point
    Isoelectric points can only be calculated for protein sequences. An error message will be displayed if a DNA or RNA sequence is selected. If sequences of all types are selected, the isoelectric point will only be generated for the protein sequence(s).
    The isoelectric point is the pH at which the protein carries a zero net electric charge.
o Generate Protein Statistics
    Protein statistics can only be calculated for protein sequences. An error message will be displayed if a DNA or RNA sequence is selected. If sequences of all types are selected, the protein statistics will only be generated for the protein sequence(s).
    Protein statistics provides a count and percentage of each of the amino acids for each sequence.
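GRAVY and the amino acid counts can be sketched in Python as below. The manual does not say which hydropathy scale is used; the Kyte-Doolittle scale is assumed here because it is the usual choice for GRAVY. The function names are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Average hydropathy over all residues."""
    return sum(HYDROPATHY[aa] for aa in protein) / len(protein)


def protein_stats(protein):
    """Return {amino acid: (count, percentage)}."""
    counts = Counter(protein)
    return {aa: (n, 100 * n / len(protein))
            for aa, n in sorted(counts.items())}
```

For the peptide IVD, GRAVY is (4.5 + 4.2 - 3.5) / 3, roughly 1.73 on this scale.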

Substring Search
o Find ORFs (Open Reading Frames)
    ORFs can only be found for DNA sequences. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the ORFs will only be found for the DNA sequence(s).
    ATG, TTG, CTG and GTG are all considered start codons
    Different Output Locations:
        BELOW - Only prints the top-scoring ORF
        REPLACE - Highlights the top-scoring ORF
        CLIPBOARD - Saves all discovered ORFs
        NEW DOC - Prints all discovered ORFs
    See Substring Search Options for options relating to ORFs
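A simplified ORF scan in Python, using the four start codons listed above and ranking by length. Several details are assumptions rather than documented behaviour: only the three forward frames are scanned, the stop codons are those of the standard code, the minimum length excludes the stop codon, and CAI scoring is left out. The name `find_orfs` is hypothetical.

```python
STARTS = {"ATG", "TTG", "CTG", "GTG"}
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons


def find_orfs(dna, min_codons=1):
    """Return (start, end, sequence) tuples, longest ORF first."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    return sorted(orfs, key=lambda orf: orf[1] - orf[0], reverse=True)
```

The first item of the returned list corresponds to the top-scoring ORF when ranking by Length; the full list corresponds to what CLIPBOARD and NEW DOC report.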

o Find Substring
    Substring searches can only be completed for DNA sequences. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the substring search will only be completed for the DNA sequence(s).
    IUB characters are considered fractional mismatches depending on which DNA bases they stand for. For example, B is considered to be 1/3 T, 1/3 G and 1/3 C.
    Find Substring with Gap allows the user to search for two substrings separated by a gap of defined length.
    Different Output Locations
        BELOW, NEW DOC and CLIPBOARD
        o Returns all results
        REPLACE
        o Highlights the matches in the sequence, with a darker color representing a better scoring sequence
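The fractional-mismatch idea can be sketched in Python. The interpretation used here is an assumption: an IUB character that covers k bases, one of which is the query base, counts as a mismatch of 1 - 1/k (so B against T counts as 2/3), and a hit is reported only when the total is strictly below the Mismatch Threshold. The function names are hypothetical.

```python
IUB = {"A": "A", "C": "C", "G": "G", "T": "T",
       "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
       "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT"}


def mismatch(query_base, seq_char):
    """Fractional mismatch between a query base and a sequence character."""
    options = IUB[seq_char]
    match = 1 / len(options) if query_base in options else 0
    return 1 - match


def find_substring(seq, query, threshold):
    """Return (position, mismatch score) for windows under the threshold."""
    hits = []
    for i in range(len(seq) - len(query) + 1):
        window = seq[i:i + len(query)]
        score = sum(mismatch(q, s) for q, s in zip(query, window))
        if score < threshold:
            hits.append((i, score))
    return hits
```

A lower score means a better match, which is what the darker highlighting in REPLACE mode would reflect.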

Site Search
o Search (Ri sequence and I sequence)
    These Site Search mechanisms can only be completed for DNA sequences. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the site search will only be completed for the DNA sequence(s).
    In the motif, only DNA sequences may be entered.
    Reports the Ri Score or Iseq Score
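The manual does not give the scoring formula, so the Python sketch below shows only one common way to score a candidate site against a motif: a log-odds sum of motif frequency over background frequency at each position, with the background derived from % GC of Genome and Laplace pseudocounts in the frequency matrix. Whether this matches Biobar's Ri or Iseq score exactly is an assumption, and all names are hypothetical.

```python
import math


def background(gc_percent):
    """Base frequencies implied by a genome % GC value."""
    gc = gc_percent / 100
    return {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "C": gc / 2, "G": gc / 2}


def psfm(motif, pseudocount=1.0):
    """Per-position base frequencies with add-one pseudocounts."""
    total = len(motif) + 4 * pseudocount
    return [{b: (sum(s[i] == b for s in motif) + pseudocount) / total
             for b in "ACGT"}
            for i in range(len(motif[0]))]


def site_score(site, motif, gc_percent=50):
    """Sum over positions of log2(motif frequency / background frequency)."""
    bg = background(gc_percent)
    freqs = psfm(motif)
    return sum(math.log2(freqs[i][b] / bg[b]) for i, b in enumerate(site))
```

A site that matches the motif well gets a positive score in bits, and a site built from bases that are rare in the motif gets a negative one.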

o Dyad Pattern Search
    Dyad Pattern Search can only be completed for DNA sequences. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the site search will only be completed for the DNA sequence(s).
    In the motif, only DNA sequences may be entered.
    Different Output Locations:
        BELOW, NEW DOC and CLIPBOARD
        o Returns all results
        REPLACE
        o Highlights the dyads in the sequence, with a darker color representing a better scoring sequence

Motif Discovery
o Gibbs Sampling and Greedy Search
    Gibbs Sampling and Greedy Search can only be completed for DNA sequences. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the search will only be completed for the DNA sequence(s).
    The program will complete the number of iterations requested 10 times and return the motif that has the greatest information content (IC)
    Different Output Locations:
        BELOW, NEW DOC and CLIPBOARD
        o Returns the best result
        REPLACE
        o Highlights the motif in each sequence
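For readers unfamiliar with Gibbs sampling, the Python sketch below shows a textbook-style site sampler with 10 restarts, keeping the run with the greatest information content as described above. It is not Biobar's implementation: the update rule, the Laplace pseudocounts, the relative-entropy form of information content, and all names are assumptions. It needs at least two sequences containing only A, C, G and T.

```python
import math
import random


def profile(sites, pseudocount=1.0):
    """Per-position base frequencies with pseudocounts."""
    total = len(sites) + 4 * pseudocount
    return [{b: (sum(s[i] == b for s in sites) + pseudocount) / total
             for b in "ACGT"}
            for i in range(len(sites[0]))]


def information_content(sites, bg):
    """Relative entropy of the motif against the background, in bits."""
    return sum(f * math.log2(f / bg[b])
               for col in profile(sites) for b, f in col.items())


def gibbs_once(seqs, width, iterations, bg, rng):
    """One run: repeatedly resample the site in each sequence."""
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for k, seq in enumerate(seqs):
            # Build the profile from every sequence except the current one.
            others = [s[p:p + width]
                      for j, (s, p) in enumerate(zip(seqs, starts)) if j != k]
            prof = profile(others)
            weights = [math.prod(prof[i][seq[p + i]] / bg[seq[p + i]]
                                 for i in range(width))
                       for p in range(len(seq) - width + 1)]
            # Sample a new start in proportion to its likelihood ratio.
            starts[k] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


def gibbs_sample(seqs, width, iterations=100, gc_percent=50,
                 restarts=10, seed=0):
    """Return (information content, starts, sites) of the best run."""
    rng = random.Random(seed)
    gc = gc_percent / 100
    bg = {"A": (1 - gc) / 2, "T": (1 - gc) / 2, "C": gc / 2, "G": gc / 2}
    best = None
    for _ in range(restarts):
        starts = gibbs_once(seqs, width, iterations, bg, rng)
        sites = [s[p:p + width] for s, p in zip(seqs, starts)]
        ic = information_content(sites, bg)
        if best is None or ic > best[0]:
            best = (ic, starts, sites)
    return best
```

Because the sampler is random, more iterations and restarts make it more likely, but never certain, that the strongest motif is found.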

o Dyad Motif Search
    The Dyad Motif Search can only be completed for DNA sequences. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the motif search will only be completed for the DNA sequence(s).
o Consensus Logo
    The consensus logo can only be generated with a motif of DNA sequences
    The motif to generate a pseudo-consensus logo can be selected in the document or entered into a pop-up input box when the button is clicked.
    Information Content will be calculated using R sequence or Relative Entropy, depending on which is selected in the Advanced Options
    The text is likely to be small, and can be scaled up or down using the arrows in the Resources Group (though resolution may be lost)

Pair-Wise Alignment
o Needleman-Wunsch and Smith-Waterman
    Pair-wise alignment can be performed with both DNA and Protein sequences.
    Protein sequences are scored using the BLOSUM62 matrix and DNA sequences are scored using user-defined values.
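To show how the match/mismatch values, GOP and GEP fit together, here is a score-only Needleman-Wunsch sketch in Python with affine gaps. It is an illustration under stated assumptions: a gap of length k is taken to cost GOP + (k - 1) x GEP, penalties are passed as negative numbers, and the default values are made up. Traceback, the limit on the number of alignments, BLOSUM62 and Smith-Waterman are left out. The function name is hypothetical.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gop=-2, gep=-1):
    """Best global alignment score with affine gap penalties."""
    neg = float("-inf")
    n, m = len(a), len(b)
    # M: last column aligns two residues; X: gap in b; Y: gap in a.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gop + (i - 1) * gep
    for j in range(1, m + 1):
        Y[0][j] = gop + (j - 1) * gep
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Open a new gap (gop) or extend the current one (gep).
            X[i][j] = max(M[i - 1][j] + gop, X[i - 1][j] + gep, Y[i - 1][j] + gop)
            Y[i][j] = max(M[i][j - 1] + gop, Y[i][j - 1] + gep, X[i][j - 1] + gop)
    return max(M[n][m], X[n][m], Y[n][m])
```

Aligning AAAA with AA scores -1 with these defaults: two matches (+2) and one gap of length 2 (-2 to open, -1 to extend), which is cheaper than opening two separate gaps.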
