BIOBAR : USER MANUAL
<strong>BIOBAR</strong> : <strong>USER</strong> <strong>MANUAL</strong><br />
INSTALLING:<br />
AUTOMATIC INSTALLATION<br />
o Make sure Biobar.dotm is open and double-click the “Click to Install Biobar”<br />
button in the document<br />
<strong>MANUAL</strong> INSTALLATION<br />
o The Biobar.dotm file needs to be placed in the STARTUP folder for Microsoft<br />
Word so that the toolbar is available for every document.<br />
XP - C:\Documents and Settings\\Application Data\Microsoft\Word\Startup<br />
Vista - C:\Users\\AppData\Roaming\Microsoft\Word\STARTUP<br />
SELECTION:<br />
OPTIONS:<br />
Select the sequence(s) that you wish to perform an operation on<br />
Separate sequences by pressing Enter twice (one blank line between sequences)<br />
o NOTE: if you are entering sequences in FASTA format (with a “>” denoting the<br />
header), the “>” will be used to separate sequences, NOT two enters<br />
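The separation rule above can be sketched in Python as follows. This is only an illustration of the rule, not BioBar's own code; the function name split_selection is hypothetical, and treating Word paragraph marks as line breaks is an assumption.

```python
import re


def split_selection(text: str) -> list[str]:
    """Split selected text into individual sequence records."""
    # Normalise line endings (Word paragraph marks arrive as "\r").
    text = text.replace("\r\n", "\n").replace("\r", "\n").strip()
    if not text:
        return []
    if ">" in text:
        # FASTA input: every ">" starts a new record, blank lines do not.
        head, *rest = text.split(">")
        records = [head.strip()] if head.strip() else []
        records += [">" + chunk.strip() for chunk in rest if chunk.strip()]
        return records
    # Raw input: sequences are separated by a blank line (two Enter presses).
    return [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]
```

With raw input, "ATAG", a blank line, then "GGCC" yields two sequences; with FASTA input, a blank line inside a record does not split it.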
Basic Options<br />
o Default Output<br />
Raw - your results will be displayed in the raw format<br />
FASTA - your results will be displayed in the FASTA format<br />
if a header was not provided, a numbered default one will be<br />
created<br />
GenBank – your results will be displayed in the GenBank format<br />
o Default Output Location<br />
Append to Document – your results will be printed beneath the selection<br />
Create New Document – your results will be printed on a new Document<br />
Save to Clipboard – your results will be saved to the Clipboard; use Paste<br />
(Ctrl + V) to retrieve them<br />
Replace Selection – your results will overwrite the selection<br />
Advanced Options (Pop-up)
o % ATCG for DNA<br />
This represents the percentage of A’s, T’s, C’s and G’s that must be in a<br />
sequence for it to be considered DNA<br />
NOTE: the percentage must be entered as an integer, so enter 70 for<br />
70%, NOT 0.7<br />
o Allow IUB Code<br />
If checked, the following IUB characters are allowed in DNA sequences:<br />
A, T, C, G, R, Y, M, K, S, W, H, B, V, D, N<br />
Characters other than the above are filtered out<br />
If not checked, the following characters are allowed in DNA sequences<br />
A, T, C, G<br />
Whether a sequence is DNA or Protein is determined by % ATCG<br />
Characters other than A, T, C or G are filtered out<br />
o If “ATGAAMGA” were selected, “ATGAAGA” would be<br />
processed<br />
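As a rough illustration of the two options above, the following Python sketch classifies a sequence by its percentage of A, T, C and G and then filters disallowed characters. The function names are hypothetical, and using "greater than or equal to" for the threshold is an assumption; the manual does not state how the comparison is made.

```python
DNA_BASES = set("ATCG")
IUB_CODES = set("ATCGRYMKSWHBVDN")


def is_dna(seq: str, percent_atcg: int = 70) -> bool:
    """Return True if the share of A/T/C/G letters reaches the threshold."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        return False
    share = 100 * sum(c in DNA_BASES for c in letters) / len(letters)
    return share >= percent_atcg  # assumption: the threshold is inclusive


def filter_dna(seq: str, allow_iub: bool = False) -> str:
    """Drop every character that is not allowed in a DNA sequence."""
    allowed = IUB_CODES if allow_iub else DNA_BASES
    return "".join(c for c in seq.upper() if c in allowed)
```

With Allow IUB Code unchecked, filter_dna("ATGAAMGA") gives "ATGAAGA", matching the example above.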
o Pseudocounts<br />
Laplace: pseudocounts for the PSFM (position-specific frequency matrix) are<br />
calculated using Laplace’s method<br />
10^-50: a negligible pseudocount intended only to avoid 0 values in log<br />
calculations<br />
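The difference between the two pseudocount choices can be sketched for a single motif column. This assumes Laplace's method means the usual add-one rule (one extra count per base); the manual does not give BioBar's exact formula, and the function name is hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def column_frequencies(column: str, method: str = "laplace") -> dict[str, float]:
    """Base frequencies for one motif column, with pseudocounts applied."""
    counts = Counter(column.upper())
    n = sum(counts[b] for b in BASES)
    if method == "laplace":
        # Add-one rule: one pseudocount for each of the four bases.
        pseudo = 1.0
    else:
        # Tiny pseudocount: leaves frequencies essentially unchanged
        # but keeps them strictly positive so that log() is defined.
        pseudo = 1e-50
    return {b: (counts[b] + pseudo) / (n + len(BASES) * pseudo) for b in BASES}
```

For a column of four A's, the Laplace option gives A = 5/8 and 1/8 for each other base, while the 10^-50 option leaves A at essentially 1 and the others just above 0.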
Translation Options<br />
o Offset<br />
Refers to the frame of translation (0, +1, +2)<br />
o Genetic Code (GCode)<br />
Denotes the genetic code used in translation and reverse translation<br />
Standard<br />
Vertebrate Mitochondrial<br />
Yeast Mitochondrial<br />
Mold, Protozoan and Coelenterate Mitochondrial/Mycoplasma<br />
and Spiroplasma Mitochondrial<br />
Invertebrate Mitochondrial<br />
Ciliate, Dasycladacean and Hexamita Nuclear<br />
Echinoderm and Flatworm Mitochondrial<br />
Euplotid Mitochondrial<br />
Bacterial, Archaeal and Plant Plastid<br />
NOTE: this option is overridden if a Codon Usage Table is entered<br />
o Codon Usage Table<br />
If not checked, the default Genetic Code selected will be used in reverse<br />
translation. The following options are allowed:<br />
Uniform - for each amino acid in the sequence(s), a random codon<br />
will be chosen from those associated with that amino acid
IUB - for each amino acid in the sequence, its IUB code will be<br />
used<br />
If checked, the user will be prompted to enter a Codon Usage Table after the<br />
Reverse Translation button has been pressed, which will be used in<br />
reverse translation. The following additional options are allowed:<br />
Best - for each amino acid in the sequence(s), the codon with the<br />
highest frequency associated with that amino acid (determined<br />
by Codon Usage Table) will be selected<br />
Random Best - for each amino acid in the sequence(s), a random<br />
number will be generated and a codon associated with that amino<br />
acid will be selected relative to its frequency (determined by<br />
Codon Usage Table)<br />
NOTE: Please use the codon usage tables found on<br />
http://www.kazusa.or.jp/codon/ using the option for “a style like<br />
CodonFrequency output in GCG”<br />
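The reverse translation methods can be compared with a small Python sketch. The table below is a hypothetical, partial codon usage table with made-up frequencies for three amino acids; it only stands in for a real table entered by the user, and the function and method names are illustrative.

```python
import random

# Hypothetical, partial codon usage table: amino acid -> {codon: frequency}.
USAGE = {
    "I": {"ATT": 0.36, "ATC": 0.47, "ATA": 0.17},
    "V": {"GTT": 0.18, "GTC": 0.24, "GTA": 0.12, "GTG": 0.46},
    "D": {"GAT": 0.46, "GAC": 0.54},
}


def reverse_translate(protein, usage, method="best", rng=random):
    """Reverse translate a protein using one of the codon selection methods."""
    codons = []
    for aa in protein.upper():
        table = usage[aa]
        if method == "best":
            # Codon with the highest frequency for this amino acid.
            codons.append(max(table, key=table.get))
        elif method == "random_best":
            # Random codon, drawn in proportion to its frequency.
            names = list(table)
            weights = [table[c] for c in names]
            codons.append(rng.choices(names, weights=weights)[0])
        elif method == "uniform":
            # Random codon, every codon equally likely.
            codons.append(rng.choice(list(table)))
        else:
            raise ValueError(f"unknown method: {method}")
    return "".join(codons)
```

With this illustrative table, the "best" method turns IVD into ATCGTGGAC, while the two random methods may return a different valid DNA sequence on each call.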
DNA Statistics Options<br />
o Window-Based % GC<br />
Length – this value will serve as the sliding window length when<br />
calculating % GC<br />
Step-size – the value will serve as the number of base-pairs the sliding<br />
window shifts each time (useful for long sequences)<br />
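How Length and Step-size interact can be seen in a short Python sketch. The function name is hypothetical, and reporting 1-based start positions and skipping a final partial window are assumptions about the output.

```python
def window_gc(seq: str, length: int, step: int = 1) -> list[tuple[int, float]]:
    """Return (start position, %GC) for each full sliding window."""
    seq = seq.upper()
    results = []
    # Only full windows are scored; a shorter final window is skipped.
    for start in range(0, len(seq) - length + 1, step):
        window = seq[start:start + length]
        gc = sum(base in "GC" for base in window)
        results.append((start + 1, 100.0 * gc / length))  # 1-based position
    return results
```

For ATGCGC with Length = 4 and Step-size = 2, the windows are ATGC (50% GC) and GCGC (100% GC).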
o N-Gram<br />
The user-entered value will serve as n (the length of each “gram”)<br />
Only display N-Grams found in the sequence<br />
With this option selected, only N-Grams with a nonzero count will<br />
be displayed in the table.<br />
o If ATAG were selected, with n = 2, only AT, TA, AG will be<br />
displayed in the table rather than AA, AC, AG, AT,…,TT<br />
Include the reverse complement<br />
With this option selected, the counts of N-Grams of the reverse<br />
complement will be added to the counts of the original sequence<br />
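The N-Gram options can be illustrated with the following Python sketch. It is not BioBar's implementation; the function and parameter names are hypothetical, and skipping any gram that contains a character other than A, C, G or T is an assumption.

```python
from itertools import product


def ngram_counts(seq, n, only_found=False, include_revcomp=False):
    """Count every DNA word of length n in the sequence."""
    seq = seq.upper()
    # Start with all 4**n possible words at a count of zero.
    counts = {"".join(p): 0 for p in product("ACGT", repeat=n)}
    strands = [seq]
    if include_revcomp:
        strands.append(seq.translate(str.maketrans("ACGT", "TGCA"))[::-1])
    for strand in strands:
        for i in range(len(strand) - n + 1):
            gram = strand[i:i + n]
            if gram in counts:  # grams with non-ACGT characters are skipped
                counts[gram] += 1
    if only_found:
        counts = {g: c for g, c in counts.items() if c > 0}
    return counts
```

For ATAG with n = 2 and only found N-Grams displayed, the result contains just AT, TA and AG, as in the example above.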
o Molecular Weight<br />
Consider DNA sequences double stranded<br />
Include a 5’ triphosphate on RNA sequences<br />
Include a 5’ monophosphate on DNA sequences<br />
o Codon Usage Table<br />
Offset<br />
Refers to the frame (0, +1, +2)<br />
Table Display<br />
White-space - the table will be displayed in the same form as the<br />
codon usage tables from http://www.kazusa.or.jp/codon/<br />
Table - the table will be displayed in a Microsoft Word table<br />
Substring Search Options<br />
o ORF<br />
Minimum codon length – defines the minimum length of an open reading<br />
frame to be reported<br />
o Optimize based on:<br />
Length - when selected, longer ORFs will be ranked higher<br />
CAI and Length – when selected, CAI scores are also taken into<br />
consideration in ranking ORFs<br />
o For Substring and Substring with Gap<br />
Mismatch Threshold – only sequences with fewer mismatches than the<br />
value entered will be displayed<br />
Site Search<br />
o Ri and Iseq Scoring Options<br />
% GC of Genome– this value will be used to calculate the composition of<br />
the genome sequence that the motif is scored upon<br />
Motif – enter a motif of sequences, all the same length in either Raw or<br />
FASTA format<br />
o Dyad Pattern Search<br />
% GC of Genome – this value will be used to calculate the composition of<br />
the genome sequence that the motif is scored upon<br />
Motif – enter a motif of sequences, all the same length in either Raw or<br />
FASTA format<br />
Threshold - each dyad found must have a score higher than a user-defined<br />
constant times the information content of the motif to be<br />
reported<br />
Gap – defines the range of base-pairs as a spacer between two dyads<br />
For second dyad:<br />
Mirror Motif – when selected, the other half of the dyad will be the<br />
reverse complement of the motif entered<br />
Duplicate Motif – when selected, the other half of the dyad will be<br />
the same as the motif entered<br />
o Consensus Logo<br />
% GC of Genome – this value will be used to calculate the composition of<br />
the genome sequence that the motif is scored upon<br />
Use IUB for equally probable bases – if this option is checked, IUB<br />
characters will be used for two bases that occur at equal frequencies in a<br />
position in the motif. For example, if “A” and “G” are equally likely, “R”<br />
will be used for that position<br />
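The IUB option for equally probable bases can be sketched as a text consensus in Python. This covers only the choice of character per position, not the drawing of the logo. The function name is hypothetical, and the reading that the option applies when exactly two bases tie for the highest count is an assumption.

```python
# IUB codes for the six possible pairs of bases.
IUB_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
}


def consensus(motif, use_iub=True):
    """Build a consensus string from equal-length motif sequences."""
    out = []
    for column in zip(*(s.upper() for s in motif)):
        counts = {b: column.count(b) for b in "ACGT"}
        best = max(counts.values())
        top = [b for b in "ACGT" if counts[b] == best]
        if use_iub and len(top) == 2:
            # Two bases tie for first place: use their IUB code.
            out.append(IUB_PAIRS[frozenset(top)])
        else:
            # Otherwise take the first of the most frequent bases.
            out.append(top[0])
    return "".join(out)
```

For the motif AT, GT the first position has A and G at equal frequency, so the consensus is RT with the option checked and AT without it.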
Motif Discovery<br />
o Gibbs Sampling<br />
% GC of Genome – this value will be used to calculate the composition of<br />
the genome sequence that the motif is scored upon<br />
Window Length – this value determines the length of the window<br />
sampled by Gibbs Sampling<br />
Number of Iterations – the number of times the program should cycle<br />
before reporting results (higher number of iterations means slower, but<br />
more accurate results)<br />
o Greedy Search<br />
% GC of Genome – this value will be used to calculate the composition of<br />
the genome sequence that the motif is scored upon<br />
Window Length – this value determines the length of the window<br />
sampled by Greedy Search<br />
Number of Iterations – the number of times the program should cycle<br />
before reporting results (higher number of iterations means slower, but<br />
more accurate results)<br />
o Dyad Motif Search<br />
Dyad Length – if values 5 ± 1 were entered, dyads of length 4, 5, and 6<br />
will be considered<br />
Spacer Length – if values 5 ± 1 were entered, spacers of length 4, 5, and 6<br />
will be allowed<br />
Mismatch Threshold – determines the maximum number of mismatches<br />
that should be allowed between the two dyads<br />
Palindrome – the dyads must be reverse complements of each other<br />
Direct Repeat – the dyads must be identical<br />
Pair-wise Alignment<br />
o Scoring Options<br />
The first selection allows the user to define constant values to be used to<br />
score matches and mismatches<br />
The second selection allows the user to define more specific values if<br />
they would like to weight certain matches/mismatches as more or less<br />
important than others<br />
NOTE: if protein sequences are selected, BLOSUM62 will be used to score<br />
the matches and mismatches between the sequences<br />
o GEP – this value defines the gap extension penalty<br />
o GOP – this value defines the gap opening penalty<br />
o Limit results to X alignments – in the case of split paths, the program will report<br />
no more than X alignments<br />
FUNCTIONALITY:<br />
Basic Manipulation<br />
o Convert Sequence To:<br />
Raw – The sequence(s) selected will be printed in the raw format<br />
FASTA – The sequence(s) selected will be printed in FASTA format<br />
A default header (> # Default) will be created if one is not<br />
provided (> # [user input])<br />
GenBank – The sequence(s) selected will be printed in GenBank format<br />
o Reverse<br />
DNA, RNA and protein sequences can be reversed<br />
ATAGTAGAT -> TAGATGATA<br />
o Complement<br />
Only DNA and RNA sequences may be complemented. An error message<br />
will be displayed if a protein sequence is selected. If DNA, RNA, and<br />
protein sequences are selected, an error message will be displayed and<br />
only the DNA and RNA sequence(s) will be complemented. IUB<br />
characters will be ignored.<br />
ATAGTAGAT -> TATCATCTA<br />
o Reverse Complement<br />
Only DNA and RNA sequences may be reverse complemented. An error message<br />
will be displayed if a protein sequence is selected. If DNA, RNA and<br />
protein sequences are selected, an error message will be displayed and<br />
only the DNA and RNA sequence(s) will be reverse complemented. IUB<br />
characters will be ignored.<br />
ATAGTAGAT -> ATCTACTAT<br />
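The three operations above can be reproduced with a few lines of Python. This is a sketch rather than BioBar's code: the function names are hypothetical, a sequence is treated as RNA if it contains U, and leaving IUB characters unchanged is one possible reading of "ignored".

```python
DNA_COMPLEMENT = str.maketrans("ATCG", "TAGC")
RNA_COMPLEMENT = str.maketrans("AUCG", "UAGC")


def reverse(seq: str) -> str:
    """Reverse a DNA, RNA or protein sequence."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Complement a DNA or RNA sequence; other characters are left as-is."""
    seq = seq.upper()
    table = RNA_COMPLEMENT if "U" in seq else DNA_COMPLEMENT
    return seq.translate(table)


def reverse_complement(seq: str) -> str:
    """Complement the sequence, then reverse it."""
    return complement(seq)[::-1]
```

Applied to ATAGTAGAT, these give TAGATGATA, TATCATCTA and ATCTACTAT, the same results as the examples above.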
Translation<br />
o Forward (Translation)<br />
Only DNA sequences may be translated. An error message will be<br />
displayed if a protein sequence is selected. If DNA, RNA and protein<br />
sequences are selected, an error message will be displayed and only the<br />
DNA sequence(s) will be translated.<br />
If IUB is selected:<br />
Because an IUB code can stand for more than one base, a list of all codons<br />
corresponding to that IUB code will be generated and a random<br />
one will be chosen to be used in translation<br />
o Example:<br />
GCode = Standard:<br />
TGB -> [TGC, TGT, TGG] -> C or W
If an incorrect character (not A, T, C, G, R, Y, M, K, S, W, H, B, V, D,<br />
N) is found, an error message will be displayed and translation<br />
will not occur<br />
Translation will be performed according to the default Genetic Code<br />
selected and according to the offset specified in Translation Options<br />
NOTE: trailing bases that do not fill a complete codon will be removed<br />
Examples: (bases outside the reading frame are ignored)<br />
Offset = 0, GCode = Standard: ATAGTAGAT -> IVD<br />
Offset = 1, GCode = Standard: ATAGTAGAT -> ** (the leading A and trailing AT are ignored)<br />
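The effect of the offset can be checked with a minimal Python sketch of forward translation using the Standard genetic code. The function name and table layout are illustrative and not taken from BioBar; IUB characters are not handled here.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, offset: int = 0, code: dict = STANDARD) -> str:
    """Translate DNA starting at the given offset (0, 1 or 2)."""
    dna = dna.upper()
    # Stop before any trailing bases that do not fill a codon.
    end = len(dna) - (len(dna) - offset) % 3
    return "".join(code[dna[i:i + 3]] for i in range(offset, end, 3))
```

With ATAGTAGAT, an offset of 0 reads ATA GTA GAT and gives IVD; an offset of 1 reads TAG TAG and gives two stop codons.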
o Reverse (Translation)<br />
Only protein sequences may be reverse translated. An error message<br />
will be displayed if a DNA or RNA sequence is selected. If DNA, RNA and<br />
protein sequences are selected, an error message will be displayed and<br />
only the protein sequence(s) will be reverse translated.<br />
Reverse Translation will be performed according to the method specified<br />
in the Reverse Translation/Translation Options<br />
If the user provides a Codon Usage Table, the reverse translation<br />
will occur according to the Codon Usage Table rather than the<br />
default Genetic Code<br />
Examples:<br />
o GCode = Standard: IVD -> ATAGTAGAT<br />
o (Translation) Map<br />
Only DNA sequences can be used to create a translation map. An error<br />
message will be displayed if a protein or RNA sequence is selected. If<br />
DNA, RNA, and protein sequences are selected, an error message will be<br />
displayed and translation maps will only be displayed for the DNA<br />
sequence(s). The sequence(s) is (are) translated for each of the frames<br />
and all of the resultant sequences are displayed together.<br />
If IUB is selected:<br />
Because an IUB code can stand for more than one base, a list of all codons<br />
corresponding to that IUB code will be generated and a random<br />
one will be chosen to be used in translation<br />
o Example:<br />
GCode = Standard:<br />
TGB -> [TGC, TGT, TGG] -> C or W<br />
If an incorrect character (not A, T, C, G, R, Y, M, K, S, W, H, B, V, D,<br />
N) is found, an error message will be displayed and translation<br />
will not occur<br />
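The handling of IUB characters during translation can be sketched in Python as follows: the ambiguous codon is expanded into every concrete codon it can stand for, and one is picked at random. The function names are hypothetical, and raising an exception stands in for the error message described above.

```python
import random
from itertools import product

# Bases represented by each IUB character.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_codon(codon: str) -> list[str]:
    """List every concrete codon an IUB codon can stand for."""
    try:
        choices = [IUB[c] for c in codon.upper()]
    except KeyError as err:
        raise ValueError(f"invalid character in codon: {err}") from None
    return ["".join(p) for p in product(*choices)]


def translate_iub_codon(codon: str, rng=random) -> str:
    """Translate one codon, choosing randomly among its expansions."""
    return STANDARD[rng.choice(expand_codon(codon))]
```

For TGB the expansion is TGC, TGG and TGT, so the translated residue is either C or W, as in the example above.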
DNA Sequence Statistics<br />
o Global %GC
Only DNA sequences can be used. An error message will be displayed if<br />
a protein or RNA sequence is selected. If both DNA and protein<br />
sequences are selected, an error message will be displayed and %GC will<br />
only be calculated for the DNA sequence(s).<br />
Calculates the %GC content for the entire sequence and displays it in a<br />
table.<br />
o Window %GC<br />
Only DNA sequences can be used. An error message will be displayed if<br />
a protein or RNA sequence is selected. If both DNA and protein<br />
sequences are selected, an error message will be displayed and %GC will<br />
only be calculated for the DNA sequence(s).<br />
Calculates the %GC content for windows of length defined by user in the<br />
Sequence Statistics Options and displays them in a table, indexed by<br />
position.<br />
o Nucleotide Frequencies<br />
Only DNA sequences can be used. An error message will be displayed if a<br />
protein or RNA sequence is selected. If both DNA and protein sequences<br />
are selected, an error message will be displayed and nucleotide<br />
frequencies will only be calculated for the DNA sequence(s).<br />
Calculates frequency of each nucleotide for the entire sequence and<br />
displays it in a table.<br />
o N-Gram<br />
Only DNA sequences can be used. An error message will be displayed if<br />
a protein or RNA sequence is selected. If sequences of all types are<br />
selected, the N-Grams will only be calculated for the DNA sequence(s).<br />
IUB characters will be ignored<br />
Takes the value of N from Sequence Statistics Options and generates a list<br />
of all possible “words” of length N (from the characters A, T, C, G) and<br />
their counts within the sequence.<br />
See Sequence Statistics Options for other options relating to N-Grams<br />
o Codon Usage Table<br />
Only DNA sequences can be used. An error message will be displayed if<br />
a protein or RNA is selected. If sequences of all types are selected, the<br />
codon usage table will only be generated for the DNA sequence(s). IUB<br />
characters will be ignored<br />
Generates a codon usage table (using the offset from Sequence Statistics<br />
Options) using the sequence(s) that are selected.<br />
See Sequence Statistics Options for options relating to table display<br />
o Calculate Molecular Weight<br />
Open to DNA, RNA and protein sequences.
See Sequence Statistics Options for options relating to Molecular Weight<br />
Protein Sequence Statistics<br />
o GRAVY<br />
GRAVY can only be calculated for protein sequences. An error message<br />
will be displayed if a DNA or RNA sequence is selected. If sequences of all<br />
types are selected, GRAVY will only be calculated for protein<br />
sequence(s).<br />
GRAVY (grand average of hydropathicity) is the average of the hydropathicities<br />
of the amino acids.<br />
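As a worked example, the GRAVY calculation can be sketched in Python. The values below are the Kyte-Doolittle hydropathy scale, which is the scale GRAVY is normally defined on; that BioBar uses exactly these values is an assumption, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein: str) -> float:
    """Average hydropathy over all residues of a protein sequence."""
    residues = [aa for aa in protein.upper() if aa in HYDROPATHY]
    if not residues:
        raise ValueError("no amino acids to score")
    return sum(HYDROPATHY[aa] for aa in residues) / len(residues)
```

For the peptide IVD this gives (4.5 + 4.2 - 3.5) / 3, or about 1.73.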
o Isoelectric Point<br />
Isoelectric points can only be calculated for protein sequences. An error<br />
message will be displayed if a DNA or RNA sequence is selected. If<br />
sequences of all types are selected, the isoelectric point will only be<br />
generated for protein sequence(s).<br />
The isoelectric point is the pH at which the protein carries a zero net<br />
electric charge.<br />
o Generate Protein Statistics<br />
Protein statistics can only be calculated for protein sequences. An error<br />
message will be displayed if a DNA or RNA sequence is selected. If<br />
sequences of all types are selected, the protein statistics will only be<br />
generated for the protein sequence(s).<br />
Protein statistics provides a count and percentage of each of the amino<br />
acids for each sequence.<br />
Substring Search<br />
o Find ORFs (Open Reading Frames)<br />
ORFs can only be found for DNA sequences. An error message will be<br />
displayed if a protein or RNA sequence is selected. If sequences of all<br />
types are selected, the ORFs will only be found for the DNA sequence(s).<br />
ATG, TTG, CTG and GTG are all considered start codons<br />
Different Output Locations:<br />
BELOW – Only prints the top-scoring ORF<br />
REPLACE – Highlights the top-scoring ORF<br />
CLIPBOARD – Saves all discovered ORFs<br />
NEW DOC – Prints all discovered ORFs<br />
See Substring Search Options for options relating to ORFs<br />
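A simplified ORF scan using the four start codons listed above can be sketched in Python. Several points are assumptions rather than documented behaviour: only the forward strand is scanned, the stop codons are those of the Standard genetic code, ORF length is counted in codons without the stop codon, and ranking is by length only (CAI is not computed). The function name is hypothetical.

```python
STARTS = {"ATG", "TTG", "CTG", "GTG"}
STOPS = {"TAA", "TAG", "TGA"}  # assumption: Standard genetic code


def find_orfs(dna: str, min_codons: int = 1) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for ORFs in the three forward frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon in STARTS:
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOPS:
                # Length in codons, not counting the stop codon.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None
    # Longer ORFs rank higher.
    return sorted(orfs, key=lambda orf: len(orf[2]), reverse=True)
```

For CCATGAAATAGCC with a minimum codon length of 2, the only ORF reported is ATGAAATAG; raising the minimum to 3 reports nothing.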
o Find Substring<br />
Substring searches can only be completed for DNA sequences. An error<br />
message will be displayed if a protein or RNA sequence is selected. If<br />
sequences of all types are selected, the substring search will only be<br />
completed for the DNA sequence(s).
IUB characters are considered fractional mismatches depending on<br />
which DNA bases they stand for. For example, B is considered to be 1/3<br />
T, 1/3 G and 1/3 C.<br />
Find Substring with Gap allows the user to search for two substrings<br />
separated by a gap of defined length.<br />
Different Output Locations<br />
BELOW, NEW DOC and CLIPBOARD<br />
o Returns all results<br />
REPLACE<br />
o Highlights the matches in the sequence with a darker color<br />
representing a better scoring sequence<br />
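The fractional mismatch rule for IUB characters can be sketched in Python as below. This is one possible reading of the rule: the IUB characters are assumed to appear in the searched sequence, the search pattern is assumed to be plain A, C, G, T, and a match is reported when the total mismatch is below the Mismatch Threshold. The function names are hypothetical.

```python
# Bases represented by each IUB character.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def mismatch(pattern_base: str, seq_char: str) -> float:
    """Fraction of the bases behind seq_char that differ from pattern_base."""
    possible = IUB[seq_char]
    return 1.0 - (pattern_base in possible) / len(possible)


def find_substring(seq: str, pattern: str, threshold: float):
    """Return (position, mismatch score) for every window below the threshold."""
    seq, pattern = seq.upper(), pattern.upper()
    hits = []
    for i in range(len(seq) - len(pattern) + 1):
        window = seq[i:i + len(pattern)]
        score = sum(mismatch(p, s) for p, s in zip(pattern, window))
        if score < threshold:
            hits.append((i, score))
    return hits
```

Since B stands for 1/3 T, 1/3 G and 1/3 C, comparing G against B counts as a 2/3 mismatch in this sketch.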
Site Search<br />
o Search (Ri sequence and I sequence)<br />
These Site Search mechanisms can only be completed for DNA sequences.<br />
An error message will be displayed if a protein or RNA sequence is<br />
selected. If sequences of all types are selected, the site search will only<br />
be completed for the DNA sequence(s).<br />
In the motif, only DNA sequences may be entered.<br />
Reports the Ri Score or Iseq Score<br />
o Dyad Pattern Search<br />
Dyad Pattern Search can only be completed for DNA sequences. An error<br />
message will be displayed if a protein or RNA sequence is selected. If<br />
sequences of all types are selected, the site search will only be completed<br />
for the DNA sequence(s).<br />
In the motif, only DNA sequences may be entered.<br />
Different Output Locations:<br />
BELOW, NEW DOC and CLIPBOARD<br />
o Returns all results<br />
REPLACE<br />
o Highlights the dyads in the sequence with a darker color<br />
representing a better scoring sequence<br />
Motif Discovery<br />
o Gibbs Sampling and Greedy Search<br />
Gibbs Sampling and Greedy Search can only be completed for DNA sequences. An error<br />
message will be displayed if a protein or RNA sequence is selected. If<br />
sequences of all types are selected, the sampling will only be completed<br />
for the DNA sequence(s).<br />
The program will complete the number of iterations requested 10 times<br />
and return the motif that has the greatest information content (IC)<br />
Different Output Locations:<br />
BELOW, NEW DOC and CLIPBOARD
o Returns the best result<br />
REPLACE<br />
o Highlights the motif in each sequence<br />
o Dyad Motif Search<br />
The Dyad Motif Search can only be completed for DNA sequences. An<br />
error message will be displayed if a protein or RNA sequence is selected.<br />
If sequences of all types are selected, the motif search will only be<br />
completed for the DNA sequence(s).<br />
o Consensus Logo<br />
The consensus logo can only be generated with a motif of DNA sequences<br />
The motif to generate a pseudo-consensus logo can be selected in the<br />
document or entered into a pop-up input box when the button is clicked.<br />
Information Content will be calculated using R sequence or Relative<br />
Entropy, depending on which is selected in the Advanced Options<br />
The text is likely to be small, and can be scaled up or down using the<br />
arrows in the Resources Group (though resolution may be lost)<br />
Pair-Wise Alignment<br />
o Needleman-Wunsch and Smith-Waterman<br />
Pair-wise alignment can be performed with both DNA and Protein<br />
sequences.<br />
Protein sequences are scored using the BLOSUM62 matrix and DNA<br />
sequences are scored using user-defined values.
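For orientation, a minimal Needleman-Wunsch global alignment for DNA can be sketched in Python. It is deliberately simplified: it uses constant match and mismatch scores and a single linear gap penalty rather than separate GOP and GEP values, it returns only one optimal alignment, and the default scores are arbitrary illustrative values.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Aligning ACGT with AGT under these scores places one gap opposite the C. Smith-Waterman differs mainly in clamping cell scores at zero and tracing back from the highest-scoring cell instead of the corner.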