BIOBAR : USER MANUAL

INSTALLING: 

AUTOMATIC INSTALLATION 

o Make sure Biobar.dotm is open and double-click the “Click to Install Biobar” 

button in the document 

MANUAL INSTALLATION 

o The Biobar.dotm file needs to be placed in the STARTUP folder for Microsoft 

Word so that the toolbar is available for every document. 

XP - 

C:\Documents and Settings\\Application Data\Microsoft\Word\Startup

Vista – 

C:\Users\\AppData\Roaming\Microsoft\Word\STARTUP 

SELECTION:

Select the sequence(s) that you wish to perform an operation on

Separate sequences with two line breaks (press Enter twice between sequences)

o NOTE: if you are entering sequences in FASTA format (with a “>” denoting the header), the “>” will be used to separate sequences, NOT two line breaks

OPTIONS:

Basic Options 

o Default Output 

Raw - your results will be displayed in the raw format 

FASTA - your results will be displayed in the FASTA format 

if a header was not provided, a numbered default one will be 

created 

GenBank – your results will be displayed in the GenBank format 

o Default Output Location 

Append to Document – your results will be printed beneath the selection 

Create New Document – your results will be printed on a new Document 

Save to Clipboard – your results will be saved to the Clipboard; use Paste 

(Ctrl + V) to retrieve them 

Replace Selection – your results will overwrite the selection 

Advanced Options (Pop-up)

o % ATCG for DNA 

This represents the percentage of A’s, T’s, C’s and G’s that must be in a 

sequence for it to be considered DNA 

NOTE: the percentage must be entered as an integer, so enter 70 for 

70%, NOT 0.7 

o Allow IUB Code 

If checked, the following IUB characters are allowed in DNA sequences: 

A, T, C, G, R, Y, M, K, S, W, H, B, V, D, N 

Characters other than the above are filtered out 

If not checked, the following characters are allowed in DNA sequences 

A, T, C, G 

Whether a sequence is DNA or Protein is determined by % ATCG 

Characters other than A, T, C or G are filtered out 

o If “ATGAAMGA” were selected, “ATGAAGA” would be 

processed 
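
The interaction between % ATCG for DNA and Allow IUB Code can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and treating the threshold as "at least this percentage" is an assumption, since the manual does not show how Biobar implements the check.

```python
IUB_CHARS = set("ATCGRYMKSWHBVDN")  # characters listed under Allow IUB Code
STRICT_CHARS = set("ATCG")


def looks_like_dna(seq: str, percent_atcg: int = 70) -> bool:
    """Return True if at least percent_atcg % of the letters are A, T, C or G."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        return False
    atcg = sum(c in STRICT_CHARS for c in letters)
    # Integer comparison mirrors the manual's "enter 70, not 0.7" convention.
    return 100 * atcg >= percent_atcg * len(letters)


def filter_dna(seq: str, allow_iub: bool) -> str:
    """Drop every character that is not allowed under the current setting."""
    allowed = IUB_CHARS if allow_iub else STRICT_CHARS
    return "".join(c for c in seq.upper() if c in allowed)
```

With Allow IUB Code unchecked, filter_dna("ATGAAMGA", False) gives "ATGAAGA", matching the example above.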

o Pseudocounts 

Laplace: pseudocounts for the PSFM calculated using Laplace’s method

10^-50: pseudocounts intended only to avoid 0 values for log 

calculations 
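
The two pseudocount choices differ only in the constant added to each cell of the position-specific frequency matrix (PSFM). The sketch below assumes that "Laplace's method" means add-one smoothing; the function name psfm and its parameters are hypothetical and not taken from Biobar.

```python
from collections import Counter

BASES = "ACGT"


def psfm(motif, pseudocount=1.0):
    """Position-specific frequency matrix with a pseudocount added to every cell.

    pseudocount=1.0 is add-one (Laplace) smoothing; pseudocount=1e-50 only
    keeps frequencies away from exactly zero so that logarithms stay finite.
    """
    length = len(motif[0])
    if any(len(site) != length for site in motif):
        raise ValueError("all motif sequences must have the same length")
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in motif)
        total = len(motif) + pseudocount * len(BASES)
        matrix.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return matrix
```

For the three sites ACG, ACT and AGG, the add-one frequency of A in the first column is (3 + 1) / (3 + 4) = 4/7, while C, G and T each receive 1/7 instead of 0.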

Translation Options 

o Offset 

Refers to the frame of translation (0, +1, +2) 

o Genetic Code (GCode) 

Denotes the genetic code used in translation and reverse translation 

Standard 

Vertebrate Mitochondrial 

Yeast Mitochondrial 

Mold, Protozoan and Coelenterate Mitochondrial/Mycoplasma 

and Spiroplasma Mitochondrial 

Invertebrate Mitochondrial 

Ciliate, Dasycladacean and Hexamita Nuclear 

Echinoderm and Flatworm Mitochondrial 

Euplotid Mitochondrial 

Bacterial, Archaeal and Plant Plastid

NOTE: this option is overridden if a Codon Usage Table is entered 

o Codon Usage Table 

If not checked, the default Genetic Code selected will be used in reverse 

translation. The following options are allowed: 

Uniform - for each amino acid in the sequence(s), a random codon 

will be chosen from those associated with that amino acid

IUB - for each amino acid in the sequence, its IUB code will be 

used 

If checked, user will be prompted to enter a Codon Usage Table after the 

Reverse Translation button has been pressed, which will be used in 

reverse translation. The following additional options are allowed: 

Best - for each amino acid in the sequence(s), the codon with the 

highest frequency associated with that amino acid (determined 

by Codon Usage Table) will be selected 

Random Best - for each amino acid in the sequence(s), a random 

number will be generated and a codon associated with that amino 

acid will be selected relative to its frequency (determined by 

Codon Usage Table) 

NOTE: Please use the codon usage tables found on 

http://www.kazusa.or.jp/codon/ using the option for “a style like 

CodonFrequency output in GCG” 
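
The difference between Best and Random Best can be illustrated as follows. The frequencies in USAGE are made-up placeholder values, not data from any real Codon Usage Table, and the function names are hypothetical.

```python
import random

# Hypothetical codon frequencies for two amino acids; real values would come
# from a Codon Usage Table supplied by the user.
USAGE = {
    "I": {"ATT": 0.36, "ATC": 0.47, "ATA": 0.17},
    "D": {"GAT": 0.46, "GAC": 0.54},
}


def best_codon(aa, usage):
    """'Best': the codon with the highest frequency for this amino acid."""
    codons = usage[aa]
    return max(codons, key=codons.get)


def weighted_codon(aa, usage, rng):
    """'Random Best': draw a codon with probability proportional to its frequency."""
    codons = list(usage[aa])
    weights = [usage[aa][c] for c in codons]
    return rng.choices(codons, weights=weights, k=1)[0]


def reverse_translate(protein, usage, method="best", rng=None):
    """Reverse translate a protein using either selection method."""
    rng = rng or random.Random()
    if method == "best":
        pick = best_codon
    else:
        pick = lambda aa, u: weighted_codon(aa, u, rng)
    return "".join(pick(aa, usage) for aa in protein)
```

With these placeholder values, the Best method always returns ATCGAC for the protein ID, whereas the weighted method can return any combination of the listed codons.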

DNA Statistics Options 

o Window-Based % GC 

Length – this value will serve as the sliding window length when 

calculating % GC 

Step-size – the value will serve as the number of base-pairs the sliding 

window shifts each time (useful for long sequences) 
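
A sliding-window % GC calculation with these two parameters could look like the sketch below. The function name and the choice to report 1-based start positions and to skip a final partial window are assumptions.

```python
def window_gc(seq, length, step=1):
    """Return (start_position, %GC) for each full window; positions are 1-based."""
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - length + 1, step):
        window = seq[start:start + length]
        gc = sum(base in "GC" for base in window)
        results.append((start + 1, 100.0 * gc / length))
    return results
```

For ATGCGC with Length = 4 and Step-size = 2, the windows are ATGC (50 % GC) and GCGC (100 % GC).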

o N-Gram 

The user-entered value will serve as n (the length of each “gram”) 

Only display N-Grams found in the sequence 

With this option selected, only N-Grams with a nonzero count will 

be displayed in the table. 

o If ATAG were selected, with n = 2, only AT, TA, AG will be 

displayed in the table rather than AA, AC, AG, AT,…,TT 

Include the reverse complement 

With this option selected, the counts of N-Grams of the reverse 

complement will be added to the counts of the original sequence 
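
The two N-Gram switches can be sketched as follows. The function name and parameter names are hypothetical, and the input is assumed to contain only A, T, C and G.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ATCG", "TAGC")


def ngram_counts(seq, n, only_found=False, include_revcomp=False):
    """Count overlapping words of length n in seq."""
    seq = seq.upper()
    strands = [seq]
    if include_revcomp:
        # Add the reverse complement so both strands are counted.
        strands.append(seq.translate(COMPLEMENT)[::-1])
    counts = Counter()
    for strand in strands:
        for i in range(len(strand) - n + 1):
            counts[strand[i:i + n]] += 1
    if only_found:
        return dict(counts)
    # Otherwise list every possible word over A, C, G, T, including zero counts.
    words = ("".join(w) for w in product("ACGT", repeat=n))
    return {w: counts[w] for w in words}
```

For ATAG with n = 2 and only_found=True, the result contains just AT, TA and AG, each with a count of 1; without that switch all 16 two-letter words are listed.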

o Molecular Weight 

Consider DNA sequences double stranded 

Include a 5’ triphosphate on RNA sequences 

Include a 5’ monophosphate on DNA sequences 


o Codon Usage Table

Offset

Refers to the frame (0, +1, +2) 

Table Display

White-space - the table will be displayed in the same form as the codon usage tables from http://www.kazusa.or.jp/codon/

Table - the table will be displayed in a Microsoft Word table

Substring Search Options 

o ORF 

Minimum codon length – defines the minimum length of an open reading 

frame to be reported 

o Optimize based on: 

Length - when selected, longer ORFs will be ranked higher 

CAI and Length – when selected, CAI scores are also taken into 

consideration in ranking ORFs 

o For Substring and Substring with Gap 

Mismatch Threshold – only sequences with fewer mismatches than the value entered will be displayed

Site Search 

o Ri and Iseq Scoring Options 

% GC of Genome– this value will be used to calculate the composition of 

the genome sequence that the motif is scored upon 

Motif – enter a motif of sequences, all the same length in either Raw or 

FASTA format 

o Dyad Pattern Search 

% GC of Genome – this value will be used to calculate the composition of the genome sequence that the motif is scored upon


Motif – enter a motif of sequences, all the same length in either Raw or 

FASTA format 

Threshold - each dyad found must have a score higher than a user-defined constant times the information content of the motif to be reported

Gap – defines the range of base-pairs as a spacer between two dyads 

For second dyad: 

Mirror Motif – when selected, the other half of the dyad will be the 

reverse complement of the motif entered 

Duplicate Motif – when selected, the other half of the dyad will be 

the same as the motif entered 

o Consensus Logo 



Use IUB for equally probable bases – if this option is checked, IUB characters will be used for two bases that occur at equal frequencies in a position in the motif. For example, if “A” and “G” are equally likely, “R” will be used for that position
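
The effect of this option on a consensus string can be sketched as below. The two-base IUB codes are standard, but the function name, the alphabetical tie-breaking and the handling of three-way ties are assumptions of this sketch rather than documented Biobar behavior.

```python
from collections import Counter

# Standard IUB codes for pairs of bases.
TWO_BASE_IUB = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
    frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W",
}


def consensus(motif, use_iub=True):
    """Build a consensus string from equal-length DNA sequences."""
    out = []
    for column in zip(*motif):
        counts = Counter(column)
        top = max(counts.values())
        tied = sorted(b for b, c in counts.items() if c == top)
        if use_iub and len(tied) == 2:
            out.append(TWO_BASE_IUB[frozenset(tied)])
        else:
            out.append(tied[0])  # ties are broken alphabetically in this sketch
    return "".join(out)
```

For the motif AC / GC, the first column is an A/G tie, so the consensus is RC with the option checked and AC without it.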

Motif Discovery 

o Gibbs Sampling 



Window Length – this value determines the length of the window 

sampled by Gibbs Sampling 

Number of Iterations – the number of times the program should cycle 

before reporting results (higher number of iterations means slower, but 

more accurate results) 

o Greedy Search 



Window Length – this value determines the length of the window sampled by Greedy Search

Number of Iterations – the number of times the program should cycle 

before reporting results (higher number of iterations means slower, but 

more accurate results) 

o Dyad Motif Search 

Dyad Length – if values 5 ± 1 were entered, dyads of length 4, 5, and 6 

will be considered 

Spacer Length – if values 5 ± 1 were entered, spacers of length 4, 5, and 6 

will be allowed 

Mismatch Threshold – determines the maximum number of mismatches 

that should be allowed between the two dyads 

Palindrome – the dyads must be reverse complements of each other 

Direct Repeat – the dyads must be identical 
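
How the Dyad Motif Search options combine can be illustrated with a small check for a single candidate position. All names here are hypothetical, and the sketch tests one dyad length and one spacer length at a time rather than the full ± range.

```python
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def is_dyad(seq, start, dyad_len, spacer_len, mode="palindrome", max_mismatches=0):
    """Check whether a dyad begins at 0-based index start of seq."""
    left = seq[start:start + dyad_len]
    right_start = start + dyad_len + spacer_len
    right = seq[right_start:right_start + dyad_len]
    if len(left) < dyad_len or len(right) < dyad_len:
        return False
    # Palindrome: right half is the reverse complement of the left half.
    # Direct repeat: right half is identical to the left half.
    target = reverse_complement(left) if mode == "palindrome" else left
    return mismatches(target, right) <= max_mismatches
```

In TTGACAAAAATGTCAA the halves TTGACA and TGTCAA are reverse complements separated by a 4-base spacer, so the palindrome check succeeds with a Mismatch Threshold of 0.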

Pair-wise Alignment 

o Scoring Options 

The first selection allows the user to define constant values to be used to 

score matches and mismatches 

The second selection allows the user to define more specific values if 

they would like to weigh certain mismatches/matches as less or more 

important than another 

NOTE: if protein sequences are selected, BLOSUM62 will be used to score 

the matches and mismatches between the sequences 

o GEP – this value defines the gap extension penalty 

o GOP – this value defines the gap opening penalty 

o Limit results to X alignments – in the case of split paths, the program will report no more than X alignments
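
To show how GOP and GEP enter the scoring, the sketch below computes a global alignment score with affine gap penalties using the standard three-matrix (Gotoh) recurrence. It returns only the score, not the alignments, and the convention that a gap of length k costs GOP + (k - 1) * GEP is an assumption; Biobar may charge gaps differently. The default values are placeholders.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gop=-5, gep=-1):
    """Needleman-Wunsch score with affine gaps (three-matrix form).

    A gap of length k is charged gop + (k - 1) * gep; that convention is an
    assumption of this sketch.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] to a gap.
    M = [[neg] * (m + 1) for _ in range(n + 1)]
    X = [[neg] * (m + 1) for _ in range(n + 1)]
    Y = [[neg] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gop + (i - 1) * gep
    for j in range(1, m + 1):
        Y[0][j] = gop + (j - 1) * gep
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gop, Y[i - 1][j] + gop, X[i - 1][j] + gep)
            Y[i][j] = max(M[i][j - 1] + gop, X[i][j - 1] + gop, Y[i][j - 1] + gep)
    return max(M[n][m], X[n][m], Y[n][m])
```

With these placeholder values, aligning ACGT to AGT scores three matches plus one gap opening (3 - 5 = -2), and aligning AAGGTT to AATT scores four matches plus a two-base gap (4 - 5 - 1 = -2).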

FUNCTIONALITY : 

Basic Manipulation 

o Convert Sequence To: 

Raw – The sequence(s) selected will be printed in the raw format 

FASTA – The sequence(s) selected will be printed in FASTA format 

A default header (> # Default) will be created if one is not 

provided (> # [user input]) 

GenBank – The sequence(s) selected will be printed in GenBank format 

o Reverse 

DNA, RNA and protein sequences can be reversed 

ATAGTAGAT -> TAGATGATA 

o Complement 

Only DNA and RNA sequences may be complemented. An error message 

will be displayed if a protein sequence is selected. If DNA, RNA, and 

protein sequences are selected, an error message will be displayed and 

only the DNA and RNA sequence(s) will be complemented. IUB 

characters will be ignored. 

ATAGTAGAT -> TATCATCTA 

o Reverse Complement 

Only DNA and RNA sequences may be reverse complemented. An error message will be displayed if a protein sequence is selected. If DNA, RNA and protein sequences are selected, an error message will be displayed and only the DNA and RNA sequence(s) will be reverse complemented. IUB characters will be ignored.

ATAGTAGAT -> ATCTACTAT 
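
The three basic manipulations above can be reproduced with a few lines of Python. The function names are hypothetical, and leaving characters outside the complement table unchanged is one possible reading of "IUB characters will be ignored".

```python
DNA_COMPLEMENT = str.maketrans("ATCG", "TAGC")
RNA_COMPLEMENT = str.maketrans("AUCG", "UAGC")


def reverse(seq):
    """Reverse any sequence (DNA, RNA or protein)."""
    return seq[::-1]


def complement(seq, rna=False):
    """Complement a DNA (or RNA) sequence; other characters pass through."""
    table = RNA_COMPLEMENT if rna else DNA_COMPLEMENT
    return seq.upper().translate(table)


def reverse_complement(seq, rna=False):
    return reverse(complement(seq, rna))
```

Applied to ATAGTAGAT, these return TAGATGATA, TATCATCTA and ATCTACTAT respectively, the same results as the examples in this section.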

Translation 

o Forward (Translation) 

Only DNA sequences may be translated. An error message will be 

displayed if a protein sequence is selected. If DNA, RNA and protein 

sequences are selected, an error message will be displayed and only the 

DNA sequence(s) will be translated. 

If IUB is selected: 

Because an IUB character can stand for more than one base, a list of all codons corresponding to that IUB code will be generated and a random one will be chosen to be used in translation

o Example: 

GCode = Standard: 

TGB -> [TGC, TGT, TGG] -> C or W

If an incorrect character (not A, T, C, G, R, Y, M, K, S, W, H, B, V, D, 

N) is found, an error message will be displayed and translation 

will not occur 

Translation will be performed according to the default Genetic Code 

selected and according to the offset specified in Translation Options 

NOTE: additional bases will be removed 

Examples: (characters outside the reading frame are ignored)

Offset = 0, GCode = Standard: ATAGTAGAT -> IVD

Offset = 1, GCode = Standard: ATAGTAGAT -> ** (the leading A and the trailing AT are ignored)
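
The translation rules above (offset, removal of leftover bases, and random resolution of IUB codons) can be sketched as follows for the Standard genetic code. The names are hypothetical; an invalid character raises a KeyError here, whereas Biobar displays an error message.

```python
import random
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated in TCAG order.
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def expand_codon(codon):
    """List every unambiguous codon that an IUB codon can stand for."""
    return ["".join(p) for p in product(*(IUB[b] for b in codon))]


def translate(seq, offset=0, rng=None):
    """Translate seq in the given frame; '*' marks a stop codon."""
    rng = rng or random.Random()
    seq = seq.upper()[offset:]
    protein = []
    for i in range(0, len(seq) - 2, 3):  # trailing 1-2 bases are dropped
        codon = rng.choice(expand_codon(seq[i:i + 3]))
        protein.append(STANDARD[codon])
    return "".join(protein)
```

For ATAGTAGAT this gives IVD at offset 0 and ** at offset 1, and the IUB codon TGB expands to TGC, TGG and TGT, so it translates to either C or W.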

o Reverse (Translation) 

Only protein sequences may be reverse translated. An error message will be displayed if a DNA or RNA sequence is selected. If DNA, RNA and protein sequences are selected, an error message will be displayed and only the protein sequence(s) will be reverse translated.

Reverse Translation will be performed according to the method specified 

in the Reverse Translation/Translation Options 

If the user provides a Codon Usage Table, the reverse translation 

will occur according to the Codon Usage Table rather than the 

default Genetic Code 

Examples: 

o GCode = Standard: IVD -> ATAGTAGAT 

o (Translation) Map 

Only DNA sequences can be used to create a translation map. An error 

message will be displayed if a protein or RNA sequence is selected. If 

DNA, RNA, and protein sequences are selected, an error message will be 

displayed and translation maps will only be displayed for the DNA 

sequence(s). The sequence(s) is (are) translated for each of the frames 

and all of the resultant sequences are displayed together. 

If IUB is selected: 

Because an IUB character can stand for more than one base, a list of all codons corresponding to that IUB code will be generated and a random one will be chosen to be used in translation

o Example: 

GCode = Standard: 

TGB -> [TGC, TGT, TGG] -> C or W 

If an incorrect character (not A, T, C, G, R, Y, M, K, S, W, H, B, V, D, 

N) is found, an error message will be displayed and translation 

will not occur 

DNA Sequence Statistics 

o Global %GC

Only DNA sequences can be used. An error message will be displayed if a protein or RNA sequence is selected. If both DNA and protein sequences are selected, an error message will be displayed and %GC will only be calculated for the DNA sequence(s).

Calculates the %GC content for the entire sequence and displays it in a 

table. 

o Window %GC 

Only DNA sequences can be used. An error message will be displayed if a protein or RNA sequence is selected. If both DNA and protein sequences are selected, an error message will be displayed and %GC will only be calculated for the DNA sequence(s).

Calculates the %GC content for windows of length defined by user in the 

Sequence Statistics Options and displays them in a table, indexed by 

position. 

o Nucleotide Frequencies 

Only DNA sequences can be used. An error message will be displayed if a 

protein or RNA sequence is selected. If both DNA and protein sequences 

are selected, an error message will be displayed and nucleotide 

frequencies will only be calculated for the DNA sequence(s). 

Calculates frequency of each nucleotide for the entire sequence and 

displays it in a table. 

o N-Gram 

Only DNA sequences can be used. An error message will be displayed if 

a protein or RNA sequence is selected. If sequences of all types are 

selected, the N-Grams will only be calculated for the DNA sequence(s). 

IUB characters will be ignored 

Takes the value of N from Sequence Statistics Options and generates a list 

of all possible “words” of length N (from the characters A, T, C, G) and 

their counts within the sequence. 

See Sequence Statistics Options for other options relating to N-Grams 


o Codon Usage Table

Only DNA sequences can be used. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the codon usage table will only be generated for the DNA sequence(s). IUB characters will be ignored

Generates a codon usage table (using the offset from Sequence Statistics 

Options) using the sequence(s) that are selected. 

See Sequence Statistics Options for options relating to table display 
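
Counting codons for a usage table from a given offset can be sketched as below. The function name is hypothetical, and skipping any codon that contains a character other than A, T, C or G is an assumed reading of "IUB characters will be ignored".

```python
from collections import Counter


def codon_usage(seq, offset=0):
    """Count the codons of seq read in the frame that starts at offset."""
    seq = seq.upper()[offset:]
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Codons containing anything other than A, T, C, G are skipped.
    return Counter(c for c in codons if set(c) <= set("ATCG"))
```

For ATAGTAGAT, offset 0 counts ATA, GTA and GAT once each, while offset 1 counts TAG twice.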

o Calculate Molecular Weight 

Open to DNA, RNA and protein sequences.

See Sequence Statistics Options for options relating to Molecular Weight 

Protein Sequence Statistics 

o GRAVY 

GRAVY can only be calculated for protein sequences. An error message 

will be displayed if a DNA or RNA sequence is selected. If sequences of all 

types are selected, GRAVY will only be calculated for protein 

sequence(s). 

GRAVY (grand average of hydropathy) is the average of the hydropathicities of the amino acids in the sequence.
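
As an illustration, GRAVY can be computed from the Kyte-Doolittle hydropathy scale. That Biobar uses this particular scale is an assumption, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Mean hydropathy of a protein sequence (sum of values / length)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)
```

For the tripeptide IVD this is (4.5 + 4.2 - 3.5) / 3, or about 1.73; positive values indicate a more hydrophobic sequence.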

o Isoelectric Point 

Isoelectric points can only be calculated for protein sequences. An error 

message will be displayed if a DNA or RNA sequence is selected. If 

sequences of all types are selected, the isoelectric point will only be 

generated for protein sequence(s). 

The isoelectric point is the pH at which the protein carries a zero net 

electric charge. 

o Generate Protein Statistics 

Protein statistics can only be calculated for protein sequences. An error message will be displayed if a DNA or RNA sequence is selected. If sequences of all types are selected, the protein statistics will only be generated for the protein sequence(s).

Protein statistics provides a count and percentage of each of the amino 

acids for each sequence. 

Substring Search 

o Find ORFs (Open Reading Frames) 

ORFs can only be found for DNA sequences. An error message will be 

displayed if a protein or RNA sequence is selected. If sequences of all 

types are selected, the ORFs will only be found for the DNA sequence(s). 

ATG, TTG, CTG and GTG are all considered start codons 

Different Output Locations: 

BELOW – Only prints the top-scoring ORF 

REPLACE – Highlights the top-scoring ORF 

CLIPBOARD – Saves all discovered ORFs 

NEW DOC – Prints all discovered ORFs 

See Substring Search Options for options relating to ORFs
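
A simplified ORF scan using the four start codons listed above might look like this. It searches only the three forward frames, assumes the stop codons of the Standard genetic code, counts the minimum codon length without the stop codon, and ranks by length only; all of these are assumptions of the sketch.

```python
STARTS = {"ATG", "TTG", "CTG", "GTG"}
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq, min_codons=1):
    """Return (start, end, frame) for forward-strand ORFs, longest first.

    start/end are 0-based, end is exclusive, and the stop codon is included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs, key=lambda o: o[1] - o[0], reverse=True)
```

In CCATGAAATAGCC the only ORF is ATGAAATAG in the third frame, reported as (2, 11, 2); raising min_codons to 3 filters it out.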

o Find Substring 

Substring searches can only be completed for DNA sequences. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the substring search will only be completed for the DNA sequence(s).

IUB characters are considered fractional mismatches depending on 

which DNA bases they stand for. For example, B is considered to be 1/3 

T, 1/3 G and 1/3 C. 
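
One way to read the fractional-mismatch rule is sketched below: if the searched base is one of the k bases an IUB character stands for, the position counts as 1/k of a match and therefore (1 - 1/k) of a mismatch. This interpretation, the function names, and the assumption that the search pattern itself contains only A, C, G and T are not confirmed by the manual.

```python
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def mismatch_score(pattern, window):
    """Sum of per-position mismatches; an IUB character counts fractionally."""
    total = 0.0
    for p, w in zip(pattern, window):
        options = IUB[w]
        total += 1.0 - (1.0 / len(options) if p in options else 0.0)
    return total


def find_substring(pattern, seq, threshold):
    """Return (position, score) for windows with fewer mismatches than threshold."""
    hits = []
    for i in range(len(seq) - len(pattern) + 1):
        score = mismatch_score(pattern, seq[i:i + len(pattern)])
        if score < threshold:
            hits.append((i, score))
    return hits
```

Under this reading, T against B counts as 2/3 of a mismatch, while A against B counts as a full mismatch because B never stands for A.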

Find Substring with Gap allows the user to search for two substrings 

separated by a gap of defined length. 

Different Output Locations 

BELOW, NEW DOC and CLIPBOARD 

o Returns all results 

REPLACE 

o Highlights the matches in the sequence with a darker color 

representing a better scoring sequence 

Site Search 

o Search (Ri sequence and I sequence) 

These Site Search mechanisms can only be completed for DNA sequences. 

An error message will be displayed if a protein or RNA sequence is 

selected. If sequences of all types are selected, the site search will only 

be completed for the DNA sequence(s). 

In the motif, only DNA sequences may be entered. 

Reports the Ri Score or Iseq Score 

o Dyad Pattern Search 

Dyad Pattern Search can only be completed for DNA sequences. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the site search will only be completed for the DNA sequence(s).

In the motif, only DNA sequences may be entered. 


Different Output Locations

BELOW, NEW DOC and CLIPBOARD

o Returns all results 

REPLACE 

o Highlights the dyads in the sequence with a darker color 

representing a better scoring sequence 

Motif Discovery 

o Gibbs Sampling and Greedy Search 

Gibbs Sampling and Greedy Search can only be completed for DNA sequences. An error message will be displayed if a protein or RNA sequence is selected. If sequences of all types are selected, the sampling will only be completed for the DNA sequence(s).

The program will complete the number of iterations requested 10 times and return the motif that has the greatest information content (IC)


Different Output Locations

BELOW, NEW DOC and CLIPBOARD

o Returns the best result 

REPLACE 

o Highlights the motif in each sequence 
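
Choosing the best of several runs by information content can be sketched as follows. The sketch computes information content as relative entropy against a uniform background without pseudocounts, which is only one of the measures the manual mentions, and the function names are hypothetical; the sampling itself is not shown.

```python
import math
from collections import Counter


def information_content(motif, background=None):
    """Relative entropy (in bits) of an aligned motif against a background."""
    background = background or {b: 0.25 for b in "ACGT"}
    total = 0.0
    for column in zip(*motif):
        counts = Counter(column)
        for base, count in counts.items():
            freq = count / len(column)
            total += freq * math.log2(freq / background[base])
    return total


def best_of(runs):
    """Keep the candidate motif with the highest information content."""
    return max(runs, key=information_content)
```

A perfectly conserved column contributes 2 bits and a column with all four bases equally represented contributes 0 bits, so motifs whose sites agree more closely are preferred.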

o Dyad Motif Search 

The Dyad Motif Search can only be completed for DNA sequences. An 

error message will be displayed if a protein or RNA sequence is selected. 

If sequences of all types are selected, the motif search will only be 

completed for the DNA sequence(s). 

o Consensus Logo 

The consensus logo can only be generated with a motif of DNA sequences 

The motif to generate a pseudo-consensus logo can be selected in the 

document or entered into a pop-up input box when the button is clicked. 

Information Content will be calculated using R sequence or Relative 

Entropy, depending on which is selected in the Advanced Options 

The text is likely to be small, and can be scaled up or down using the 

arrows in the Resources Group (though resolution may be lost) 

Pair-Wise Alignment 

o Needleman-Wunsch and Smith-Waterman

Pair-wise alignment can be performed with both DNA and Protein 

sequences. 

Protein sequences are scored using the BLOSUM62 matrix and DNA 

sequences are scored using user-defined values.
