KNIME Bioinformatics Extensions
<strong>KNIME</strong> <strong>Bioinformatics</strong> <strong>Extensions</strong><br />
Karol Kozak<br />
ETH Zurich<br />
January 2011, Zurich, <strong>KNIME</strong> SIG<br />
Start HCS<br />
HCS Experiments and<br />
Informatics<br />
HCS & Open Source<br />
<strong>KNIME</strong><br />
LIMS<br />
<strong>KNIME</strong> Matlab<br />
WEKA Nodes<br />
Cell Classifiers<br />
HCDB –<br />
OpenBIS<br />
&<br />
Library<br />
Database<br />
<strong>Bioinformatics</strong><br />
Off-Target<br />
<strong>KNIME</strong>, Java<br />
Spotfire<br />
<strong>KNIME</strong><br />
<strong>KNIME</strong><br />
Matlab<br />
Research interest<br />
Role<br />
Software<br />
engineering +<br />
modeling<br />
Database<br />
understanding<br />
RNA technology<br />
Classification<br />
Pattern<br />
recognition<br />
Annotation<br />
database<br />
Off-target<br />
prediction<br />
2D/3D Structure<br />
relationships<br />
Gene<br />
functionality<br />
Role of <strong>Bioinformatics</strong><br />
in RNAi technology to detect functionality of<br />
genes in mammalian cells<br />
- New prediction algorithms, including kernel methods<br />
- Post-analysis of existing hits<br />
- Traditional bioinformatics: homology, BLAST, alignment score<br />
- Database development<br />
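The slide names kernel methods without saying which kernel was used. As one illustration, a k-mer spectrum kernel scores two sequences by the k-mers they share. The sketch below is an assumption about what such a method could look like; the function names and the default k=3 are hypothetical and not taken from the talk.

```python
from collections import Counter
import math


def kmer_counts(seq, k=3):
    """Count all overlapping k-mers in a sequence (case-insensitive)."""
    s = seq.upper()
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of two sequences."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(ca[m] * cb[m] for m in ca.keys() & cb.keys())


def normalized_spectrum_kernel(a, b, k=3):
    """Cosine-normalized spectrum kernel, in the range 0.0 to 1.0."""
    kaa = spectrum_kernel(a, a, k)
    kbb = spectrum_kernel(b, b, k)
    if kaa == 0 or kbb == 0:
        return 0.0  # a sequence shorter than k has no k-mers
    return spectrum_kernel(a, b, k) / math.sqrt(kaa * kbb)
```

A kernel matrix built this way could be passed to any kernel-based classifier; which classifier the project used is not stated in the slides.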
RNAi Libraries<br />
- Genome wide<br />
- Functional groups (kinases, ...)<br />
- n - Oligonucleotides<br />
- Pooled<br />
RNAi Libraries<br />
- Qiagen<br />
- Thermo Fisher Scientific (Dharmacon)<br />
- Applied Biosystems (Ambion)<br />
- Sigma-Aldrich (esiRNA)<br />
TIME<br />
RNAi Library evolution<br />
Purchase<br />
Sigma<br />
esiRNA<br />
annot<br />
database<br />
V1<br />
AppBiosyst<br />
annot<br />
database<br />
Qiagen<br />
annot<br />
database<br />
TFisher<br />
Annot<br />
database<br />
V2<br />
Dharmacon<br />
annot<br />
database<br />
Today<br />
V3<br />
Dharmacon<br />
annot<br />
database<br />
Fewer genes<br />
More known oligos<br />
New design<br />
Transcripts analysis<br />
Off-target information<br />
Annotation of human genome &<br />
Reliability of siRNA libraries<br />
Qiagen genome wide siRNA library<br />
(HsDgV3 (Human Druggable Genome siRNA Set V3); HsNmV1 (Human Refseq Xm siRNA Set V1); HsXmV1 (Human Predicted<br />
genome Set V1))<br />
2006: 22’832 genes / 90’728 siRNAs<br />
2010: 16’199 genes<br />
Meier Roger<br />
[Figure: charts of the library annotation. Legends: % of genes targeted by 1 to 9 siRNAs; target genes, target genes with off-target(s), wrongly predicted genes; siRNAs without off-target(s), siRNAs with off-target(s), siRNAs against wrongly predicted genes. Percentage labels as extracted: 16.5%, 12.5%, 71%, 41%, 39%, 20%, 0.11%, 0.23%, 10.3%, 25.6%, 0.07%, 33%, 30.6%, 0.01%, 0.01%.]<br />
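The figure summarizes the share of genes targeted by 1 to 9 siRNAs. A minimal sketch of how such a distribution could be computed from an annotation table follows. The input format (pairs of siRNA ID and gene ID) and all names are assumptions made for illustration; any records passed in would come from the library annotation, which is not reproduced here.

```python
from collections import Counter


def sirna_per_gene_distribution(records):
    """Return {n: percent of genes targeted by exactly n siRNAs}.

    records: iterable of (sirna_id, gene_id) pairs (hypothetical format).
    siRNAs without a gene ID (None or empty string) are skipped.
    """
    per_gene = Counter(gene for _, gene in records if gene)
    n_genes = len(per_gene)
    by_count = Counter(per_gene.values())
    return {n: 100.0 * c / n_genes for n, c in sorted(by_count.items())}
```

For orientation, the 2006 figures above (90'728 siRNAs for 22'832 genes) correspond to just under 4 siRNAs per gene on average.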
Based on GENEID<br />
Off-targets annotated by companies<br />
How did the companies select these off-target siRNAs?<br />
- Many ribosomal siRNAs were eliminated<br />
- Many siRNAs against "membrane" proteins were eliminated (2061 in old list, 945 in clean list)<br />
- Not many siRNAs against "kinase" proteins were eliminated (2624 in old list, 1605 in clean list)<br />
- In addition, the 2800 discarded proteins are slightly enriched in virus-related pathways<br />
Library reduction<br />
• siRNA without geneID<br />
• The web site shows the gene that<br />
the siRNA matched at the time it<br />
was selected. The database table<br />
from which we get current<br />
annotation shows what the siRNA<br />
matches in the current Refseq.<br />
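The difference between the gene matched at selection time and the match in the current RefSeq can be checked mechanically. The sketch below assumes two hypothetical mappings from siRNA ID to gene ID (one from the vendor web site, one from the current annotation table) and sorts each siRNA into unchanged, retargeted, or without geneID. Names and data layout are illustrative only.

```python
def compare_annotations(original, current):
    """Compare vendor annotation at selection time with the current one.

    original, current: dicts mapping siRNA ID -> gene ID (hypothetical layout).
    An siRNA missing from `current`, or mapped to None/'', counts as
    having no geneID.
    """
    report = {"unchanged": [], "retargeted": [], "no_gene_id": []}
    for sirna_id in sorted(original):
        new_gene = current.get(sirna_id)
        if not new_gene:
            report["no_gene_id"].append(sirna_id)
        elif new_gene == original[sirna_id]:
            report["unchanged"].append(sirna_id)
        else:
            report["retargeted"].append(sirna_id)
    return report
```

The `no_gene_id` group corresponds to the "siRNA without geneID" candidates for library reduction.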
Library handling<br />
From pooled to oligo-based<br />
Library analysis<br />
<strong>Bioinformatics</strong> - RNAi<br />
-Off-target effect<br />
UUGCCGUACAGGAUGGACGtg<br />
UUAACUGAUGUUCCAAUCCtg<br />
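One widely used heuristic for flagging potential off-targets is to look for transcript sites complementary to the siRNA seed region (positions 2 to 8 of the guide strand). The talk does not state which method was used, so the sketch below is an assumption. It also assumes the sequence is given as the guide strand, 5' to 3'; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Upper-case a sequence and write DNA bases (T) as RNA (U)."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna):
    """Reverse complement of an RNA sequence."""
    return to_rna(rna).translate(COMPLEMENT)[::-1]


def seed_match_sites(guide, transcript):
    """Return 0-based transcript positions complementary to the seed.

    The seed is taken as guide positions 2-8 (a 7-mer); a site is a
    transcript substring equal to the seed's reverse complement.
    """
    seed = to_rna(guide)[1:8]
    site = reverse_complement(seed)
    target = to_rna(transcript)
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)
    return hits
```

A seed match alone is only a candidate signal; it would normally be combined with further evidence before calling an off-target.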
Sandra Kaestner<br />
<strong>Bioinformatics</strong><br />
Workflow<br />
Project Ribosome biogenesis<br />
[Figure: ribosome biogenesis across nucleolus, nucleoplasm and cytoplasm. Labels: transcription of ribosomal RNAs (Pol I, rDNA, 35S rRNA; Pol III, 5S rRNA); transcription of ribosomal mRNAs (Pol II, mRNA); maturation and export of ribosomal mRNAs; translation of ribosomal proteins (large and small ribosomal proteins); trans-acting factors; 90S particle; pre-40S particle and pre-60S particle, each with maturation and export; final maturation; mature 40S subunit; mature 60S subunit; 80S ribosome; rRNAs 18S, 5.8S, 25S.]<br />
[Figure: 18S rRNA maturation: 35S rRNA, 23S rRNA, 20S rRNA, 18S rRNA.]<br />
The Rps2-YFP read out<br />
80S ribosome<br />
ribosomal proteins<br />
Thomas Wild<br />
80S ribosome<br />
mature 40S subunit<br />
mature 60S subunit<br />
Nucleolus Nucleoplasm Cytoplasm<br />
Biogenesis<br />
siRNA 84062- DTNBP1<br />
AACCTTCAAAGCTGAACTAGA<br />
DTNBP1<br />
HCDC<br />
RPS3<br />
Transcript NM_001005<br />
Results<br />
[Diagram: RNAi library (e.g. 4 oligos) + biogenesis results = hit list. Labels: hit wonder; real hit list; off-target in hit list; potential off-targets.]<br />
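One way to separate real hits from potential off-targets in a hit list is to count how many independent oligos per gene reproduce the phenotype. The rule sketched below (a gene counts as a real hit when at least two oligos score, and single-oligo hits are flagged) is an assumed convention, not a rule stated in the talk; the names and the threshold are hypothetical.

```python
from collections import defaultdict


def classify_hits(oligo_results, min_oligos=2):
    """Split scoring genes into real hits and potential off-targets.

    oligo_results: iterable of (gene, oligo_id, is_hit) tuples
    (hypothetical layout). Genes with no scoring oligo are omitted.
    """
    scoring = defaultdict(set)
    for gene, oligo_id, is_hit in oligo_results:
        if is_hit:
            scoring[gene].add(oligo_id)
    real = sorted(g for g, o in scoring.items() if len(o) >= min_oligos)
    suspect = sorted(g for g, o in scoring.items() if len(o) < min_oligos)
    return {"real_hits": real, "potential_off_targets": suspect}
```

With a library of, say, 4 oligos per gene, a gene scoring with only 1 oligo would land in `potential_off_targets` and become a candidate for sequence-level off-target analysis.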
Analysis Off targets<br />
Known Off-targets<br />
Build model<br />
AllStars MVP_si03 MVP_si06<br />
Uncoating<br />
Acidification<br />
AllStars si03 si06<br />
MVP<br />
Tubulin<br />
Virus screen<br />
1 out of 3<br />
siRNA<br />
TGGGCCTGAGATGCAGGTAAA<br />
MVP NM_003010<br />
HCDC<br />
MAP2K4<br />
6416|NM_003010|2890|AS<br />
Qiagen off-target SM<br />
MAP2K4<br />
mRNA 2D variants<br />
mRNA<br />
siRNA<br />
- Relation of 2D structure to off-target effects<br />
- We can model 2D structures quite robustly (Metaserver, Python)<br />
- We can predict potential off-target effects<br />
- We must find the relation between the two<br />
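As a minimal illustration of 2D (secondary) structure modeling, the sketch below uses the textbook Nussinov base-pair maximization algorithm. This is a stand-in chosen for brevity, not the Metaserver approach named on the slide; the minimum loop length of 3 and the allowed G-U wobble pairs are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    """
    s = seq.upper().replace("T", "U")
    n = len(s)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: base j stays unpaired
            for k in range(i, j - min_loop):  # case: base k pairs with j
                if (s[k], s[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

Pair maximization ignores stacking energies, so it is far cruder than energy-based predictors; it only shows the dynamic-programming shape such tools share.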
mRNA<br />
We want to identify structural motifs in a set of<br />
mRNA sequences<br />
mRNA 3D Structure<br />
Nature Movie<br />
- Dicer, tRNA,<br />
- Off-target relations: RNA-RNA and RNP-RNA<br />
- Model how RISC is related to microRNA/siRNA, and how it finds its own target and binds to it<br />
RNA 3D<br />
Workflow<br />
ModeRNA<br />
Python<br />
BioPython for parsing structural data from the PDB format<br />
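The slide names BioPython for parsing PDB files. To stay dependency-free, the sketch below instead reads ATOM/HETATM records directly, using the fixed column layout of the PDB format. It is a simplified stand-in, not BioPython's parser, and the function name and returned fields are hypothetical.

```python
def parse_pdb_atoms(lines):
    """Extract atom records from PDB-format text lines.

    Uses the fixed PDB columns: atom name 13-16, residue name 18-20,
    chain ID 22, residue number 23-26, x/y/z 31-38, 39-46, 47-54.
    Non-atom records (REMARK, TER, ...) are skipped.
    """
    atoms = []
    for line in lines:
        if line[:6].strip() not in ("ATOM", "HETATM"):
            continue
        atoms.append({
            "name": line[12:16].strip(),
            "res_name": line[17:20].strip(),
            "chain": line[21],
            "res_seq": int(line[22:26]),
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms
```

For real structures a full parser is preferable, since it also handles alternate locations, insertion codes and multiple models, which this sketch ignores.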
Database architecture (LIMS)<br />
HCS and databases<br />
Library and<br />
annotation<br />
Screening<br />
experiment<br />
Sample, Results,<br />
Management, View<br />
Done<br />
But need maintenance<br />
Public database<br />
Phenotypic data<br />
OpenBis - Open Source<br />
database<br />
Screen DB<br />
Open Source database<br />
HCDB - OpenBIS<br />
Open Source database<br />
OpenBIS (ETH)<br />
- Web client<br />
- Command-line client<br />
- Java technology, Google Web Toolkit (GWT)<br />
Adam Srebniak<br />
Image Processing and<br />
<strong>KNIME</strong><br />
Adam Srebniak<br />
Open Source<br />
<strong>KNIME</strong><br />
WEB<br />
DESKTOP<br />
Dharmacon<br />
annot<br />
database<br />
Qiagen<br />
annot<br />
database<br />
Ambion<br />
Annot<br />
database<br />
Library<br />
files<br />
Oligo dharmacon<br />
With target gene<br />
Oligo qiagen<br />
With target gene<br />
Oligo ambion<br />
With target gene<br />
External<br />
databases<br />
www<br />
Gene<br />
Cross reference database for gene/oligo annotation<br />
+<br />
Workflow off-target prediction<br />
OpenBis<br />
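A cross-reference database of this kind can be sketched as a single table of vendor oligos keyed by gene, with a query that collects every oligo for a gene across vendors. The schema, table and column names below are hypothetical, and SQLite stands in for whatever database the project actually used.

```python
import sqlite3


def build_cross_reference(rows):
    """Create an in-memory cross-reference table of vendor oligos.

    rows: iterable of (vendor, oligo_id, sequence, gene_id) tuples
    (hypothetical layout).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE oligo ("
        " vendor TEXT, oligo_id TEXT, sequence TEXT, gene_id TEXT,"
        " PRIMARY KEY (vendor, oligo_id))"
    )
    conn.executemany("INSERT INTO oligo VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def oligos_for_gene(conn, gene_id):
    """Return (vendor, oligo_id) for every oligo annotated to a gene."""
    cur = conn.execute(
        "SELECT vendor, oligo_id FROM oligo"
        " WHERE gene_id = ? ORDER BY vendor, oligo_id",
        (gene_id,),
    )
    return cur.fetchall()
```

Keeping the vendor in the primary key lets the same gene be covered by Dharmacon, Qiagen and Ambion oligos side by side, which is the comparison the diagram describes.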
Image Processing<br />
One of the first open-source tools: CellProfiler<br />
High Content<br />
Image Processing (HCIP)<br />
Teaching module<br />
Slawek Mazur<br />
HCDC-HITS<br />
Bio-Formats, developed by OME (Jason Swedlow), UW-Madison LOCI and Glencoe Software<br />
Gabor Bakos<br />
Visualization Improvement<br />
Lukasz Zwolinski, ETH – PWR Student<br />
Acknowledgement<br />
Bioquant Heidelberg and ETH<br />
Holger Erfle, Karol Kozak, Berend Rind<br />
Juergen Reymann, Gabor Csucs, Adam Srebniak<br />
Slawek Mazur, Sandra Kaestner<br />
Welcome to join<br />
TU Konstanz<br />
Dorit Merhof<br />
Trinity College IR<br />
Anthony Davies<br />
MPI-IB, Berlin DE<br />
Peter Braun<br />
Andre Maeurer<br />
TU Breslau:<br />
Karol Kozak<br />
Lukasz Miroslaw<br />