
February 4, 2005

DNA Microarray Image Analysis

Peter Bajcsy, PhD

Research Scientist, Automated Learning Group

National Center for Supercomputing Applications (NCSA)

Adjunct Assistant Professor, CS and ECE Departments

University of Illinois at Urbana-Champaign (UIUC)

pbajcsy@ncsa.uiuc.edu


Outline

• Microarray Problem – Introduction

• Microarray Technology

• Microarray Data Processing Workflow

• Microarray Image Analysis

— Grid Alignment Problem

— Foreground Separation

— Spot Quality Assessment

— Quantification and Normalization

• Microarray Data Fusion and Visualization

• Summary


Publications

• Journals:

— Bajcsy P., “GridLine: Automatic Grid Alignment in DNA Microarray Scans,” IEEE Transactions on Image Processing, vol. 13, no. 1, pp. 15-25, January 2004 (accessible at http://alg.ncsa.uiuc.edu/do/documents/publications).

• Book chapters:

— Bajcsy P., L. Liu and M. Band, “DNA Microarray Image Processing,” chapter of the book “DNA Array Image Analysis: Nuts & Bolts,” Gerda Kamberova, Ph.D. (Ed.), DNA Press (in press).

— Bajcsy P., J. Han, L. Liu and J. Young, “Survey of Bio-Data Analysis from Data Mining Perspective,” Chapter 2 of Jason T. L. Wang, Mohammed J. Zaki, Hannu T. T. Toivonen, and Dennis Shasha (eds.), Data Mining in Bioinformatics, Springer Verlag, 2004, pp. 9-39.


MICROARRAY PROBLEM


Microarray Problem: Major Objective

• Major Objective: Discover a comprehensive theory of life’s organization at the molecular level

— The major actors of molecular biology: the nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA)

— The central dogma of molecular biology

Proteins are very complicated molecules built from 20 different amino acids.

How to measure processes at the molecular level?

=> microarray technology


DNA Microarray Preparation

• DNA microarrays are typically composed of thousands of DNA sequences, called probes, fixed to a glass or silicon substrate. The DNA sequences can be long (500-1500 bp) cDNA sequences or shorter (25-70 mer) oligonucleotide sequences. Oligonucleotide sequences can be presynthesized and deposited with a pin or piezoelectric spray, or synthesized in situ by photolithographic or ink-jet technologies.

• Relative quantitative detection of gene expression or gene copy number can be carried out between two samples on one array or by comparing single samples across multiple arrays.

• The double-fluorescent technique uses samples from two sources that are labeled with different fluorescent molecules (Cy3 and Cy5, or Alexa 555 and Alexa 647) and hybridized together on the same array.


Input and Output of Microarray Data Analysis

• Input: Laser image scans (data) and underlying experiment hypotheses or experiment designs (prior knowledge)

• Output:

— Conclusions about the input hypotheses or knowledge about statistical behavior of measurements

— The theory of biological systems learnt automatically from data (machine learning perspective)

– Model fitting, Inference process


Overview of Microarray Problem

[Figure: overview diagram. Recoverable labels: Biology Application Domain; Experiment Design and Hypothesis; Microarray Experiment; Image Analysis; Data Analysis; Data Warehouse; Validation; Knowledge discovery in databases (KDD); Statistics; Data Mining; Artificial Intelligence (AI).]


Types of Expected Microarray Data Mining and Analysis Results

Hypothetical Examples:

• Binary answers using tests of hypotheses

— Drug treatment is successful with a confidence level x.

• Statistical behavior (probability distribution functions)

— A class of genes with functionality X follows a Poisson distribution.

• Expected events

— As the amount of treatment increases, the gene expression level decreases.

• Relationships

— Expression level of gene A is correlated with expression level of gene B under varying treatment conditions (genes A and B are part of the same pathway).

• Decision trees

— Classification of a new gene sequence by a “domain expert”.


MICROARRAY DATA PROCESSING WORKFLOW


Microarray Data Processing Workflow


Challenges

• Numerical Value Analysis

— Image analysis

— Expression level analysis

• Annotation Analysis

— Prior knowledge (e.g., experimental conditions)

— Metadata (e.g., gene annotation)

• Numerical and Text Data Fusion

— Representation

— Common Keys

• Analysis of Fused Data

— Algorithms for numerical, categorical and textual data

• Data Management

— Storage, retrieval, updates

• Computational Requirements

— Large search space & modeling complexity


DNA Microarray Image Analysis

• The goal of the microarray image analysis steps is to extract intensity descriptors from each spot that represent gene expression levels and serve as input features for further analysis. Biological conclusions are then drawn based on the results from data mining and statistical analysis of all extracted features.

• Components of DNA Microarray Analysis

— Grid Alignment Problem

— Foreground Separation

— Quality Assurance

— Quantification

— Normalization

• Data management - Minimum Information About a Microarray Experiment (MIAME) standard


MICROARRAY IMAGE PROCESSING REQUIREMENTS


Ideal Microarray Image?

1. Ideal cDNA microarray image in terms of its image content:

• Deterministic grid geometry

• Known background intensity with zero uncertainty

• Pre-defined spot shape (morphology)

• Constant spot intensity that (a) is different from the background, (b) is directly proportional to the biological phenomenon (up- or down-regulation), and (c) has zero uncertainty for all spots.

Ideal cDNA microarray image content => utopia

2. Ideal cDNA microarray image in terms of statistical confidence:

• A very large number of pixels per spot (theoretically it would reach infinity)

Constraints: cost of experiments, image resolution (scanners), storage of extremely high resolution images and other specimen preparation issues


Sources of Microarray Image Variations

• The cDNA technology is a complex electrical-optical-chemical process that spans cDNA slide fabrication, mRNA preparation, fluorescence dye labeling, gene hybridization, robotic spotting, excitation of green and red fluorophores by lasers, imaging using optics, slide scanning, analog to digital conversion using either charge-coupled devices (CCD) or photomultiplier tubes (PMT), and finally image storage and archiving.

• Variations: technologies, microarray image channels, file formats, data accuracy, grid geometry, background, spot morphology, foreground and background intensity

• Computational Requirements: repeatability (parameter “free” algorithms), sufficient storage and computational resources


Microarray Image Technologies

• Affymetrix chips

• Uses photolithography and solid-phase chemistry to produce arrays containing hundreds of thousands of oligonucleotide probes packed at extremely high densities.

• Single-, double- or multi-fluorescent cDNA microarray images

• Variable substrates: coated glass slides or nylon membrane or 2D gel materials.

• Example: 532 nm (green) and 632 nm (red) wavelengths forming two channels.

• Images obtained by other labeling schemes (for example, with or without radio-isotopic labels) can have a bright background and dark spots.


Examples: Number of Channels, Multiple Technologies

[Figure panels: Double-Fluorescent Dye; Radioactive Dye]


Variations of Grid Geometry

• Rotation

• Multiple grids

• Missing rows


Variations of Background

• Slide washing

• Examples of background noise that could be modeled with probability density function (PDF) noise models (Normal PDF, left; Student’s t PDF, right).


Variation of Spot Morphology

• Spot morphologies other than circular

• Spatial and morphological variations of spots (from left to right, top row first): (a) a regular spot, (b) an inverse spot or a ghost shape, (c) a spatially deviating spot inside of a grid cell, (d) a spot radius deviation, (e) a tapering spot or a comet shape, (f) a spot with a hole or a doughnut shape, (g) a partially missing spot and (h) a scratched spot.


Variations of Foreground and Background Intensity

[Figure: example image. Source label: Kansas University]


Examples: Spatially Varying Background Noise, Low SNR

[Figure: example images. Source labels: University of California, Berkeley; Keck Center, UIUC. URL: http://stat-www.berkeley.edu/users/terry/zarray/Html/begin.html]


Other than cDNA Microarray Images

• Color changing chemicals for odor detection

• Disease specific assays (in silicon)

— Cystic fibrosis carrier screening (hybridization-based single nucleotide polymorphism platform configured on a multi-layered optically coated silicon surface: gold appearance under white light and blue appearance after binding of anti-biotin to horseradish peroxidase due to attenuation of wavelengths)


Examples: Spatial Resolution, Line Spacing, Grid Arrangement


Examples: Illumination, Noise, Data Type, Missing Spots


IMAGE ANALYSIS: MICROARRAY GRID ALIGNMENT (SPOT FINDING OR GRIDDING)


Microarray Grid Alignment: Objective

• The objective of the grid alignment step is to localize a two-dimensional (2D) array of spots in a microarray scan before any information is extracted from the spots.

• Terminology: A 2D array of spots is also loosely denoted as a grid of spots, while one array of spots among multiple 2D arrays in one microarray scan is often denoted as a block or a sub-array of spots.

[Figure labels: 2D array of spots; Sub-array of spots]


Grid Alignment: Application Domains

Application domains for the grid alignment problem:

• Animal science and plant science - DNA expression level measurements (microarray technology)

• Chemistry – olfactory sensors based on a number of different chemically responsive dyes deposited on a substrate (e.g., paper, film, silica). The color change of the dyes after exposure to different analytes can be accurately measured to identify unknown compounds and concentration levels of known compounds.

• Crystallography – 2D and 3D structure measurements using X-rays


Classification of Grid Alignment Methods

There are two views on microarray grid alignment:

• Automation of methods:

— Manual (a grid template of spots is manually adjusted)

— Semi-automated (manual grid initialization followed by automated refinement)

— Fully automated (data driven without any human intervention based on one-time human setup)

• Image analysis approach:

— Template-based

— Data-driven

— Affymetrix chips (special case)


Microarray Grid Alignment: Previous Work

Approaches:

• The Affymetrix chips approach

— Pros: the alignment problem is simplified and hence the grid alignment is more accurate.

— Cons: the Affymetrix technology has been much more expensive than the technology with coated glass slides.

• Template-based approach: the most prevalent in existing software packages, e.g., GenePix Pro by Axon Instruments, ScanAlyze or GridOnArray by Scanalytics.

— Pros: incorporates knowledge about the ideal grid

— Cons: manual alignment

• Data-driven approach:

— image segmentation

— statistical analysis of 1D image projections

— Pros: automatic alignment

— Cons: limited accuracy and robustness, e.g., with missing spots or noise


Previous Work: Template-Based Grid Alignment

• Template-based alignment results obtained by visually aligning the left two columns (left) or the right two columns (right) of microarray spots


Microarray Grid Alignment: Previous Work

• Enhancements of template-based matching:

— automatic refinement search for a grid location given size and spacing of spots

• Enhancements of data-driven matching:

— Adaptive threshold segmentation

• The problem of irregular grids:

— template-based approaches fail without manual adjustment

— data-driven approaches are capable of finding irregular grids but are prone to misalignment due to spurious or missing spots and are also dependent on many parameters.

• Some software packages: GenePix, QuantArray, Array Vision, ScanAlyze, Spot Image Analysis and Dapple.


Grid Alignment Requirements

Ideal grid alignment algorithm:

• Finds irregularly row- and column-spaced 2D arrays with translational and rotational offsets.

• Motivation: The dipping pins bend over time and cause irregularity in the 2D arrangement of the printed spots. Any rotational offset of a slide or dipping pins will cause a rotated 2D grid in a microarray image with respect to the image edge.

• Performs alignment on images with any number of input channels.

• Motivation: Many image acquisition types.

• Is color and spot size independent.

• Motivation: Many image acquisition types (color). Preparation steps (spot size).

• Is independent of any chosen primitive shape.

• Motivation: Technology development, e.g., CLONDIAG chip.

• Is parameter “free”.

• Motivation: Repeatability without any bias and optimal performance.

• Accommodates speed versus accuracy tradeoffs.

• Motivation: Inspection versus Analysis


Grid Alignment Algorithm Overview


Channel Fusion

How to Process Multi-Channel Images?

- Answer: Channel Fusion by Boolean OR Function

[Figure: input two-band image; the two bands are combined with OR to give the fused image]
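
One possible reading of the Boolean OR fusion, sketched in Python: each channel is binarized and a pixel is kept if any channel is "on". The function name `fuse_channels_or` and the `threshold` parameter are assumptions made for illustration; the slides do not state how the OR is applied to raw intensities.

```python
def fuse_channels_or(channels, threshold):
    """Fuse same-sized 2D channels into one Boolean image.

    A pixel is True if it exceeds `threshold` in at least one channel
    (hypothetical binarization step, not taken from the slides).
    """
    rows, cols = len(channels[0]), len(channels[0][0])
    return [
        [any(ch[r][c] > threshold for ch in channels) for c in range(cols)]
        for r in range(rows)
    ]


# Two tiny 2x2 "bands": each has one bright pixel in a different place.
red = [[0, 200], [10, 0]]
green = [[0, 0], [180, 5]]
fused = fuse_channels_or([red, green], threshold=100)
```

Because the fused image only records where any band shows signal, later grid-finding steps would not depend on which dye produced a spot.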


Image Down-Sampling

• Design of real-time systems

• Accuracy versus computational requirements

• Accessing and processing millions of pixels is time consuming

• Sub-sampling versus Down-sampling
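
The slides do not define the two terms, so the following Python sketch assumes a common distinction: sub-sampling keeps every k-th pixel, while down-sampling averages each k-by-k block. Both function names are illustrative.

```python
def sub_sample(image, k):
    """Keep every k-th pixel in both directions (no averaging)."""
    return [row[::k] for row in image[::k]]


def down_sample(image, k):
    """Replace each full k-by-k block by its mean intensity."""
    rows, cols = len(image) // k, len(image[0]) // k
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [image[r * k + dr][c * k + dc]
                     for dr in range(k) for dc in range(k)]
            out_row.append(sum(block) / (k * k))
        out.append(out_row)
    return out


image = [
    [0, 4, 8, 8],
    [4, 8, 8, 8],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
]
picked = sub_sample(image, 2)     # cheap, but ignores 3 of every 4 pixels
averaged = down_sample(image, 2)  # touches every pixel, smooths noise
```

Under this reading, sub-sampling is the faster option and block averaging is the more noise-tolerant one, which matches the accuracy versus computation tradeoff listed above.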


Line Score Computation

Directional Edge Detection

\[
\mathrm{Edge}^{vertical}(row, col) = \sum_{i=row-kernel}^{row} \; \sum_{j=col-kernel}^{col+kernel} I(i,j) \; - \sum_{i=row}^{row+kernel} \; \sum_{j=col-kernel}^{col+kernel} I(i,j)
\]

\[
\mathrm{Edge}^{horizontal}(row, col) = \sum_{j=col-kernel}^{col} \; \sum_{i=row-kernel}^{row+kernel} I(i,j) \; - \sum_{j=col}^{col+kernel} \; \sum_{i=row-kernel}^{row+kernel} I(i,j)
\]

[Figure: results of directional edge detection]
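
A minimal Python sketch of the directional edge sums, reading the vertical edge as the intensity sum over rows `row-kernel..row` minus the sum over rows `row..row+kernel` (both across columns `col-kernel..col+kernel`), and the horizontal edge as the same with rows and columns swapped. The helper `box_sum` is an illustrative name, and border handling is left to the caller.

```python
def box_sum(image, r0, r1, c0, c1):
    """Sum I(i, j) for r0 <= i <= r1 and c0 <= j <= c1 (inclusive)."""
    return sum(image[i][j]
               for i in range(r0, r1 + 1)
               for j in range(c0, c1 + 1))


def edge_vertical(image, row, col, kernel):
    """Upper half-window sum minus lower half-window sum."""
    upper = box_sum(image, row - kernel, row, col - kernel, col + kernel)
    lower = box_sum(image, row, row + kernel, col - kernel, col + kernel)
    return upper - lower


def edge_horizontal(image, row, col, kernel):
    """Left half-window sum minus right half-window sum."""
    left = box_sum(image, row - kernel, row + kernel, col - kernel, col)
    right = box_sum(image, row - kernel, row + kernel, col, col + kernel)
    return left - right


# A bright top row produces a vertical response but no horizontal one.
image = [
    [9, 9, 9],
    [0, 0, 0],
    [0, 0, 0],
]
v = edge_vertical(image, 1, 1, 1)
h = edge_horizontal(image, 1, 1, 1)
```

The caller must keep `row` and `col` at least `kernel` pixels away from the image border, since the sketch does no padding.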


Vertical and Horizontal Line Score

Score Definition

\[
\mathrm{Score}(Line) = \sum_{i > Sensitivity} \mathrm{HistValue}(Line, i)
\]

[Figure: horizontal and vertical line score functions]
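
The score definition can be sketched in Python under one assumption that the slides do not spell out: `HistValue(Line, i)` is taken to be the number of pixels on the line whose edge magnitude equals `i`, so the score counts pixels with edge magnitude above `Sensitivity`. The function name `line_score` is illustrative.

```python
from collections import Counter


def line_score(edge_values, sensitivity):
    """Sum histogram counts of edge magnitudes above `sensitivity`.

    `edge_values` holds the directional edge responses sampled along
    one candidate row or column (assumed input, not from the slides).
    """
    hist = Counter(abs(int(v)) for v in edge_values)
    return sum(count for value, count in hist.items() if value > sensitivity)


# Three of the six samples exceed the sensitivity threshold of 5.
score = line_score([0, 3, 27, 27, 12, 1], sensitivity=5)
```

Raising `sensitivity` discards weak edge responses, so it acts as the noise threshold listed among the input variables.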


Angular Optimization

Total Score Used for Optimizing Grid Rotation

\[
\mathrm{TotalScore}(Angle) = \sum_{i=0}^{N2FindRow} \mathrm{Score}(Row_i, Angle) + \sum_{j=0}^{N2FindCol} \mathrm{Score}(Col_j, Angle)
\]
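
A hedged Python sketch of the rotation search: for each candidate angle, the total score adds the best `N2FindRow` row scores and the best `N2FindCol` column scores, and the angle with the largest total wins. Taking the top-scoring lines is an assumption about how `Row_i` and `Col_j` are chosen, and the precomputed `scores_by_angle` dictionary stands in for rotating the image and rescoring it.

```python
def total_score(row_scores, col_scores, n2find_row, n2find_col):
    """Sum of the best row scores plus the best column scores."""
    best_rows = sorted(row_scores, reverse=True)[:n2find_row]
    best_cols = sorted(col_scores, reverse=True)[:n2find_col]
    return sum(best_rows) + sum(best_cols)


def best_angle(scores_by_angle, n2find_row, n2find_col):
    """Return the angle whose (row_scores, col_scores) pair scores highest."""
    return max(
        scores_by_angle,
        key=lambda a: total_score(*scores_by_angle[a], n2find_row, n2find_col),
    )


# Hypothetical line scores for three candidate rotations (in degrees).
scores_by_angle = {
    -1.0: ([5, 4, 1], [3, 3]),
    0.0: ([9, 8, 1], [7, 6]),
    1.0: ([6, 2, 2], [4, 1]),
}
angle = best_angle(scores_by_angle, n2find_row=2, n2find_col=2)
```

Grid lines align with image rows and columns only at the correct rotation, so the line scores, and hence the total, peak there.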


Optional Regularity Enforcement

• Desired Configurations

— Irregular Rows & Columns

— Regular Rows

— Regular Columns

— Regular Rows & Columns

• Approach

— Histogram line spacing

— Line with the max score serves as the reference line
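
The two approach steps above can be sketched in Python as follows. The sketch assumes that the dominant spacing is the most frequent gap between neighboring detected lines and that every line is snapped to a whole number of spacings from the reference line; the function name and the sample numbers are illustrative only.

```python
from collections import Counter


def enforce_regular_lines(positions, scores):
    """Snap detected line positions onto a regular grid.

    The most common spacing between neighboring lines defines the grid
    pitch, and the line with the maximum score anchors the grid.
    """
    ordered = sorted(positions)
    spacings = [b - a for a, b in zip(ordered, ordered[1:])]
    spacing = Counter(spacings).most_common(1)[0][0]
    reference = positions[scores.index(max(scores))]
    return sorted(
        reference + round((p - reference) / spacing) * spacing
        for p in positions
    )


# The line detected at 51 is pulled back to 50; line 30 has the top score.
positions = [10, 30, 51, 70, 90]
scores = [40, 95, 60, 80, 70]
regular = enforce_regular_lines(positions, scores)
```

Applying this to rows only, columns only, or both would give the "Regular Rows", "Regular Columns" and "Regular Rows & Columns" configurations.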


Parameter Optimization

• Input Variables:

— N2FindRow or N2FindCol is the number of expected rows or columns in each grid

— MinAngle and MaxAngle define the range of expected grid rotations in degrees

— DownSamp represents the down-sampling ratio if it is desirable to reduce computational requirements

— Accept is the minimum score value from the range [0,100] for reporting a valid grid line

— Sensitivity is the threshold value for separating background noise from signal

— Kernel defines the spatial extent of directional edge detection


Processing Multiple Grids

[Figure panels: Line Discontinuity Approach; Filtering Approach]


Experimental Evaluations

Error Metric

Total Misalignment Error

Synthetic and Measured Test Data

- Tested Variations

• shape

• amount of intensity blur

• 2D array arrangements

• rotated arrays

• downsampled arrays

• arrays with missing spots


Spot Size & Spot Density

• Radius = 2, 5, 9 and 12

• Spacing = along rows from [17, 22] and along columns from [18, 24].


Image Blur (Noise Level)

[Figure panels: Low-Pass Kernel = 5; Low-Pass Kernel = 10; Radius = 5]

Varying amount of signal blur

• kernel values in [3, 13]

Misalignment errors

• within the range [14.22%, 22.09%].


Missing Spots

The fewer the spots in a line, the lower the line discrimination (or detection robustness).


Angular Accuracy

- 100% accuracy with synthetic data

- probably less than 100% accuracy for measured data due to pixel location round-offs during image rotation


Down-sampling

• Experimental results: comparison of results without any down-sampling and with down-sampling by a factor of 2 and 3.

DownSamp Parameter | 1 | 2 | 3
Misalignment Error | 6.4% | 14.6% | 21.3%
Execution Time | 175 ms | 65 ms | 45 ms


Grid Regularity?

Grid alignment results without (left) and with (middle) regularity requirements imposed, and the difference between the corresponding two grid masks (right).

Score = 15.5% misalignment


Grid Alignment Properties

[Figure panels: Color Invariant; Grid Primitive Invariant]


Grid Alignment: Semi-Automated vs. Fully-Automated

Single Grid: one or more bands

[Figure panels: Rotation; Speed; Background Noise]


Multiple Grids: Semi-Automated vs. Fully-Automated

[Figure panels: Grid Regularity; Multiple Grids; Multiple Grid Setup]


Grid Alignment: Summary

• A novel data-driven grid alignment algorithm

— detects irregularly row- and column-spaced spots in a 2D array

— is independent of spot color and spot size

— localizes grids of primitive shapes other than spots

— performs grid alignment on any number of image channels

— reduces the number of free parameters to a minimum by data-driven optimization of most algorithmic parameters

— has a built-in speed versus accuracy tradeoff mechanism to accommodate the user’s requirements on performance time and accuracy of the results.

• Open issues and future work:

— Missing spots & robustness

— Parallel processing, FPGA implementation (SGI effort)

— Parameter optimization


MICROARRAY FOREGROUND SEPARATION


Foreground Separation

The goal of foreground separation is to decide, for each pixel, whether it belongs to the foreground (signal) of the expected spot shape or to the background.

Foreground separation methods using:

• spatial templates

• intensity based clustering

• intensity based segmentation

• spatial and intensity information.


Foreground Separation Using Spatial Templates

• Separation using spatial concentric circular templates
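
A minimal Python sketch of a circular template, assuming the usual arrangement in which pixels inside an inner circle are foreground and pixels in the ring between the inner and an outer circle are background. The radii, the square grid cell and the function name are illustrative choices, not values from the slides.

```python
def circular_template_masks(size, center, r_inner, r_outer):
    """Split a size-by-size grid cell into foreground and background pixels.

    Foreground: distance to `center` <= r_inner.
    Background: r_inner < distance <= r_outer (the surrounding ring).
    """
    cy, cx = center
    foreground, background = [], []
    for y in range(size):
        for x in range(size):
            d2 = (y - cy) ** 2 + (x - cx) ** 2
            if d2 <= r_inner ** 2:
                foreground.append((y, x))
            elif d2 <= r_outer ** 2:
                background.append((y, x))
    return foreground, background


fg, bg = circular_template_masks(size=5, center=(2, 2), r_inner=1, r_outer=2)
```

Pixels outside the outer circle are ignored, which keeps neighboring spots out of the background estimate but also means the template cannot adapt to irregular spot shapes.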


Foreground Separation Using Intensity Based Clustering

• Examples of accurate (left – original image, and second from left – label image) and inaccurate (second from right – original image, and right – label image) foreground separation using intensity based clustering.
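
The slides do not name a specific clustering method; as one plausible instance, the Python sketch below runs a two-class k-means on pixel intensities, starting from the darkest and brightest values. The function name and iteration count are assumptions.

```python
def two_means(intensities, iterations=20):
    """Label each intensity True (foreground) or False (background).

    Plain two-cluster k-means on scalar intensities; the brighter
    cluster is treated as foreground.
    """
    lo, hi = float(min(intensities)), float(max(intensities))
    for _ in range(iterations):
        labels = [abs(v - hi) < abs(v - lo) for v in intensities]
        fg = [v for v, is_fg in zip(intensities, labels) if is_fg]
        bg = [v for v, is_fg in zip(intensities, labels) if not is_fg]
        if not fg or not bg:
            break  # degenerate cell: all pixels fell into one cluster
        lo, hi = sum(bg) / len(bg), sum(fg) / len(fg)
    return labels


labels = two_means([10, 12, 11, 200, 210, 9, 190])
```

Because the labels depend on intensity alone, a bright dust particle far from the spot would also be labeled foreground, which is the kind of inaccurate result the slide illustrates.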


Foreground Separation Using Intensity Based Segmentation

• Frequently used methods: seeded region growing and watershed segmentation

• An example of pros and cons of foreground separation using intensity based clustering and segmentation. Left – original image, middle – segmentation result and right – clustering result.

• Multiple interpretations of the original grid cell image on the left side


Foreground Separation Using Spatial And Intensity Information

• Spatially constrained segmentation and clustering

• Mann-Whitney statistical testing

• Spatial and intensity trimming

A couple of grid cell examples where contamination pixels have to be trimmed


Foreground Separation From Multi-Channel Microarray Images

• Four types of separation boundaries for spots versus background.

— Hypersphere

— Volume

— Hyperplane

— Point in a projected space


Separating Spots from Multi-Channel Imagery - Assumptions

Boundary Type | Statistical Assumptions | Threshold Values
Hypersphere | Gaussian Distribution | (scalar) $\mu_{dist} + k\sigma_{dist}$
Volume | Uniform Distribution | (1 point) $V_{min,pts} = \frac{1}{k} V_{min,max}$
Hyperplane | Linearly Correlated Distributions | (2 points) $\vec{v} = (\mu, min)$; $\mu \in Plane$, $min \perp Plane$
Nonlinear Using Projections | Gaussian Distribution in Projected Space | (scalar) $\mu_{projected} + k\sigma_{projected}$
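
As a rough Python sketch of the hypersphere row only: each pixel is a vector of channel intensities, its distance to a center is computed, and pixels farther than the mean distance plus `k` standard deviations are labeled as spots. Using the mean of all pixels as the center is an assumption; the slides give only the threshold form.

```python
import math


def hypersphere_spot_mask(pixels, k):
    """Flag pixels whose distance from the mean vector exceeds mu + k*sigma.

    `pixels` is a list of per-pixel channel tuples, e.g. (red, green).
    """
    n, dims = len(pixels), len(pixels[0])
    center = [sum(p[d] for p in pixels) / n for d in range(dims)]
    dists = [
        math.sqrt(sum((p[d] - center[d]) ** 2 for d in range(dims)))
        for p in pixels
    ]
    mu = sum(dists) / n
    sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / n)
    return [d > mu + k * sigma for d in dists]


# Nine dim background pixels and one bright two-channel spot pixel.
pixels = [(10, 10)] * 9 + [(200, 180)]
mask = hypersphere_spot_mask(pixels, k=2)
```

The multiplier `k` trades missed spot pixels against background pixels wrongly kept as signal.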


Step 1: Separating Spots from Background - Example

Original image (400x400 image size, SHORT datum type)

Spot pixel counts

• Hypersphere: 15913

• Volume: 509

• Hyperplane: 15877

• Nonlinear after AND operation: 13735

• Nonlinear after OR operation: 16045


MICROARRAY SPOT QUALITY ASSESSMENT


Goals and Objectives of Quality Assessment

The main goals of image-based spot quality assessment (or grid screening) are:

• to identify grid cells that contain valid spots

• to eliminate invalid spots from further analysis

• The QA objective is to assess spot quality thoroughly and associate a quality score with each spot as a reliability coefficient during further processing.

• The QA automation objective is to completely eliminate any human interaction, and detect any systematic and unexpected errors.


Spot Validity Criteria<br />

Spot validity criteria:<br />

• Assess foreground and background intensities.<br />

— (a) absolute background and foreground levels,<br />

— (b) background variation,<br />

— (c) foreground saturation and<br />

— (d) foreground-to-background intensity ratio (or signal-to-noise ratio).<br />

• Evaluate morphological properties of foreground<br />

— spot shape and size irregularities<br />

— spot location (position offset)<br />

66


Quality Assessment: Spot Examples<br />

• Spatial and topological variations of spots<br />

— (a) a regular spot<br />

— (b) an inverse spot or a ghost shape<br />

— (c) a spatially deviating spot inside of a grid cell<br />

— (d) a spot radius deviation<br />

— (e) a tapering spot or a comet shape<br />

— (f) a spot with a hole or a doughnut shape<br />

— (g) a partially missing spot<br />

— (h) a scratched spot.<br />

67


Criteria for Assessing Background and Foreground Intensities<br />

• Background intensity variations<br />

$q_{BKG}^{LOC\&GLOB\,1} = \frac{\mu_{BKG}^{GLOBAL}}{\mu_{BKG}^{LOCAL} + \mu_{BKG}^{GLOBAL}}; \quad q_{BKG}^{LOC\&GLOB\,2} = \frac{m_{BKG}^{GLOBAL}}{m_{BKG}^{LOCAL} + m_{BKG}^{GLOBAL}}$<br />

• Foreground and background intensity uniformity<br />

$q_{FRG}^{STAT} = 1 - \frac{\sigma_{FRG}}{\mu_{FRG}}$<br />

$q_{FRG}^{ABS} = 1 - \frac{I_{FRG,max} - I_{FRG,min}}{Range}; \quad q_{BKG}^{ABS} = 1 - \frac{I_{BKG,max} - I_{BKG,min}}{Range}$<br />

• Foreground intensity saturation<br />

$q_{SATURATION}^{CONT} = 1 - \frac{count_{saturated}}{count_{all}}$<br />

$q_{SATURATION}^{CATEG} = 1$ if $count_{saturated} < T\%$; $q_{SATURATION}^{CATEG} = 0$ if $count_{saturated} \ge T\%$<br />

• Signal-to-noise ratio<br />

$q_{SNR}^{MEAN} = \mu_{FRG} / (\mu_{FRG} + \mu_{BKG}); \quad q_{SNR}^{MEDIAN} = m_{FRG} / (m_{FRG} + m_{BKG})$<br />
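The intensity criteria on this slide can be sketched in a few lines. This is an illustrative sketch only: it assumes foreground and background pixels are plain lists of intensities, that "Range" means the full dynamic range of the scanner, and that a pixel counts as saturated when it reaches a given `saturation_level`. All function and parameter names are hypothetical.

```python
from statistics import mean, median

def q_background(local_bkg, global_bkg, stat=mean):
    """Global / (local + global) background score; pass stat=median for the median variant."""
    g, l = stat(global_bkg), stat(local_bkg)
    return g / (l + g)

def q_uniformity_abs(pixels, intensity_range):
    """1 - (I_max - I_min) / Range, where Range is the assumed dynamic range."""
    return 1.0 - (max(pixels) - min(pixels)) / intensity_range

def q_saturation_cont(frg, saturation_level):
    """1 - count_saturated / count_all for the foreground pixels."""
    saturated = sum(1 for p in frg if p >= saturation_level)
    return 1.0 - saturated / len(frg)

def q_saturation_categ(frg, saturation_level, t_percent):
    """1 if the saturated percentage is below T%, otherwise 0."""
    saturated = sum(1 for p in frg if p >= saturation_level)
    return 1 if 100.0 * saturated / len(frg) < t_percent else 0

def q_snr(frg, bkg, stat=mean):
    """Foreground / (foreground + background); pass stat=median for the median variant."""
    f, b = stat(frg), stat(bkg)
    return f / (f + b)

# Hypothetical 16-bit example: one of four foreground pixels is saturated.
spot = [300, 320, 280, 65535]
print(q_saturation_cont(spot, saturation_level=65535))
print(q_snr([300, 300], [100, 100]))
```

All scores are arranged so that values near 1 indicate a better spot, which makes them easy to combine into a single reliability coefficient.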

68


Criteria for Assessing Morphological Properties of Foreground<br />

• Spot shape<br />

— Area-based<br />

$q_{SHAPE}^{AREA1} = \frac{A - A_0}{A_0}; \quad q_{SHAPE}^{AREA2} = \exp\left(-\frac{A - A_0}{A_0}\right); \quad q_{SHAPE}^{AREA3} = \frac{A - A_0}{A_0} \cdot 100\%$<br />

— Perimeter-based<br />

$q_{SHAPE}^{PERIM} = 4\pi \frac{A}{C^2}$<br />

— Diameter-based<br />

$q_{SHAPE}^{X\text{-}SECTION1} = \frac{L - L_0}{L_0}; \quad q_{SHAPE}^{X\text{-}SECTION2} = \exp\left(-\frac{L - L_0}{L_0}\right)$<br />

• Spot location (spot displacement or position offset)<br />

— Distance-based<br />
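Two of the shape criteria above can be sketched as follows, assuming the measured area A, the expected area A0, and the perimeter C have already been extracted from the spot mask. The function names are hypothetical.

```python
import math

def q_area_ratio(area, expected_area):
    """Relative area deviation (A - A0) / A0; 0 means the spot has the expected area."""
    return (area - expected_area) / expected_area

def q_perimeter(area, perimeter):
    """Compactness 4*pi*A / C^2; equals 1 for a perfect circle and is smaller otherwise."""
    return 4.0 * math.pi * area / perimeter ** 2

# A circle of radius 5 and a 10 x 10 square, as idealized test shapes.
circle = q_perimeter(math.pi * 5 ** 2, 2 * math.pi * 5)
square = q_perimeter(10 * 10, 4 * 10)
print(circle, square)
```

The perimeter-based score is scale invariant, so it flags irregular outlines (comets, scratches) without needing the expected spot size, whereas the area-based score needs A0 from the array layout.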

69


Spot Screening Quality Criteria: GenePix and QuantArray<br />

Inspection criteria and their descriptions:<br />

• SNR for each channel: $\mu_{sig} / (\mu_{sig} + \mu_{bkg})$; $m_{sig} / (m_{sig} + m_{bkg})$<br />

• Signal and background variability: $k \cdot \mu_{sig} / \sigma_{sig}$; $k \cdot \mu_{bkg} / \sigma_{bkg}$<br />

• Excessively high background: $\mu_{bkg}^{Global} / (\mu_{bkg}^{Global} + \mu_{bkg})$; $m_{bkg}^{Global} / (m_{bkg}^{Global} + m_{bkg})$<br />

• Saturation: $1 - count_{saturated} / count_{all}$<br />

• Proportion of signal above $\mu + k \cdot \sigma$ of background: $count_{>k\sigma} / count_{all}$; if $\mu_{sig} > \mu_{bkg} + k\sigma_{bkg}$ then $count_{>k\sigma} + 1$<br />

• Spot shape: $4\pi A / Perim^2$<br />

• Foreground and background uniformity: $1 - (I_{sig,max} - I_{sig,min}) / Range$; $1 - (I_{bkg,max} - I_{bkg,min}) / Range$<br />

70


I2K Selecting Valid Pixels – SNR Method<br />

• SNR is computed directly from the means of the background and foreground pixels<br />

• The SNR criterion eliminates spots with<br />

— no signal (SNR~1),<br />

— very weak signal (1


I2K Selecting Valid Pixels – Location & Size Method<br />

• Method:<br />

— 1. Estimate the background (sample mean and standard deviation) and compute a threshold.<br />

— 2. The spatial centroid of the pixels that are above the threshold defines a new spot center.<br />

— 3. Background estimates are iteratively improved.<br />

— 4. Identify spots whose centroid lies outside the range [grid cell center ± 0.25 * grid cell size].<br />

• The Location & Size method eliminates tapering spots (comets) and partially missing spots.<br />

Original Image<br />

Mask Image – Location & Size Screening<br />
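The centroid check in steps 2 and 4 can be sketched as below. This is a simplified illustration: the grid cell is assumed to be a 2D list of intensities, the threshold is assumed to be given (for example from the background mean and standard deviation), and the iterative background refinement of step 3 is omitted. Function names are hypothetical.

```python
def spot_centroid(cell, threshold):
    """Return the (row, col) centroid of pixels above the threshold, or None if there are none."""
    coords = [(r, c)
              for r, row in enumerate(cell)
              for c, value in enumerate(row)
              if value > threshold]
    if not coords:
        return None
    n = len(coords)
    return (sum(r for r, _ in coords) / n, sum(c for _, c in coords) / n)

def is_centered(cell, threshold, tolerance=0.25):
    """True when the centroid lies within center +/- tolerance * cell size in both axes."""
    centroid = spot_centroid(cell, threshold)
    if centroid is None:
        return False
    rows, cols = len(cell), len(cell[0])
    center = ((rows - 1) / 2.0, (cols - 1) / 2.0)
    return (abs(centroid[0] - center[0]) <= tolerance * rows and
            abs(centroid[1] - center[1]) <= tolerance * cols)

def make_cell(size, top, left, block=2, level=500, background=100):
    """Build a hypothetical size x size cell with one bright block x block spot."""
    cell = [[background] * size for _ in range(size)]
    for r in range(top, top + block):
        for c in range(left, left + block):
            cell[r][c] = level
    return cell

centered = make_cell(8, 3, 3)   # spot in the middle of the cell
shifted = make_cell(8, 0, 0)    # spot pushed into a corner
print(is_centered(centered, 200), is_centered(shifted, 200))
```

A comet or a partially missing spot pulls the centroid away from the cell center, which is why this simple test catches them.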

72


I2K Selecting Valid Pixels – Topology Method<br />

• Method:<br />

— 1. Perform connectivity analysis of the signal pixels.<br />

— 2. Estimate the spot radius from the connected set of pixels.<br />

— 3. If the estimated radius deviates from the expected radius by more than a user-specified percentage of the expected spot radius, the spot is eliminated.<br />

• The topology screening criterion is aimed at eliminating spots with scratches.<br />

Original Image<br />

Mask Image – Topology Screening<br />
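The three topology steps can be sketched as follows. The sketch makes several assumptions that the slides do not state: 4-connectivity, use of the largest connected component, and a radius estimate from the component area as if the spot were a disc (r = sqrt(area / pi)). Function names are hypothetical.

```python
import math
from collections import deque

def largest_component_size(mask):
    """Size (pixel count) of the largest 4-connected set of True pixels in a 2D mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            best = max(best, size)
    return best

def passes_topology(mask, expected_radius, max_deviation_pct):
    """Keep the spot only if the estimated radius is within the allowed percentage."""
    radius = math.sqrt(largest_component_size(mask) / math.pi)
    deviation_pct = abs(radius - expected_radius) / expected_radius * 100.0
    return deviation_pct <= max_deviation_pct

solid = [[0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0]]
# The same spot with a vertical "scratch" that splits it into two pieces.
scratched = [[0, 0, 0, 0, 0],
             [0, 1, 0, 1, 0],
             [0, 1, 0, 1, 0],
             [0, 1, 0, 1, 0],
             [0, 0, 0, 0, 0]]
print(passes_topology(solid, 1.7, 20), passes_topology(scratched, 1.7, 20))
```

A scratch breaks the spot into smaller connected pieces, so the radius estimated from the largest piece falls well below the expected radius and the spot is rejected.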

73


I2K Selecting Valid Pixels – Statistics Method<br />

• Method:<br />

— Estimate a probability distribution function (PDF) model of the spot pixels.<br />

— Compute a histogram of the PDFs over all grid cells.<br />

— Eliminate spots that do not follow the PDF of the majority of grid cells.<br />

• The Statistics criterion eliminates saturated spots.<br />

Original Image<br />

Mask Image – Statistics PDF Model Screening<br />

74


75<br />

MICROARRAY DATA QUANTIFICATION AND NORMALIZATION


Data Quantification<br />

• Data quantification (or spot feature extraction) refers to extracting descriptive values of foreground and background pixels for each spot.<br />

• Ideally, extracted descriptors (also called features or attributes) should be directly proportional to the mRNA quantity in the solution that was deposited in a spot, and should represent the deposited gene expression level.<br />

• In reality, fluorescent intensity measurements in each channel might be scaled or distorted differently according to some linear or non-linear functions during data preparation steps.<br />

76


Spot Descriptors<br />

• Volume of foreground intensity<br />

$Volume_{FRG} = (\mu_{FRG} - \mu_{BKG}) \cdot A_{FRG}$<br />

• Logarithmic ratio<br />

$des_{RATIO}^{X} = \frac{X_{FRG}^{CHANNEL0}}{X_{FRG}^{CHANNEL1}}$<br />

$des_{LOG\,RATIO}^{X\,WRT\,BKG} = \log_2\left(\frac{X_{FRG}^{CHANNEL0} - X_{BKG}^{CHANNEL0}}{X_{FRG}^{CHANNEL1} - X_{BKG}^{CHANNEL1}}\right)$<br />

• Regression Ratios<br />
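The volume and log-ratio descriptors can be sketched as below, assuming X is a per-spot summary statistic (for example the mean intensity) already computed for foreground and background in each channel. The function names, and the choice to return `None` when a background-corrected value is not positive, are assumptions made for this example.

```python
import math

def foreground_volume(mu_frg, mu_bkg, area_frg):
    """Volume of foreground intensity: (mu_FRG - mu_BKG) * A_FRG."""
    return (mu_frg - mu_bkg) * area_frg

def channel_ratio(x_frg_ch0, x_frg_ch1):
    """Plain ratio of the foreground statistic in channel 0 to channel 1."""
    return x_frg_ch0 / x_frg_ch1

def log_ratio_wrt_background(frg0, bkg0, frg1, bkg1):
    """log2 of the background-corrected channel ratio; None if it is undefined."""
    num, den = frg0 - bkg0, frg1 - bkg1
    if num <= 0 or den <= 0:
        return None  # assumed handling: the logarithm is undefined here
    return math.log2(num / den)

# Hypothetical spot: channel 0 is four times brighter than channel 1 after correction.
print(foreground_volume(500, 100, 50))
print(log_ratio_wrt_background(900, 100, 300, 100))
```

Spots whose foreground does not exceed the background in either channel have no defined log ratio; returning `None` lets later steps skip them or hand them to quality screening.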

77


Spot Descriptors<br />

• Visualization of Spot Descriptors<br />

• Selection of Spot Descriptors<br />

• Improving Robustness of Spot Descriptors<br />

78


Visualization of Spot Descriptors<br />

Selection and Visualization of Spot Descriptors<br />

Feature Selection<br />

Mean Feature Image<br />

79


Normalization<br />

• The motivation for normalizing microarray images and/or extracted descriptors comes from the fact that one would like to compare results obtained from multiple slides, scanners, or laboratories, and with multiple microarray techniques. The difficulty of performing meaningful comparisons arises from different slide preparations (e.g., amounts of mRNA), scanner settings, microarray protocols or labeling specifics.<br />

• The purpose of normalization is to adjust for these variations, primarily for label efficiency and hybridization efficiency, so that we can discover true biological variations as defined by the microarray experimental studies.<br />

80


Normalization<br />

• Normalization using Statistical Descriptors<br />

— Z-Transformation<br />

$I_{Z\text{-}TRANSFORM}^{NORM\,STAT}(row, col) = \frac{I(row, col) - \mu}{\sigma}$<br />

— Background Correction<br />

$des_{LOG\,RATIO}^{X\,WRT\,BKG} = \log_2\left(\frac{X_{FRG}^{CHANNEL0} - X_{BKG}^{CHANNEL0}}{X_{FRG}^{CHANNEL1} - X_{BKG}^{CHANNEL1}}\right)$<br />
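The Z-transformation can be sketched as follows. The slide does not say over which pixels $\mu$ and $\sigma$ are computed; this sketch assumes they are the mean and population standard deviation of the whole image, and the function name `z_transform` is hypothetical.

```python
from statistics import mean, pstdev

def z_transform(image):
    """Return (I(row, col) - mu) / sigma for every pixel of a 2D list image."""
    flat = [value for row in image for value in row]
    mu, sigma = mean(flat), pstdev(flat)
    if sigma == 0:
        raise ValueError("constant image: sigma is zero, Z-transform undefined")
    return [[(value - mu) / sigma for value in row] for row in image]

# Tiny hypothetical image.
image = [[1, 2],
         [3, 4]]
normalized = z_transform(image)
print(normalized)
```

After the transform the image has zero mean and unit standard deviation, which removes global differences in gain and offset between scans.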

81


Normalization<br />

• Normalization using control spots<br />

— insert spots of known intensities or genes of known expression level into a microarray slide<br />

• Normalization using regression analyses<br />

— within-slide normalization (location or scale),<br />

– (a) location global normalization (log(red/green) – normalization factor)<br />

– (b) location intensity dependent normalization (log(red/green) – normalization factor as a function of spot intensity),<br />

– (c) location within-print-tip-group normalization (log(red/green) – grid dependent normalization factor as a function of spot intensity) and<br />

– (d) scale normalization (modeling spread of various print-tip groups)<br />

— paired-slide normalization (dye-swap),<br />

— multiple slide normalization<br />
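Variant (a), location global normalization, can be sketched as below. The slide only states the form log(red/green) minus a normalization factor; taking that factor to be the median of the log ratios over all spots is an assumption of this sketch (a common choice, but not stated in the slides), as are the base-2 logarithm and the function name.

```python
import math
from statistics import median

def global_location_normalize(red, green):
    """Return log2(red/green) per spot, shifted by one slide-wide normalization factor."""
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    factor = median(log_ratios)  # assumed normalization factor
    return [m - factor for m in log_ratios]

# Hypothetical background-corrected intensities for three spots.
red = [200, 400, 800]
green = [100, 100, 100]
print(global_location_normalize(red, green))
```

Variants (b) and (c) replace the single constant with a factor that depends on spot intensity, or on both print-tip group and intensity, but keep the same subtraction on the log scale.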

82


83<br />

MICROARRAY DATA FUSION, ANALYSIS AND VISUALIZATION


Open Problems<br />

• Problem statements:<br />

• Problem #1: Given gene annotations and tabular information about expression levels, fuse and visualize the information.<br />

• Problem #2: Given a database of microarray experiments (only numerical expression level information), find the best match for a new microarray experiment in the database.<br />

• Problem #3: Incorporate multiple knowledge discovery techniques into analyses.<br />

84


Visualization<br />

• Data: 3D cubes, distribution charts, curves, surfaces, link graphs, image frames and movies, parallel coordinates<br />

• Results: pie charts, scatter plots, box plots, association rules, parallel coordinates, dendrograms, temporal evolution<br />

Pie chart<br />

Parallel coordinates<br />

Temporal evolution<br />

85


Visualization of Clustering Results<br />

Class Labeling and Visualization<br />

Isodata (K-means) Clustering<br />

Mean Feature Image<br />

Label Image<br />

86


Web-Based Documentation<br />

87<br />

http://alg.ncsa.uiuc.edu/tools/docs/i2k/manual/index.html


Summary<br />

• Microarray Technology and Data Processing Workflow<br />

• Microarray Image Analysis<br />

— Grid Alignment Problem<br />

— Foreground Separation<br />

— Spot Quality Assessment<br />

— Quantification and Normalization<br />

— Additional Microarray Data Fusion and Visualization<br />

• Needed<br />

— Biologists to define biologically meaningful problems/scenarios/experiments<br />

— Computer scientists to provide computational tools<br />

— Analysts (statistics, artificial intelligence, knowledge discovery in databases) to introduce techniques<br />

88
