Discriminative Learning of Local Image Descriptors
Discriminative Learning of<br />
Local Image Descriptors<br />
Authors: Matthew Brown, Gang Hua, Simon Winder<br />
Presenter: Fan Bin
Outline<br />
Author information<br />
Paper information<br />
Problem addressed and approach<br />
The proposed method<br />
Experiments<br />
Conclusions
About the authors<br />
Matthew Brown<br />
Postdoctoral Fellow at the Ecole Polytechnique Fédérale de Lausanne (EPFL)<br />
PhD in Computer Science (UBC, 2005)<br />
MEng in Electrical and Information Sciences (Cambridge, 2000)<br />
Known for his work on automatic 2D image stitching<br />
http://cvlab.epfl.ch/~brown/research/research.html<br />
Gang Hua<br />
Senior Researcher at Nokia Research Center, Hollywood<br />
Scientist at Microsoft Live Labs Research from 2006 to 2009<br />
PhD in Electrical and Computer Engineering (Northwestern University, 2006)<br />
M.S. and B.S. in Electrical Engineering (Xi’an Jiaotong University, 2002 and 1999)<br />
http://www.eecs.northwestern.edu/~ganghua/<br />
Simon Winder<br />
Senior Developer at Microsoft Research<br />
http://research.microsoft.com/en-us/people/swinder/
Paper information<br />
Venue<br />
PAMI 2010, to appear<br />
Related papers<br />
Learning Local Image Descriptors. S. Winder and M. Brown. (CVPR 2007)<br />
Discriminant Embedding for Local Image Descriptors. G. Hua, M. Brown and S. Winder. (ICCV 2007)<br />
Picking the Best DAISY. S. Winder, G. Hua and M. Brown. (CVPR 2009)
Abstract<br />
A realistic ground-truth dataset of matched patches based on multi-view stereo data<br />
A set of building blocks for constructing descriptors<br />
Parametric learning of local image descriptors<br />
Non-parametric learning of local image descriptors<br />
Dimensionality reduction<br />
Descriptors that exceed state-of-the-art performance at lower dimensionality
Problem addressed and approach<br />
On the one hand, although local descriptors receive great attention and see wide use in computer vision, most existing local descriptors are hand-designed feature transforms.<br />
On the other hand, while learning-based methods are widely used for high-level vision tasks, they are rarely applied in low-level vision.<br />
This paper proposes an automatic, learning-based approach to local descriptor design: from training samples, optimal non-parametric and parametric local descriptors are learned via linear discriminant analysis and Powell minimization, respectively.
The proposed method<br />
The Framework
G-block → T-block → S-block/E-block → N-block<br />
G-block: Gaussian smoothing.<br />
T-block: non-linear transformation applied to each sample in the smoothed patch (the “simple-cell” stage).<br />
S-block/E-block: spatial pooling of the T-block responses (the “complex-cell” operations). The S-block uses parameterized pooling regions; the E-block is non-parametric.<br />
N-block: SIFT-style normalization.
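The four stages can be sketched end-to-end as follows (assuming numpy/scipy; the filter, grid, and threshold choices are illustrative, not the paper's learned configuration):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def describe_patch(patch, sigma=1.8, n_orient=4, grid=4, r=0.2):
    """Illustrative G -> T -> S -> N pipeline producing one descriptor."""
    # G-block: Gaussian smoothing.
    g = gaussian_filter(patch.astype(float), sigma)

    # T-block (T1-style): angle-quantized gradients; each pixel's magnitude
    # is split linearly between the two adjacent orientation bins.
    gy, gx = np.gradient(g)
    mag = np.hypot(gx, gy)
    pos = np.mod(np.arctan2(gy, gx), 2 * np.pi) / (2 * np.pi) * n_orient
    lo = np.floor(pos).astype(int) % n_orient
    hi, frac = (lo + 1) % n_orient, pos - np.floor(pos)
    t = np.zeros(g.shape + (n_orient,))
    for k in range(n_orient):
        t[..., k] = mag * ((lo == k) * (1 - frac) + (hi == k) * frac)

    # S-block: sum the T responses over a grid x grid array of square cells.
    h, w = g.shape
    d = np.concatenate([
        t[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid].sum(axis=(0, 1))
        for i in range(grid) for j in range(grid)])

    # N-block: SIFT-style normalize, clip, renormalize.
    d = d / (np.linalg.norm(d) + 1e-12)
    d = np.minimum(d, r)
    return d / (np.linalg.norm(d) + 1e-12)

desc = describe_patch(np.random.rand(64, 64))
print(desc.shape)   # (64,) = 4x4 pooling cells x 4 orientation bins
```

The paper's contribution is precisely that quantities fixed here by hand (sigma, the pooling layout, the clipping threshold) are instead learned from labeled patch pairs.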
(Figure: the T-block maps the smoothed input patch, a grid of samples I11 ... I44, to an output grid of the same layout with one k-length vector f11 ... f44 per sample.)
T-block variants<br />
T1: angle-quantized gradients. Each pixel's gradient magnitude is linearly assigned to the two adjacent orientation bins.<br />
T1a: 4 quantized directions; T1b: 8 quantized directions.<br />
T2: rectified gradients. The gradient vector is separated into positive and negative parts.<br />
T2a: { |∇x| − ∇x; |∇x| + ∇x; |∇y| − ∇y; |∇y| + ∇y }<br />
T2b: extends T2a with the same rectification of the 45°-rotated gradients, giving 8 elements.
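The T2a rectification can be read as splitting each gradient component into its positive and negative parts (up to a factor of 2, since |x| + x = 2·max(x, 0)); a small sketch:

```python
import numpy as np

def t2a(gx, gy):
    """T2a-style rectified gradients: split each gradient component into
    positive and negative parts, giving a 4-vector per pixel."""
    return np.stack([
        np.maximum(gx, 0),   # positive part of d/dx
        np.maximum(-gx, 0),  # negative part of d/dx
        np.maximum(gy, 0),   # positive part of d/dy
        np.maximum(-gy, 0),  # negative part of d/dy
    ], axis=-1)

gx = np.array([[1.0, -2.0]])
gy = np.array([[3.0, -0.5]])
print(t2a(gx, gy)[0, 0])  # [1. 0. 3. 0.]
print(t2a(gx, gy)[0, 1])  # [0. 2. 0. 0.5]
```

Exactly one of each positive/negative pair is nonzero per pixel, so spatial pooling of these maps preserves gradient sign information that plain magnitudes would lose.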
T3: steerable filters.<br />
T3g: 2nd order, 4 orientations; T3h: 4th order, 4 orientations;<br />
T3i: 2nd order, 8 orientations; T3j: 4th order, 8 orientations.<br />
T4: difference-of-Gaussians (DoG) responses.<br />
D_1 = I(\sigma_1) - I(\sigma_2), \quad D_2 = I(\sigma_3) - I(\sigma_2)<br />
T4: { |D_1| − D_1; |D_1| + D_1; |D_2| − D_2; |D_2| + D_2 }
S-block<br />
(Figure: parameterized spatial pooling configurations.)
E-block<br />
E1: PCA (Principal Component Analysis)<br />
E2: LPP (Locality Preserving Projections)<br />
E4: LDE (Local Discriminant Embedding)<br />
E6: GLDE (Generalized Local Discriminant Embedding)<br />
E3, E5, E7: orthogonal versions of E2, E4, E6
Learning Parametric Descriptors<br />
Parameters: the parameters of the G-, T-, S- and N-blocks<br />
Objective: maximize the area under the ROC curve (true positive rate vs. false positive rate)<br />
Optimization method: Powell's multidimensional direction-set method
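The optimization loop can be illustrated with SciPy's Powell method maximizing ROC AUC over a single hypothetical smoothing parameter on synthetic pairs (the paper tunes many block parameters this way; the data and parameterization here are stand-ins):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic pairs: matches are noisy copies of a signal, non-matches are
# independent signals (stand-ins for descriptor pairs).
base = rng.normal(size=(200, 32))
noise = rng.normal(scale=0.8, size=(100, 32))
pairs = [(base[i], base[i] + noise[i], 1) for i in range(100)]   # matches
pairs += [(base[i], base[i + 100], 0) for i in range(100)]       # non-matches

def auc(params):
    """ROC area for the pair distances produced by these parameters."""
    sigma = float(np.clip(abs(params[0]), 1e-3, 8.0))  # one "G-block" parameter
    d = np.array([np.linalg.norm(gaussian_filter1d(a, sigma)
                                 - gaussian_filter1d(b, sigma))
                  for a, b, _ in pairs])
    y = np.array([l for _, _, l in pairs])
    # Mann-Whitney form of AUC: probability that a match pair has a
    # smaller distance than a non-match pair.
    order = np.argsort(d)
    ranks = np.empty(len(d))
    ranks[order] = np.arange(len(d))
    n1, n0 = y.sum(), len(y) - y.sum()
    return (ranks[y == 0].sum() - n0 * (n0 - 1) / 2) / (n0 * n1)

# Powell's direction-set method needs no gradients, matching the paper's choice.
res = minimize(lambda p: -auc(p), x0=[1.0], method="Powell")
print(f"AUC at optimum: {-res.fun:.3f}")
```

Powell's method is a natural fit here because the AUC of a descriptor pipeline is not differentiable with respect to its block parameters.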
Learning Non-Parametric Descriptors (E-block)<br />
Input: S = { x_i = T(p_i), x_j = T(p_j), l_ij }, where l_ij = 1 for a match pair and l_ij = 0 for a non-match pair.<br />
Output: the optimized projections w.<br />
E2: minimize the distance between match pairs while keeping the overall variance of all vectors in the match-pair set as large as possible in the projected space:<br />
J_1(w) = \frac{\sum_{l_{ij}=1} \left[ (w^T x_i)^2 + (w^T x_j)^2 \right]}{\sum_{l_{ij}=1} \left( w^T (x_i - x_j) \right)^2}
Learning Non-Parametric Descriptors (E-block)<br />
E4: seek the embedding space in which distances between match pairs are minimized and distances between non-match pairs are maximized:<br />
J_2(w) = \frac{\sum_{l_{ij}=0} \left( w^T (x_i - x_j) \right)^2}{\sum_{l_{ij}=1} \left( w^T (x_i - x_j) \right)^2}<br />
E6: find projections that maximize the ratio of total data variance to the in-class variance of the match pairs (equivalently, minimize its inverse):<br />
J_3(w) = \frac{\sum_{x_i \in S} (w^T x_i)^2}{\sum_{l_{ij}=1} \left( w^T (x_i - x_j) \right)^2}
Learning Non-Parametric Descriptors (E-block)<br />
Each objective is a generalized Rayleigh quotient:<br />
J_i(w) = \frac{w^T A_i w}{w^T B w}, \quad i = 1, 2, 3<br />
with<br />
A_1 = \sum_{l_{ij}=1} \left( x_i x_i^T + x_j x_j^T \right),<br />
A_2 = \sum_{l_{ij}=0} (x_i - x_j)(x_i - x_j)^T,<br />
A_3 = \sum_{x_i \in S} x_i x_i^T,<br />
B = \sum_{l_{ij}=1} (x_i - x_j)(x_i - x_j)^T.<br />
The optimal projections are the top generalized eigenvectors of A_i w = \lambda B w.
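A minimal sketch of solving such a Rayleigh-quotient objective (here LDE-style scatter matrices built from synthetic match/non-match pairs; SciPy's `eigh` handles the generalized eigenproblem directly):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
dim, n = 10, 500

# Synthetic match pairs: x_j is x_i plus noise that grows with coordinate
# index, so a discriminative projection should favor low-noise coordinates.
xi = rng.normal(size=(n, dim))
xj_match = xi + rng.normal(size=(n, dim)) * np.linspace(0.1, 2.0, dim)
xj_non = rng.normal(size=(n, dim))              # non-matches: independent draws

# LDE-style scatter matrices for the quotient w^T A w / w^T B w:
A = sum(np.outer(d, d) for d in xi - xj_non)    # non-match differences
B = sum(np.outer(d, d) for d in xi - xj_match)  # match differences

# eigh solves the generalized problem A w = lambda * B w; eigenvalues come
# back in ascending order, so the last column maximizes w^T A w / w^T B w.
lam, W = eigh(A, B)
w_best = W[:, -1]
print(f"largest generalized eigenvalue: {lam[-1]:.2f}")
```

Taking the top few eigenvectors instead of just one yields a low-dimensional discriminative embedding of the pooled descriptor.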
Learning Non-Parametric Descriptors (E-block)<br />
Orthogonality constraint on the projections: given previously found w_1, w_2, \ldots, w_{k-1},<br />
w_k = \arg\max_w \frac{w^T A_i w}{w^T B w} \quad \text{s.t.} \quad w^T w_j = 0, \; j = 1, 2, \ldots, k-1.
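One standard way to enforce this constraint, not necessarily the authors' exact solver, is to re-solve the reduced eigenproblem in the null space of the projections found so far:

```python
import numpy as np
from scipy.linalg import eigh, null_space

def orthogonal_projections(A, B, k):
    """Greedy maximization of w^T A w / w^T B w, with each new projection
    constrained to be orthogonal to all previously found ones."""
    W = []
    for _ in range(k):
        # Orthonormal basis Z of the subspace orthogonal to the found w's.
        Z = null_space(np.array(W)) if W else np.eye(A.shape[0])
        # Solve the generalized eigenproblem restricted to that subspace.
        lam, V = eigh(Z.T @ A @ Z, Z.T @ B @ Z)
        W.append(Z @ V[:, -1])   # lift the top eigenvector back to R^d
    return np.array(W)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))
A = X.T @ X                      # stand-in "numerator" scatter matrix
B = np.eye(6) + 0.1 * A          # stand-in positive-definite denominator
W = orthogonal_projections(A, B, 3)
print(np.round(W @ W.T, 6))      # off-diagonal entries are 0: orthogonal
```

Without the constraint, generalized eigenvectors are B-orthogonal but generally not Euclidean-orthogonal, which is why the constrained formulation above is needed.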
Experiments<br />
Data: about 2.5 million labeled match and non-match pairs, drawn from 3D reconstructions of three real scenes: Yosemite, Notre Dame and Liberty.<br />
Evaluation<br />
ROC curves: correct-match fraction vs. incorrect-match fraction<br />
95% error rate: the fraction of incorrect matches accepted at the threshold where 95% of the correct matches are found
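The 95% error-rate metric can be sketched directly from pair distances (computed here as the false positive rate at the threshold giving 95% true positive rate; `dists` and `labels` are synthetic stand-ins):

```python
import numpy as np

def error_at_95(dists, labels):
    """False positive rate at the distance threshold where 95% of the
    correct matches are accepted (labels: 1 = match, 0 = non-match)."""
    dists, labels = np.asarray(dists, float), np.asarray(labels)
    thr = np.quantile(dists[labels == 1], 0.95)   # passes 95% of matches
    accepted = dists <= thr
    return (accepted & (labels == 0)).sum() / (labels == 0).sum()

rng = np.random.default_rng(3)
dists = np.concatenate([rng.normal(1.0, 0.3, 1000),    # match distances
                        rng.normal(2.0, 0.5, 1000)])   # non-match distances
labels = np.concatenate([np.ones(1000), np.zeros(1000)]).astype(int)
print(f"95% error rate: {error_at_95(dists, labels):.3f}")
```

A single operating point like this makes it easy to compare many descriptor variants in one table, whereas full ROC curves are better for inspecting individual descriptors.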
Parametric Descriptors
Parametric Descriptors<br />
Findings:<br />
1. The learned pooling arrangements have a concave shape.<br />
2. The farther from the center, the larger the summation region.<br />
3. Performance exceeds SIFT, but the dimensionality is higher.
Non-Parametric Descriptors<br />
Trained on Yosemite, tested on Notre Dame
Dimension-Reduced Parametric Descriptors
Effects of Normalization<br />
Clipping threshold parameterized as r/sqrt(D), with D the descriptor dimension
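A sketch of the SIFT-style N-block with the clipping threshold written as r/sqrt(D) (the value of r here is illustrative, not the learned one):

```python
import numpy as np

def n_block(v, r=1.6):
    """Normalize to unit length, clip each element at r / sqrt(D), then
    renormalize (r = 1.6 is an illustrative value, not the learned one)."""
    v = np.asarray(v, float)
    v = v / (np.linalg.norm(v) + 1e-12)
    v = np.minimum(v, r / np.sqrt(v.size))
    return v / (np.linalg.norm(v) + 1e-12)

d = n_block(np.array([10.0, 1.0, 1.0, 1.0]))
print(np.round(d, 3))   # the dominant element is damped relative to the rest
```

Expressing the threshold as r/sqrt(D) makes the same r value comparable across descriptors of different dimensionality, since a unit-norm D-vector with equal entries has elements of size 1/sqrt(D).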
Conclusions<br />
The techniques have been used in Photosynth and ICE (Image Compositing Editor)<br />
Photosynth: www.photosynth.com<br />
ICE: http://research.microsoft.com/ivm/ice.html
Conclusions<br />
Recommendations by the authors<br />
1. Learn parameters from training data.<br />
2. Use foveated summation regions.<br />
3. Use non-linear filter responses.<br />
4. Use LDA for discriminative dimensionality reduction.<br />
5. Normalize the descriptor.
Thanks!<br />
Questions?