27.09.2014 Views

Binary Stereo Matching

Binary Stereo Matching

Binary Stereo Matching

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

lumination may have large variations. Another point<br />

we want to mention is that the core algorithm of BSM<br />

is based on binary and integer computations, so it will<br />

still be fast on embedded or mobile devices which do<br />

not have powerful floating point units.<br />

The rest of this paper is organized as follows. We<br />

review some state-of-the-art local methods in Section 2.<br />

Then our cost computation and aggregation approach<br />

together with BSM are explained in Section 3. Experimental<br />

results and analysis are given in Section 4. Finally<br />

we draw conclusion in Section 5<br />

2. State-of-the-Art<br />

In this section, we briefly review state-of-the-art local<br />

methods. Bleyer et al. estimate a 3D plane at each<br />

pixel by applying PatchMatch [1] into stereo matching<br />

and their method is currently top-performer among local<br />

methods. Hosni et al. [7] aggregate matching cost<br />

by computing geodesic distance from all pixels to the<br />

window’s center. De-Maeztu et al. [3] and Rhemann<br />

et al. [9] both adopt guided filter [5] for cost aggregation<br />

and have speed advantages comparing to traditional<br />

local methods like [14]. Detailed review of other traditional<br />

local methods is proposed in [4, 6, 13]. Overall,<br />

most of state-of-the-art local methods use absolute<br />

pixel intensity difference for composing cost volume<br />

so that their performances drop dramatically under radiometric<br />

differences. Some methods explicitly handle<br />

radiometric differences like rank and census transform<br />

[15], however, their performances on normal images are<br />

not so good compared with state-of-the-art local methods<br />

[6]. Our binary stereo matching algorithm not only<br />

achieves comparable performance with state-of-the-art<br />

methods but is also robust to the radiometric differences<br />

(especially to exposure changes).<br />

3. Proposed Approach<br />

In this section we will explain our approach in detail.<br />

Our binary stereo matching algorithm also follows the<br />

classical four steps as stated before.<br />

In cost computation, our approach is completely different<br />

with traditional local methods. We directly introduce<br />

BRIEF descriptor [2] into cost computation. Thus,<br />

BRIEF descriptor B(x) is calculated for every pixel x<br />

in the input image pair. According to [2], B(x) is defined<br />

as:<br />

B(x) =<br />

∑<br />

2 i−1 τ(p i , q i ) (2)<br />

1≤i≤n<br />

where ⟨p 1 , q 1 ⟩, ⟨p 2 , q 2 ⟩, . . . , ⟨p n , q n ⟩ are n pairs of pixels.<br />

Each pair ⟨p i , q i ⟩ is sampled by an isotropic Gaussian<br />

distribution in a S × S window, which is centered<br />

on pixel x. And τ(p i , q i ) is a binary function which is<br />

defined as:<br />

τ(p i , q i ) =<br />

{ 1 : I(pi ) > I(q i )<br />

0 : I(p i ) ≤ I(q i )<br />

(3)<br />

where I(x) denotes the intensity of pixel x. After calculating<br />

the descriptor, i.e. a binary string for each pixel,<br />

the cost volume is constructed as:<br />

C(x, d) = ∥ B(x) XOR B(x d ) ∥ 1 (4)<br />

where x d is the corresponding pixel of x with disparity<br />

d in another view, XOR is a bitwise xor-operation. In<br />

short, C(x, d) measures the hamming distance between<br />

two binary strings.<br />

Directly using BRIEF for stereo matching is a<br />

straightforward thought, which can be implemented by<br />

adopting C(x, d) in (4) into WTA strategy as mentioned<br />

in (1). However, this naive approach will lead to the<br />

well-known edge-fattening problem as shown in Figure<br />

1(b). To solve edge-fattening problem, we invent<br />

a novel cost aggregation method by introducing another<br />

binary string which we call binary mask. Firstly, we<br />

define a weight function for pixel pair ⟨p i , q i ⟩ in (2) as:<br />

w(x, p i , q i ) = max(SAD(x, p i ), SAD(x, q i )) (5)<br />

where SAD(x, y) = ∑ c∈[L,A,B] |I c(x) − I c (y)| is the<br />

sum of absolute difference between two pixels in the<br />

CIELAB color space. Then we get our bitwise mask<br />

function for a given pair ⟨p i , q i ⟩ as:<br />

{ 1 : w(x, pi , q<br />

δ(x, p i , q i ) =<br />

i ) ≤ T<br />

(6)<br />

0 : w(x, p i , q i ) > T<br />

where T is set to be the quarter smallest value in the sequence<br />

w(x, p 1 , q 1 ), w(x, p 2 , q 2 ), . . . , w(x, p n , q n ). Finally,<br />

we can use this mask function to compose a binary<br />

mask:<br />

Φ(x) =<br />

∑<br />

2 i−1 δ(x, p i , q i ) (7)<br />

1≤i≤n<br />

Consequently Φ(x) is the proposed binary mask for cost<br />

aggregation. Incorporating the binary mask into (4), the<br />

new cost volume is defined as:<br />

C(x, d) = ∥B(x) XOR B(x d ) AND Φ(x)∥ 1 (8)<br />

According to the definition of the binary mask, it will<br />

preserve those pixel pairs who have similar depth with<br />

window’s center. After adopting our cost aggregation<br />

method, the edge-fattening effect is ideally removed as<br />

shown in Figure 1(c).<br />

Since local WTA strategy cannot handle occluded<br />

area, there are a large amount of errors in this region<br />

357

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!