
a region) and g = (g_j)_{j=1...J} is a vector of edge features (i.e., differences between statistics of neighboring regions), so that F = (f_k^y)_{k=1...K, y∈Y} and G = (g_j^{y,y'})_{j=1...J, (y,y')∈Y×Y}. Thus λ ∈ R^{K|Y|} and µ ∈ R^{J|Y|²}. When E = ∅, the model uses no label context and is commonly called a conditional maximum entropy classifier (hence, MaxEnt), or logistic regression.
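To make the parameterization concrete, the following sketch (ours, not from the paper; the function names and array layouts are assumptions) treats λ as a (|Y| × K) array of node weights and µ as a (|Y| × |Y| × J) array of edge weights, and shows the MaxEnt special case obtained when E = ∅:

import numpy as np

def node_score(lam, f, y):
    # lam: (|Y|, K) array, one weight vector per label; f: K-dim region features.
    return lam[y] @ f

def edge_score(mu, g, y, y_prime):
    # mu: (|Y|, |Y|, J) array, one weight vector per label pair;
    # g: J-dim features of a neighboring-region pair.
    return mu[y, y_prime] @ g

def maxent_posterior(lam, f):
    # With E = ∅ the model reduces to a MaxEnt / logistic regression
    # classifier: normalize over the labels of a single node.
    scores = lam @ f              # shape (|Y|,)
    scores -= scores.max()        # numerical stability
    p = np.exp(scores)
    return p / p.sum()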

Training and Inference

Parameters for probabilistic models like CRFs are generally set by maximizing the likelihood of a data sample. Unfortunately, inference for any random field with the lattice topology is intractable due to Z (an exponential sum). Markov chain Monte Carlo (MCMC) (see e.g., [16]) is often used to approximate Z in similar generative models. However, in our conditional model Z is dependent on the image data x and must be estimated for each observation in the sample. A simpler approximation is to maximize the pseudo-likelihood (PL) [1], which is the product of the probabilities of nodes given their neighboring labels. The normalizers are then summations over labels at a single node, rather than the possible labelings of all nodes.
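As a rough illustration of that factorization (our sketch, reusing the node_score and edge_score helpers above; the data-structure choices are assumptions, not the paper's), the log pseudo-likelihood of a labeled lattice can be accumulated node by node:

import numpy as np

def log_pseudo_likelihood(lam, mu, labels, node_feats, edge_feats, neighbors, Y):
    # labels:     {node: observed label}
    # node_feats: {node: K-dim feature vector}
    # edge_feats: {(i, j): J-dim feature vector}
    # neighbors:  {node: list of adjacent nodes}
    # Y:          list of possible labels
    total = 0.0
    for i, f in node_feats.items():
        # Score every candidate label at node i, conditioning on the
        # *observed* labels of its neighbors.
        scores = []
        for y in Y:
            s = node_score(lam, f, y)
            for j in neighbors[i]:
                g = edge_feats.get((i, j), edge_feats.get((j, i)))
                s += edge_score(mu, g, y, labels[j])
            scores.append(s)
        scores = np.array(scores)
        # The normalizer is a sum over labels at this single node only.
        total += scores[Y.index(labels[i])] - np.logaddexp.reduce(scores)
    return total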

A relatively new alternative to MCMC and PL for approximating the likelihood is called tree reparameterization (TRP) [15]. Inference in graphical models without cycles (unlike the lattice) is very efficient due to the junction tree algorithm (e.g., [10]). An important consequence of the junction tree algorithm is that marginal distributions are revealed on pairs of neighboring nodes, inducing an alternative factorization of the joint distribution. TRP operates by using the junction tree algorithm to compute the exact marginals on a spanning tree of the cyclic graph. The spanning tree's factorization is then placed back into the original graph, and the process repeats with different spanning trees until the parameterization converges, leaving the marginals.

We demonstrate improved detection performance using TRP, rather than pseudo-likelihood, to approximate the likelihood. The likelihood function is convex and may be optimized globally via gradient ascent. Pseudo-likelihood, however, is sensitive to initialization, so node parameters are optimized first. We use the quasi-Newton L-BFGS algorithm for maximization.
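A minimal sketch of that optimization step, assuming an objective function that returns the negative log-(pseudo-)likelihood together with its gradient (SciPy's L-BFGS implementation minimizes, so we negate the quantity to be maximized):

from scipy.optimize import minimize

def fit(objective, theta0):
    # objective(theta) -> (negative log-likelihood, gradient), e.g. built
    # from the TRP-approximated likelihood or the pseudo-likelihood above.
    # L-BFGS is the quasi-Newton method named in the text.
    result = minimize(objective, theta0, jac=True, method='L-BFGS-B')
    return result.x

For pseudo-likelihood training one would call fit twice, first over the node parameters alone and then over all parameters, reflecting the initialization order noted above.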

To prevent training procedures from overfitting parameters in conditional models, a prior is introduced and the posterior is maximized rather than the likelihood. We employ a diagonal zero-centered Gaussian prior on the parameters [2] (similar to weight decay in neural networks or ridge regression); the variances are determined experimentally through cross-validation.
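The prior simply adds a quadratic penalty to the training objective; a sketch (ours, assuming a single shared variance sigma2 for all parameters, whereas the variances need not be shared and are set by cross-validation in the paper):

import numpy as np

def penalized_objective(objective, sigma2):
    # Wrap a (negative log-likelihood, gradient) objective with a
    # zero-centered Gaussian prior: add ||theta||^2 / (2 * sigma2)
    # and the corresponding gradient term theta / sigma2.
    def wrapped(theta):
        nll, grad = objective(theta)
        nll += theta @ theta / (2.0 * sigma2)
        grad = grad + theta / sigma2
        return nll, grad
    return wrapped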

Given the image data, our model simply yields a joint posterior distribution over labelings. When we want a single, definite label for each region of the image (node in the graph), the question becomes what to do with that distribution. A simple, oft-used answer is to find its maximum, that is, to use maximum a posteriori (MAP) estimation:

ŷ = arg max_{y ∈ Y^{|V|}} p(y | x).
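Exact maximization over the joint labeling of a lattice is itself hard in general; a common simplification (our sketch, not necessarily the decoding rule used in the paper) is to label each node by the maximizer of its approximate marginal, e.g. the marginals left behind by TRP:

import numpy as np

def decode_marginals(marginals):
    # marginals: {node: length-|Y| array of approximate p(y_i | x)}
    # Picks the most probable label independently at each node, which
    # maximizes the marginals rather than the true joint posterior.
    return {i: int(np.argmax(p)) for i, p in marginals.items()}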
