
Segmentation of heterogeneous document images : an ... - Tel


The next advantage of using CRFs is that they provide a foundation that makes it easy to incorporate global, local and regional knowledge of the document into the process. Because of the conditional nature of the system, any interaction between the fields’ labels in a Markov network can be learned from available knowledge.

In this chapter we first introduce two-dimensional CRFs. Next, we describe feature functions, which are the most important building blocks of CRFs. Then we list the different types of observations that we extract from a document image. These observations are used inside our feature functions. This chapter also includes methods for decoding the optimal label configuration for two-dimensional CRFs and methods for training the parameters of the model. Finally, we report the results.

4.1 Conditional random fields (CRFs)

tel-00912566, version 1 - 2 Dec 2013<br />

Conditional random fields first appeared in the domain of natural language processing, where Lafferty et al. [50] proposed them as a framework for building probabilistic models to segment and label sequence data. Examples of such sequence data can be found in a wide variety of problems in text and speech processing, such as part-of-speech (POS) tagging.

Among probabilistic models that perform the same task are Hidden Markov Models (HMMs) [78], which are well understood and widely used throughout the literature. HMMs identify the most likely label sequence for any given observation sequence. They assign a joint probability p(x, y) to pairs of observation (x) and label (y) sequences. To define a joint probability of this type, a model must enumerate all possible observation sequences, which is intractable for most domains unless the observation elements are independent of each other within the observation sequence. Although this assumption is appropriate for simple toy examples, most practical observations are best represented in terms of multiple features with long-range dependencies. CRFs address this issue by using a conditional probability p(y|x) over label sequences given an observation sequence, rather than a joint distribution over both label and observation sequences.
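The contrast between the two kinds of model can be written out explicitly. The block below is a sketch in the notation commonly used for linear-chain models, not necessarily the notation of this thesis: the sequence length T, the feature functions f_k, their weights \lambda_k and the normalizer Z(x) are assumed symbols that do not appear in the text above.

```latex
% HMM: a joint model. Each observation x_t depends only on its own label y_t,
% which is the independence assumption discussed above
% (with the convention p(y_1 \mid y_0) = p(y_1)).
p(x, y) = \prod_{t=1}^{T} p(y_t \mid y_{t-1}) \, p(x_t \mid y_t)

% Linear-chain CRF: a conditional model. The observation sequence x is never
% generated, so each feature f_k may inspect all of x at every position t.
p(y \mid x) = \frac{1}{Z(x)}
    \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Big)
```

Because Z(x) sums over label sequences only, the model never has to assign probabilities to observation sequences.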

CRFs first appeared in the form of chain conditional random fields. In other words, several fields are connected sequentially, and the label of each field depends on the label of the field on its left and on the whole observation sequence. This model is best suited to applications in signal and natural language processing, in which the data naturally appear as a row. In our application, we deal with images, which are naturally expressed in two dimensions. Thus, we are interested in two-dimensional conditional random fields.
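To make the chain case concrete, here is a minimal Python sketch of a chain CRF in which each label is scored against its left neighbor and the whole observation sequence. It is an illustration under stated assumptions, not the model used in this work: the function names, the feature signature `f(y_prev, y, x, t)` and the brute-force normalization are all hypothetical choices, and enumeration is only feasible for very short sequences.

```python
import itertools
import math


def sequence_score(y, x, features, weights):
    """Unnormalized log-score of label sequence y for observation sequence x.

    Each feature is a callable f(y_prev, y_t, x, t); it sees the label on the
    left (None at the first position) and the whole observation sequence.
    """
    total = 0.0
    for t in range(len(x)):
        y_prev = y[t - 1] if t > 0 else None
        for f, w in zip(features, weights):
            total += w * f(y_prev, y[t], x, t)
    return total


def conditional_probability(y, x, labels, features, weights):
    """p(y | x), normalized by enumerating every possible label sequence."""
    scores = [
        sequence_score(candidate, x, features, weights)
        for candidate in itertools.product(labels, repeat=len(x))
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(scores)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return math.exp(sequence_score(tuple(y), x, features, weights) - log_z)


# Hypothetical toy features: agreement between a label and its own
# observation, and agreement between a label and the label on its left.
def matches_observation(y_prev, y_t, x, t):
    return 1.0 if (x[t] == "ink") == (y_t == "TEXT") else 0.0


def same_as_left(y_prev, y_t, x, t):
    return 1.0 if y_prev == y_t else 0.0
```

Only label sequences are enumerated here; the observation sequence is treated as given, which is the point of modeling p(y|x) rather than p(x, y). A practical implementation would replace the enumeration with dynamic programming.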

To obtain our two-dimensional random fields, we first divide the document image into rectangular blocks that all share the same height and width. We call each block a site. Contrary to other CRFs that use sites of a fixed size for all document images, we set the height and width of the sites to half of the mean height and half of the mean width, respectively, of all text characters in the document.
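The site-size rule can be sketched in a few lines of Python. This is an illustration only: the function names and the `(width, height)` input format are hypothetical, and the rounding to whole pixels and the treatment of partial sites at the image border are assumptions that the text does not specify.

```python
import math
from statistics import mean


def site_size(char_boxes):
    """Return (site_width, site_height) in pixels.

    char_boxes is a list of (width, height) pairs, one per text character.
    Each site dimension is half of the mean of the matching character dimension.
    """
    site_w = mean(w for w, _ in char_boxes) / 2
    site_h = mean(h for _, h in char_boxes) / 2
    # Assumption: round to whole pixels and never go below one pixel.
    return max(1, round(site_w)), max(1, round(site_h))


def site_grid(image_width, image_height, char_boxes):
    """Return (rows, cols, site_width, site_height) for a document image."""
    site_w, site_h = site_size(char_boxes)
    # Assumption: a partial site at the right or bottom border still counts.
    cols = math.ceil(image_width / site_w)
    rows = math.ceil(image_height / site_h)
    return rows, cols, site_w, site_h
```

Tying the site size to the character size means that a document set in large type gets proportionally larger sites, so a site covers a comparable fraction of a character regardless of font size or scan resolution.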
