Segmentation of heterogeneous document images : an ... - Tel
A further advantage of using CRFs is that they provide a foundation that
easily introduces global, local and regional knowledge of the document into the process.
Because of the conditional nature of the system, any interaction between
the fields’ labels in a Markov network can be learned from available knowledge.
In this chapter we first introduce two-dimensional CRFs. Next, we describe
feature functions, which are the most important building blocks of CRFs. Then
we note the different types of observations that we extract from a document image.
These observations are used inside our feature functions. This chapter also
includes methods for decoding the optimal label configuration for two-dimensional
CRFs and methods for training the parameters of the model. Finally, we report the
results.
4.1 Conditional random fields (CRFs)
tel-00912566, version 1 - 2 Dec 2013<br />
Having first appeared in the domain of natural language processing, conditional random
fields were proposed by Lafferty et al. [50] as a framework for building
probabilistic models to segment and label sequence data. Examples of such sequence
data can be found in a wide variety of problems in text and speech
processing, such as part-of-speech (POS) tagging.
Among the probabilistic models that perform the same task are Hidden
Markov Models (HMMs) [78], which are well understood and widely used
throughout the literature. HMMs identify the most likely label sequence for
any given observation sequence. They assign a joint probability p(x, y) to pairs
of observation (x) and label (y) sequences. To define a joint probability of this
type, a model must enumerate all possible observation sequences, which is intractable
for most domains unless the observation elements are independent of
each other within the observation sequence. Although this assumption is appropriate
for simple toy examples, most practical observations are best represented
in terms of multiple features with long-range dependencies. CRFs address this
issue by using a conditional probability p(y|x) over the label sequence given an
observation sequence, rather than a joint distribution over both label and observation
sequences.
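To make the contrast concrete, the joint probability that a first-order HMM assigns can be written out as below. This is the standard textbook factorization, not a formula taken from this chapter; the sequence length T, the indices t and the convention p(y_1 | y_0) = p(y_1) are notational assumptions.

```latex
% Joint probability of a first-order HMM (standard form; notation assumed).
% Each observation x_t depends only on its own label y_t, which is the
% independence assumption discussed above.
p(x, y) = \prod_{t=1}^{T} p(y_t \mid y_{t-1}) \, p(x_t \mid y_t),
\qquad p(y_1 \mid y_0) \equiv p(y_1)

% Decoding picks the label sequence that maximizes the joint probability.
\hat{y} = \arg\max_{y} \, p(x, y)
```

A CRF drops the emission term p(x_t | y_t) and models p(y|x) directly, so features of x may overlap and span the whole sequence without x itself having to be modeled.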
CRFs first appeared in the form of chain conditional random fields. In other
words, several fields are connected sequentially, and the label of each
field depends on the label of the field on its left and on the whole observation
sequence. This model is best suited to applications in signal and natural language
processing, in which the data appear naturally in a row format. In our application,
we deal with images, which are naturally expressed in two dimensions.
Thus, we are interested in two-dimensional conditional random fields.
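For reference, the chain model described above is commonly written in the following form, following the usual presentation of linear-chain CRFs. The feature functions f_k, their weights \lambda_k and the normalizer Z(x) are generic symbols assumed here for illustration; they are not necessarily the notation used later in this chapter.

```latex
% Linear-chain CRF (standard form; symbols are illustrative assumptions).
% Each feature function sees the left neighbour's label y_{t-1}, the
% current label y_t and the whole observation sequence x.
p(y \mid x) = \frac{1}{Z(x)}
  \exp\Bigl( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \Bigr)

% The normalizer sums over all candidate label sequences y'.
Z(x) = \sum_{y'} \exp\Bigl( \sum_{t=1}^{T} \sum_{k}
  \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \Bigr)
```

In a two-dimensional field, the single left neighbour is replaced by a neighbourhood of sites on the image grid.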
To obtain our two-dimensional random fields, we first divide the document
image into rectangular blocks of equal height and equal width. We call each block
a site. Contrary to other CRFs that use sites with fixed sizes for all document
images, we choose the height and width of sites to be half of the
mean height and half of the mean width of all text characters on the document, respectively.
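The site construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names (site_size, site_grid, site_of_pixel) are hypothetical, character sizes are assumed to be available as (width, height) pairs in pixels, site dimensions are rounded to whole pixels, and partial sites at the right and bottom borders are kept.

```python
import math
from statistics import mean


def site_size(char_boxes):
    """Return (site_w, site_h): half the mean character width and height.

    char_boxes is a list of (width, height) pairs in pixels, one per text
    character. How the characters are detected is outside this sketch.
    """
    widths = [w for w, _ in char_boxes]
    heights = [h for _, h in char_boxes]
    # Rounding to whole pixels and the floor of 1 are assumptions.
    site_w = max(1, round(mean(widths) / 2))
    site_h = max(1, round(mean(heights) / 2))
    return site_w, site_h


def site_grid(image_w, image_h, site_w, site_h):
    """Return (rows, cols) of the site grid covering the whole image."""
    # ceil keeps the partial sites along the right and bottom borders.
    cols = math.ceil(image_w / site_w)
    rows = math.ceil(image_h / site_h)
    return rows, cols


def site_of_pixel(x, y, site_w, site_h):
    """Return the (row, col) index of the site containing pixel (x, y)."""
    return y // site_h, x // site_w


# Hypothetical character sizes, for illustration only.
boxes = [(10, 20), (14, 24), (12, 16)]
sw, sh = site_size(boxes)
rows, cols = site_grid(100, 55, sw, sh)
```

Because the site size is derived from the characters of each document, a page set in small type gets a finer grid than a page set in large type, which is the difference from fixed-size sites noted above.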