

where the left-hand side denotes the probability that concept $c_i$ takes the semantic label $label_n$ on the condition that $c_i$ co-occurs with $c_j$. $P_{c_i}(label_n)$ is the prior probability that $c_i$ is distinguished by the semantic label $label_n$. The denominator gives the probability that $c_i$ co-occurs with each $c_j$, and $P_{c_i}(c_j \mid label_n)$ gives the probability of $c_j$ appearing when $c_i$ is labeled by $label_n$. Since the denominator remains constant once $c_i$ is fixed (it does not depend on $label_n$), we only need to calculate the product of each $P_{c_i}(c_j \mid label_n)$ with $P_{c_i}(label_n)$ to determine the semantic label of $c_i$, using the following equation:

$$ label_i = \arg\max_{n} \Big[ P_{c_i}(label_n) \prod_{j \neq i} P_{c_i}(c_j \mid label_n) \Big] \qquad (3) $$

The concept annotation method based on the semantic bank is further implemented in our system, which can analyze a sentence and annotate its concepts with semantic labels automatically.
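As a rough illustration, the labeling rule in Equation (3) can be sketched as a small naive-Bayes-style scorer. The function name, data structures, and smoothing constant below are assumptions made for the sketch, not part of the paper; in practice the probabilities would be estimated from the semantic bank.

```python
from math import log

def label_concept(cooccurring, priors, conditionals, smoothing=1e-9):
    """Pick a semantic label for a concept c_i following Equation (3):
    argmax_n [ P_ci(label_n) * prod_{j != i} P_ci(c_j | label_n) ].

    priors       -- {label_n: P_ci(label_n)}, estimated from the semantic bank
    conditionals -- {label_n: {c_j: P_ci(c_j | label_n)}}
    cooccurring  -- concepts c_j observed in the same sentence as c_i
    """
    best_label, best_score = None, float("-inf")
    for label, prior in priors.items():
        # Sum logs instead of multiplying raw probabilities so a long
        # product of small values does not underflow; smoothing avoids log(0).
        score = log(prior + smoothing)
        for c_j in cooccurring:
            score += log(conditionals[label].get(c_j, 0.0) + smoothing)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy usage with two hypothetical candidate labels:
priors = {"DISEASE": 0.6, "EVENT": 0.4}
conditionals = {
    "DISEASE": {"chest": 0.30, "discomfort": 0.25},
    "EVENT":   {"chest": 0.01, "discomfort": 0.02},
}
print(label_concept(["chest", "discomfort"], priors, conditionals))  # DISEASE
```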

Relation Learning

Since the acquired frequent patterns cover more sentences than ordinary structural patterns, they can, in theory, be used to extract more concepts that hold similar relationships. With annotated and validated training data, our method learns the patterns and their relations so that they can be applied to extract further concepts from the original domain corpus. The detailed method is as follows:

Given an annotated sentence with its knowledge representation as a training sentence $s_t$, the method first analyzes the sentence with Minipar to acquire the structural pattern $p_t$ and the corresponding relation $r_t$. The pattern is then matched against each pattern in the frequent pattern set. If a matching pattern $p_f$ is found, $r_t$ is applied to the relation representation of all nouns extracted from $p_f$. The detailed equation for this purpose is shown in the following:

$$ MScore(p_t) = \frac{2 \times | MS_{p_t} \cap MS_{p_f} |}{| MS_{p_t} | + | MS_{p_f} |} \qquad (4) $$

$p_t$ and $p_f$ are first split into item sets $MS_{p_t}$ and $MS_{p_f}$, and these items are then matched against each other. The matching score $MScore(p_t)$ for the pattern $p_t$ is calculated by counting the matched items against all the items of both patterns.
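Read this way, Equation (4) is a Dice-style overlap score over the two item sets. A minimal sketch, assuming the pattern items can be represented as plain strings (the function and variable names are illustrative):

```python
def mscore(p_t_items, p_f_items):
    """Dice-style matching score from Equation (4): twice the number of
    matched items divided by the total number of items in both patterns."""
    ms_t, ms_f = set(p_t_items), set(p_f_items)
    if not ms_t and not ms_f:
        return 0.0
    return 2.0 * len(ms_t & ms_f) / (len(ms_t) + len(ms_f))


# e.g. two dependency patterns that share two of their three items:
print(mscore(["subj:sign", "obj:discomfort", "mod:of"],
             ["subj:sign", "obj:symptom", "mod:of"]))  # ~0.667
```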

Using Equation (4), the frequent pattern that best matches the current structural pattern is obtained and compared against a matching threshold. If no frequent pattern satisfies the threshold, the current structural pattern can still be applied to pattern learning, although its matching coverage is lower than that of the frequent patterns.
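Putting the matching and fallback steps together, a hedged sketch of the selection logic might look as follows. It reuses the mscore helper from the previous sketch; the threshold value and all names are assumptions for illustration, not values reported in the paper.

```python
MATCH_THRESHOLD = 0.6  # illustrative value; the paper does not state the threshold here

def learn_relation(p_t_items, r_t, frequent_patterns, threshold=MATCH_THRESHOLD):
    """Propagate the relation r_t of a training pattern p_t.

    frequent_patterns -- {pattern_id: list of items MS_pf} mined beforehand
    Returns the pattern (frequent or structural) that r_t is attached to,
    together with the relation, for later concept extraction.
    """
    best_id, best_score = None, 0.0
    for pid, items in frequent_patterns.items():
        score = mscore(p_t_items, items)       # Equation (4)
        if score > best_score:
            best_id, best_score = pid, score
    if best_id is not None and best_score >= threshold:
        # A frequent pattern matched: r_t now covers the nouns it extracts.
        return frequent_patterns[best_id], r_t
    # No frequent pattern passed the threshold: keep the structural pattern,
    # which still contributes, only with lower matching coverage.
    return p_t_items, r_t
```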

For example, consider the training sentence "the signs of a heart attack are chest discomfort, discomfort in other areas of the upper body, shortness of breath, and other symptoms". Its annotation is shown in the following:
