JST Vol. 21 (1) Jan. 2013 - Pertanika Journal - Universiti Putra ...

More documents

Recommendations

Info

Tianyong Hao and Yingying Qu In the past several decades, a number of these methods were followed with Cognitive Science and other disciplines. Due to the high complexity of natural language, i.e. with the huge amount of concepts and relations in free texts being too complicated to be formalized by automatic methods, the acquisition of high quality knowledge for the purpose of fine-grained data processing is still mainly relying on the costly manual labour at present. In the field of medical informatics, there is a high demand for domain knowledge within a formally specified framework and knowledge based is getting more and more accepted (Campbell et al., 1994; Hull & Gomez, 1999; Cao, 2001; Dawoud et al., 2012). It requires medical knowledge to be expressed in a sound and consistent manner. Knowledge engineering in an ontologically coherent (sub) domain makes use of existing knowledge compilations, such as ICD-10, SNOMED, MESH, UMLS, as much as possible. Due to the insufficiency of formal specification and levels of granularity, only a limited amount of the knowledge can be reused even if they are judged transferable in principle (Carenini & Moore, 1993; Firdaus et al., 2012). Most knowledge is contained in a large collection of unstructured textual documents. Ordinary approaches are to acquire knowledge from the documents manually and then formalize the knowledge at the conceptual level. However, this acquisition procedure mainly relies on a large number of knowledge engineers’ manual efforts, which is rather labour intensive, time consuming and troublesome. Moreover, the huge amounts of concepts and relations in domain texts are extremely complicated to be formalized by knowledge engineers. Thus, automated or semi-automated knowledge acquisition techniques are urgently required (Wang et al., 2006; Firdaus et al., 2012). In this paper, a novel automatic method is proposed for domain knowledge acquisition from domain text corpus by semantic annotating and pattern mining. After pre-processing the corpus by splitting into sentences and removing the noise data, the proposed method analyzes the corpus by using Minipar to label sentences. The nouns and noun phrases, regarded as the concepts, are extracted along with structural patterns. In order to improve matching and learning efficiency, frequent patterns are mined based on these structural patterns using a sequence-based Apriori algorithm. The concepts are then annotated with semantic labels by WordNet and a further defined semantic bank containing unit annotations and context annotations. With regard to training on annotated data, the method can generate structural patterns to match with a group of frequent patterns previously extracted. The concepts and their annotated relations in each pattern can be applied into the sentences with the same matched frequent pattern to derive more concepts with similar relations, or to learn more relations with the same concepts. The automatically extracted concepts, semantic labels and relations can thus dramatically enrich domain knowledge bases. We used Yahoo! Answers (2012) corpus as of 10/25/2007 as an experimental dataset. It contained 4,483,032 questions and their correct answers, which are reliable domain text sources as the answers were rated by human users. Furthermore, the dataset contained a small amount of metadata such as best answers on human selection. Finally, 19,622 questions were extracted with their best answers in the “heart diseases” category as the domain corpus. By proposing a number of knowledge transformation rules, the preliminary experiments showed that the annotations could be transformed to standard knowledge representation formats such as OWL 170 Pertanika J. Sci. & Technol. 21 (1): 283 - 298 (2013)
Toward Automatic Semantic Annotating and Pattern Mining for Domain Knowledge Acquisition accurately and stably. Therefore, the method is feasible for domain knowledge acquisition aiming at knowledge sharing. The rest of this paper is organized as follows: Section 2 introduces the proposed framework of automatic knowledge acquisition. Section 3 presents sentence labelling and pattern mining. Section 4 describes semantic annotating of the concepts and relation learning by pattern matching. In Section 5, preliminary experiments with knowledge transformation rules are discussed, and Section 6 summarizes this paper and discusses future work. THE FRAMEWORK OF DOMAIN KNOWLEDGE ACQUISITION Admittedly, human-based knowledge acquisition can ensure the high quality of extracted knowledge. However, it is a long-term laborious work with high cost. Thus, automatic acquisition methods are more preferable, especially for quick acquisition demand. In this paper, an automatic method was proposed to extract domain knowledge based on semantic annotating and frequent pattern mining. Through this method, domain corpus was firstly pre-processed to separate the text paragraphs into sentences and to remove noisy data such as Mathematics formulas. Sentences were then analyzed and labelled by Minipar to extract nouns, noun phrases and structural patterns. By using the sequence-based Apriori algorithm, the frequent patterns were mined from those extracted structural patterns to improve learning efficiency. Regarding those nouns or noun phrases as concepts, these concepts were then annotated by WordNet and a semantic bank which contained unit annotations and context annotations, as a semantic label annotation technique. After that, the patterns with their conceptual relations, learned from annotated training resources, were matched with the mined frequent patterns. According to the matched parts, the corresponding relations could be applied to the concepts associated with the mined frequent patterns. With defined knowledge transformation rules, these concepts and relations can be further transformed into standard knowledge representation formats such as OWL thus to enrich domain knowledge bases. The related framework is shown Fig.1. The automatic knowledge acquisition method contains seven main steps, which are shown in the fig.1: Step 1: Pre-processing a domain corpus by formatting, removing noisy data, and splitting text paragraphs into sentences; Step 2: Labelling all sentences to extract all nouns, noun phrases, and structural patterns; Step 3: Mining frequent patterns from the structural patterns using a sequence-based frequent pattern mining algorithm; Step 4: Annotating all the concepts with semantic labels based on WordNet and a semantic bank-based annotation method; Step 5: Training annotated resources to match with the frequent patterns; Step 6: Assigning relations extracted from matched patterns to the previously extracted concepts; Step 7: Presenting all the concepts and relations using OWL or RDF, aided by knowledge transformation rules to enrich domain knowledge bases. Pertanika J. Sci. & Technol. 21 (1): 283 - 298 (2013) 171
Page 2 and 3:
Journal of Science & Technology Jou
Page 4 and 5:
International Advisory Board Adarsh
Page 6 and 7:
ut bovine milk allergy by far is th
Page 8 and 9:
Selected Articles from UPM-Malaysia
Page 11 and 12:
ISSN: 0128-7680 © 2013 Universiti
Page 13 and 14:
Aspect of Fatigue Analysis of Compo
Page 15 and 16:
Page 17 and 18:
Page 19 and 20:
Page 21 and 22:
Page 23 and 24:
Page 25 and 26:
Page 27 and 28:
Application of Anthropometric Dimen
Page 29 and 30:
Page 31 and 32:
Page 33 and 34:
Page 35 and 36:
Page 37 and 38:
Page 39 and 40:
Page 41 and 42:
Different Media Formulation on Bioc
Page 43 and 44:
Page 45:
Page 48 and 49:
Heng, S. C., Ibrahim, Z. B., Suleim
Page 50 and 51:
Page 52 and 53:
Page 54 and 55:
CONCLUSION Heng, S. C., Ibrahim, Z.
Page 56 and 57:
Y. Yusuf, J. M. Juoi, Z. M. Rosli,
Page 58 and 59:
Page 60 and 61:
Page 62 and 63:
Surface Roughness Y. Yusuf, J. M. J
Page 64 and 65:
DISCUSSION Y. Yusuf, J. M. Juoi, Z.
Page 66 and 67:
Page 69 and 70:
Page 71 and 72:
Modelling of Motion Resistance Rati
Page 73 and 74:
The Empirical Method Modelling of M
Page 75 and 76:
Page 77 and 78:
Page 79 and 80:
Page 81 and 82:
Page 83 and 84:
Page 85 and 86:
Page 87 and 88:
Assessment of Heavy Metal Pollution
Page 89 and 90:
TABLE 1: Size of mussels in average
Page 91 and 92:
Page 93 and 94:
TABLE 5: Heavy metal concentrations
Page 95 and 96:
Page 97 and 98:
Page 99 and 100:
Page 101 and 102:
Page 103 and 104:
Page 105 and 106:
Page 107 and 108:
Page 109 and 110:
Physico-Chemical and Electrical Pro
Page 111 and 112:
Page 113 and 114:
Page 115 and 116:
TABLE 1: (continued) Physico-Chemic
Page 117 and 118:
Page 119:
Page 122 and 123:
Norhashila Hashim et al. effectiven
Page 124 and 125:
Statistical Analysis Norhashila Has
Page 126 and 127:
Norhashila Hashim et al. Post-hoc a
Page 128 and 129:
Norhashila Hashim et al. Cubero, S.
Page 130 and 131: S. Ismail and K. A. Mohd Atan 4 4 p
Page 132 and 133: S. Ismail and K. A. Mohd Atan The f
Page 134 and 135: Lemma 1.1 The equation S. Ismail an
Page 136 and 137: CONCLUSION S. Ismail and K. A. Mohd
Page 138 and 139: INTRODUCTION Tajau, R. et al. A hig
Page 140 and 141: Tajau, R. et al. the acrylate’s c
Page 142 and 143: Tajau, R. et al. double bond) and C
Page 144 and 145: Tajau, R. et al. Ulanski, P., & Ros
Page 146 and 147: Aji, I. S., Zinudin, E. S., Khairul
Page 148 and 149: Aji, I. S., Zinudin, E. S., Khairul
Page 150 and 151: CONCLUSION Aji, I. S., Zinudin, E.
Page 152 and 153: D. Bachtiar, S. M. Sapuan, E. S. Za
Page 156 and 157: TABLE 1: (continue) 4 D. Bachtiar,
Page 162 and 163: N. Maizatul, I. Norazowa, W. M. Z.
Page 168 and 169: REFERENCES N. Maizatul, I. Norazowa
Page 171 and 172: ISSN: 0128-7680 © 2013 Universiti
Page 173 and 174: CBIR Using Colour and Shape Fused F
Page 179: ISSN: 0128-7680 © 2013 Universiti
Page 183 and 184: Toward Automatic Semantic Annotatin
Page 195 and 196: Issues on Trust Management in Wirel
Page 199 and 200: Trust Value MF-ID 1 0.8 0.6 0.4 0.2
Page 205 and 206: A Negation Query Engine for Complex
Page 213 and 214: REFERENCES A Negation Query Engine
Page 217 and 218: The Problem Association Rule Mining
Page 219 and 220: Association Rule Mining threshold t
Page 221 and 222: Association Rule Mining must be sor
Page 223 and 224: Association Rule Mining On the othe
Page 225: Association Rule Mining Quinlan, J.
Page 228 and 229: Che Mustapha Yusuf, J., Mohd Su’u
Page 230 and 231:
Che Mustapha Yusuf, J., Mohd Su’u
Page 232 and 233:
Page 234 and 235:
Page 236 and 237:
Page 238 and 239:
Az Azrinudin Alidin and Fabio Crest
Page 240 and 241:
Page 242 and 243:
Page 244 and 245:
Page 246 and 247:
Page 248 and 249:
Page 250 and 251:
Yogan Jaya Kumar, Naomie Salim, Ahm
Page 252 and 253:
Page 254 and 255:
Page 256 and 257:
REFERENCES Yogan Jaya Kumar, Naomie
Page 258 and 259:
Hasiah Mohamed@Omar, Azizah Jaafar
Page 260 and 261:
Page 262 and 263:
Page 264 and 265:
Page 266 and 267:
Page 268 and 269:
Page 271 and 272:
Page 273 and 274:
The Role of Similarity Measurement
Page 275 and 276:
Page 277 and 278:
Page 279 and 280:
Page 281 and 282:
TABLE 5: Winners MRS The Role of Si
Page 283:
REFEREES FOR THE PERTANIKA JOURNAL
Page 286 and 287:
Guidelines for Authors Publication
Page 288 and 289:
table should be prepared on a separ
Page 290 and 291:
The Journal’s review process What
Page 293 and 294:
Usability of Educational Computer G
show all

JST Vol. 21 (1) Jan. 2013 - Pertanika Journal - Universiti Putra ...

Create successful ePaper yourself

Delete template?

Save as template?