
Automated Filipino Verbal Sentence Evaluator<br />

Jennefer B. Jore<br />

Associate Software Engineer<br />

Cybergate 1, Robinson’s Pioneer,<br />

Boni Ave., Mandaluyong<br />

Philippines<br />

jennefer.b.jore<br />

@yaccenture.com<br />

Ana Ruby B. Ramos<br />

Associate Software Engineer<br />

Cybergate 1, Robinson’s Pioneer,<br />

Boni Ave., Mandaluyong<br />

Philippines<br />

mam.ana.ruby.r.cordero<br />

@accenture.com<br />

Qurrata-Ayn K. Karim<br />

Ayn_karim@yahoo.com<br />

Erlyn Q. Maguilimotan<br />

Faculty,<br />

Computer Science Dept.<br />

College of Science and Information<br />

Technology<br />

Ateneo De Zamboanga University<br />

erlynqm@yahoo.com<br />

Ebony C. Domingo<br />

Chairperson,<br />

Computer Science Dept.<br />

College of Science and Information<br />

Technology<br />

Ateneo De Zamboanga University<br />

domingoeboc@yahoo.com<br />

ABSTRACT<br />

Grammar acquisition is an important part of language acquisition<br />

and learning for human beings. Many projects have been designed<br />

to assist in the grammar development of people by having<br />

automated checking of grammars both for fixed word order and<br />

free word order languages. The Filipino language is a free word<br />

order language. It exhibits the problem of discontinuous<br />

constituents. Several approaches to this problem use a<br />

hierarchical syntactic structure, which results in parsing and<br />

processing delays. One approach that also treats this problem,<br />

called the Tagalog Free-Word Order (TagFWO) Parser, uses a flat<br />

syntactic structure. This approach is able to solve the problem of<br />

discontinuous constituents syntactically. However, the semantic<br />

side is not treated by this approach. The aim of this research then<br />

is to develop a system that evaluates a Filipino verbal sentence by<br />

checking the syntactic structure and semantic relation of the<br />

constituents of the sentence.<br />

The Automated Filipino Verbal Sentence Evaluator is a system<br />

capable of evaluating Filipino verbal sentences based on their<br />

grammar. It uses a parser to check the grammar structure and<br />

the Lexical Functional Grammar (LFG) formalism to check the<br />

grammar relation. Grammar structure takes into account the<br />

syntax of the sentence by verifying whether the sentence structure<br />

is valid in the system. Grammar relation considers the functional<br />

relationship of each constituent in the sentence by checking whether the<br />

doer in the sentence is capable of performing the action.<br />

The system was tested on a set of 33 Filipino verbal and non-verbal<br />

sentences (grammatical and ungrammatical). The results<br />

showed that the grammatical verbal sentences were all evaluated<br />

properly, each with corresponding detailed evaluation<br />

feedback for the user. The grammatical non-verbal and ungrammatical<br />

sentences were rejected, each with a corresponding error<br />

message.<br />

The method developed in this research has resolved issues on<br />

syntactic and semantic relations in Tagalog verbal sentences.<br />

However, the issues of lexical ambiguities and deeper semantic<br />

interpretations have not yet been included in this research. This<br />

study can further be enhanced to embrace a more complex verbal<br />

system of the Filipino language considering other parts of speech.<br />

General Terms<br />

Algorithms, Languages.<br />

Keywords<br />

Grammar checker, Filipino, natural language processing, artificial<br />

intelligence, text processing.<br />

1. INTRODUCTION<br />

Language systems consist of words arranged in certain learned<br />

ways (grammar and syntax). Internationalization of language<br />

systems is developed through recognition of syntactic structures<br />

or grammar of a language. Syntactic analysis is the process of<br />

determining the syntactic structure of a sentence according to<br />

grammar rules. This analysis is vital for the recognition of the<br />

grammatical correctness of a sentence [14].<br />

Syntactic analysis is subdivided into structure and relation. One<br />

major application of structural syntactic analysis is parsing. This<br />

method is the decomposition of scanned tokens in an input stream<br />

(a sentence in a language) into components based on phrase<br />

structure grammar rules. Grammar is defined as a system of rules<br />

and principles that determine the formal, legal and semantic<br />

properties of sentences [5] and the description of the signals<br />

which lead to the understanding of a language. Most studies<br />

conducted in the field of syntactic analysis are concentrated on<br />

parsing algorithms.<br />

Parsing algorithms are given higher priority than grammatical<br />

relations because researchers in this field are seeking a universal<br />

model of syntax for both free and fixed word order languages.<br />

However, fixed word order languages are considered in most<br />

investigations [9].<br />

Fixed word order languages are languages that have a strict<br />

ordering of constituents [3] and are said to be configurational,<br />

while free word order languages do not follow any rule for the<br />

ordering of constituents and are said to be non-configurational.<br />

Non-configurational means that the verb, as the head of the sentence<br />

structure, and the other constituents in the structure can be<br />

treated as sisters [7]. In a configurational setting, the verb and<br />

other constituents cannot be treated as sisters; a separate verb<br />

node is required. Current approaches to free word order<br />

languages are based on the configurational approach and thus<br />

have problems capturing the discontinuous constituents<br />

that are present in free word order languages [3].<br />

Treatments for this problem are already available. One<br />

approach is scrambling, which involves<br />

moving a constituent from its original position to other<br />

positions until the right position is found [3]. However, this<br />

approach creates parsing delays due to the search for adjacent<br />

constituents. Another approach is the sortal hierarchy of types,<br />

which was modeled using the German language [11].<br />

Unfortunately, this approach cannot be applied to Filipino<br />

because it is unsuitable for representing adjacent constituents.<br />

A third approach is discontinuous dependency parsing, which<br />

was applied to Russian and Latin [1]. This approach is applicable<br />

to other languages; however, it is time consuming. It backtracks<br />

to find an alternative solution and thus exhibits non-determinism<br />

due to the lack of a predictive capability [3].<br />

One study in the Philippines on syntactic analysis is the<br />

Tagalog Free Word Order (TagFWO) Parser by Editha D.<br />

Dimalen [3]. The TagFWO Parser is a web-based implementation of a<br />

new technique to address the problem of discontinuous<br />

constituents in a free word order language, Tagalog. It uses a flat<br />

syntactic structure, unlike the current approaches, which<br />

use a hierarchical syntactic structure. It uses Head<br />

Specifier and Head Complement rules to handle the constituency<br />

of the Tagalog language. It is appropriate for Tagalog and<br />

requires less computing time than other existing approaches.<br />

However, the above-mentioned approaches are focused on the<br />

syntactic structure of the sentences and less on the grammatical<br />

relations. A study by Kroeger [7] showed the insufficiency of<br />

phrase structure rules to capture the syntactic relations and the<br />

importance played by grammatical relations for the Filipino<br />

language. Filipino is the national language of the Philippines. This<br />

language is characterized as non-configurational. As a non-configurational<br />

language, Filipino does not follow a fixed ordering of words<br />

in sentence constructions. Thus, phrase structure rules are not<br />

considered to be sufficient to address the non-configurationality<br />

of the language.<br />

Syntactic relationships and grammatical relations in Filipino are<br />

signified by case markings and verbal affixations [8]. These<br />

syntactic attributes contribute to working out what is meant in<br />

a sentence. The affixations in the Filipino language signify<br />

semantic criteria and categories. Phrase structures capture only<br />

the componential functions within phrases, not the lexical<br />

structure within words [6].<br />

A grammar formalism is needed in order to capture the syntactic<br />

relationships and grammatical relations of each constituent in a<br />

sentence in any natural language such as Filipino. Lexical<br />

Functional Grammar (LFG) is able to capture both of these<br />

syntactic attributes. Dimalen [3] made use of the Head-driven Phrase<br />

Structure Grammar (HPSG) formalism. However, according to<br />

that author, LFG is simpler while retaining the same capabilities as<br />

HPSG. This research therefore developed an automated grammar<br />

checker for Filipino verbal sentences using the LFG grammar<br />

formalism.<br />

2. FILIPINO VERBAL SENTENCE<br />

Filipino verbal sentences are sentences that contain a verb or verb<br />

form in the predicate position. The verbal form of the predicate<br />

determines the role of the noun(s) in the sentence. This depends<br />

on the affix in the verb, which tells whether the noun is an<br />

actor, object, instrument, etc.<br />

One interesting feature of the Filipino language is its focus<br />

system. This means that the role of the noun in focus is reflected<br />

in the verb. Focus is the feature of a verbal predicate that<br />

determines the semantic relationship between a predicate verb and<br />

its topic [12]. Two types of focus occur in a basic<br />

Filipino sentence: actor-focus, where the focus is on the actor or doer,<br />

and goal-focus, where the focus is not on the actor. There are different<br />

classes of goal-focus. However, Schachter and Otanes [13]<br />

pointed out that only two of these classes are found in basic<br />

Filipino sentences: object-focus and directional-focus. The choice among<br />

these focus types is signaled by the verb's affixes.<br />

The verb form is built through the use of affixes. An affix is a way of<br />

packaging extra information into a word. Filipino uses<br />

affixes to indicate the tense of a verb, that is, whether an action is<br />

completed or not. In addition to this, Filipino uses affixes to<br />

indicate the role of the focus of the sentence. In other words,<br />

affixes are used to determine what the focus is doing in the<br />

sentence.<br />

2.1 LFG as a grammar checker<br />

A grammar checker was developed to address problems of<br />

word order, subject-verb agreement and pragmatically incorrect<br />

constituent order in German sentences. This project made use of<br />

LFG supplemented with rule components for the analysis of<br />

ungrammatical input. LFG is composed of a constituent structure<br />

containing the linear hierarchical constituent order and a functional<br />

structure representing functional relations and grammatical<br />

features by means of attribute-value matrices. With a rule-based<br />

grammar checker built on LFG, this project was able to parse<br />

unrestricted input and correctly identify errors. However,<br />

orthographic and morphological errors remain<br />

unresolved [4].<br />

LFG has two structures for representing different levels of<br />

linguistic information: the constituent structure (c-structure) and the<br />

functional structure (f-structure). The c-structure in LFG<br />

represents the external structure of a sentence in the form of a<br />

phrase structure tree [15]. It shows the syntactic constituents of<br />

the sentence. It relies on the grammar rules defined by the LFG. It<br />

is the more concrete level of linear and hierarchical organization<br />

of words into phrases [2]. It contains lexical and functional<br />

categories. A sample c-structure is shown in Figure 1, applying<br />

phrase structure rules to the sentence “natulog ang bata”.<br />

Figure 1. Sample c-structure with Functional Schemata<br />

The functional schemata (↑ SUBJ) = ↓ and ↑ = ↓ show in symbols<br />

the role that each string plays in a sentence (Manguilimotan, 2001). The<br />

f-structure is not mapped directly from the c-structure. It is<br />

constructed through instantiation. Thus, the arrow symbols assume<br />

referential values that point to the node's own value (↓) and to the node that<br />

immediately dominates it (↑) [10].<br />

The f-structure models the internal structure of a language and the<br />

functional roles of each constituent or word order in producing<br />

the meaning of the sentence [2]. Each word is designated a set of<br />

categories like subject, object, topic, focus, aspect, case, number,<br />

gender, and other important lexical attributes. The following shows how the<br />

f-structure checks the grammaticality of the sentence “bumili ang bata ng isda” (Figure 2).<br />

Figure 2. Sample f-structure<br />

The verb bumili is considered to be in actor focus since it has the<br />

affix um, which makes the actor the subject of the sentence.<br />

The determiner ang marks the subject. The noun bata, which<br />

is preceded by the determiner ang, is therefore identified under this focus<br />

as the subject. Thus, from the relationship among these three<br />

constituents alone, the f-structure can immediately identify the subject.<br />

As a rule, actor focus requires an actor to make the sentence complete. The<br />

object is optional in the sentence. However, in this sentence, an<br />

object phrase is included. To check whether the phrase is an object, the<br />

determiner ng is checked after the subject. The noun isda, which is<br />

preceded by ng, is identified as the object. The f-structure does<br />

not rely only on checking the subject and doer; it also<br />

checks whether the doer is capable of performing the action named by the<br />

verb. It checks the lexical entry of the verb to see whether it<br />

accepts such an object. It also checks the relationship between the two<br />

nouns through the verb. Since the verb accepts an object and the<br />

doer is capable of performing the action according to the lexical<br />

entry, the f-structure considers this sentence grammatically<br />

correct.<br />

3. SYSTEM WORKFLOW<br />

The overall flow of the system is shown in Figure 3. An input<br />

sentence is passed on to the Lexical Analyzer module. There are<br />

three applications that process the sentence in this module. The<br />

first application, Tokenization, separates each word<br />

of the sentence into a unique entity called a token. Once tokenized,<br />

the first token, which should be the verb, is passed on to the<br />

second application, Word Stemming. This application<br />

determines the root word of the verb by extracting the affixes. At<br />

the same time, it checks the validity of the root word form using<br />

the lexicon. Based on the extracted affix, the focus type of the<br />

sentence can be determined [13]. The remaining tokens are also<br />

checked against the lexicon. The final application<br />

for this module is Tagging. Each token is tagged with the proper<br />

part-of-speech tag, and the tags are passed on to the parser.<br />

Figure 3. Architectural Design<br />
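
The three Lexical Analyzer steps described in this section (tokenization, word stemming and tagging) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the `LEXICON` entries, the tag names and the function names are assumptions made for the example, and only the affix um (the one the paper discusses for bumili) is handled.

```python
import re

# Hypothetical mini-lexicon; the entries and tag names are assumptions.
LEXICON = {
    "bili": "VERB_ROOT",
    "tulog": "VERB_ROOT",
    "bata": "NOUN",
    "isda": "NOUN",
    "ang": "DET_SUBJ",
    "ng": "DET_OBJ",
}


def tokenize(sentence):
    """Split a sentence into lowercase word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def stem_verb(verb):
    """Return (root, affix, focus) if the verb carries the infix 'um', else None."""
    # Infix form: consonant + "um" + rest, e.g. b-um-ili -> bili.
    infix = re.fullmatch(r"([^aeiou])um(.+)", verb)
    if infix:
        root = infix.group(1) + infix.group(2)
        # The root must be a known verb root in the lexicon.
        if LEXICON.get(root) == "VERB_ROOT":
            return root, "um", "actor"
    return None


def analyze(sentence):
    """Tokenize, stem the first token as the verb, and tag the remaining tokens."""
    tokens = tokenize(sentence)
    if not tokens:
        raise ValueError("empty sentence")
    stemmed = stem_verb(tokens[0])
    if stemmed is None:
        raise ValueError(f"first word is not a recognized verb: {tokens[0]!r}")
    root, affix, focus = stemmed
    tagged = [(tokens[0], "VERB")]
    for token in tokens[1:]:
        tag = LEXICON.get(token)
        if tag is None or tag == "VERB_ROOT":
            raise ValueError(f"word not usable here or not in lexicon: {token!r}")
        tagged.append((token, tag))
    return {"root": root, "affix": affix, "focus": focus, "tagged": tagged}
```

Calling `analyze("Bumili ang bata ng isda")` yields the root `bili`, the affix `um`, the focus `actor`, and one tag per token; a sentence whose first word is not a recognized verb raises a `ValueError`, mirroring the rejection of non-verbal sentences.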

The parser verifies the grammar structure through the grammar rules<br />

specified in the system. The syntactic structure is<br />

evaluated first, by the parser, using the grammar syntax rules<br />

provided by the system. The semantic side is then evaluated by Lexical<br />

Functional Grammar (LFG).<br />
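
A structural check of this kind can be sketched with a flat rule in which the verb comes first and every determiner-noun phrase is a sister of the verb, in any order. The rule below is a hypothetical stand-in: the paper's actual grammar rules (Figure 4) are not reproduced in the text, and the tag names are assumptions.

```python
def parse_flat(tags):
    """Check a tag sequence against a flat rule: VERB followed by DET+NOUN phrases.

    Returns the flat structure (the verb plus its sister phrases, in input
    order), or None if the sequence is rejected.
    """
    if not tags or tags[0] != "VERB":
        return None  # the verb must be the first constituent
    rest = tags[1:]
    if not rest or len(rest) % 2 != 0:
        return None  # phrases must come as determiner-noun pairs
    phrases = []
    for i in range(0, len(rest), 2):
        det, noun = rest[i], rest[i + 1]
        if det not in ("DET_SUBJ", "DET_OBJ") or noun != "NOUN":
            return None
        phrases.append("SUBJ" if det == "DET_SUBJ" else "OBJ")
    # Assumed constraint: exactly one subject phrase, at most one object phrase.
    if phrases.count("SUBJ") != 1 or phrases.count("OBJ") > 1:
        return None
    return ["VERB"] + phrases
```

Because the phrases are sisters rather than nested nodes, both the subject-first and the object-first orders are accepted by the same rule without any movement or backtracking.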

LFG evaluates the semantics of the sentence by means of<br />

grammatical relations. Each word in the sentence has its<br />

respective lexical information defined in the lexicon. After LFG<br />

evaluates the sentence, the system outputs user feedback that describes the<br />

evaluation process of the system.<br />
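
The grammatical-relation check can be sketched as building a small f-structure from lexical entries and testing whether the doer can perform the action and whether the verb accepts an object. The lexical entries, the `animate` feature and the feedback wording below are assumptions for illustration only; the paper does not list its lexicon format.

```python
# Hypothetical lexical entries; feature names are assumptions.
VERBS = {
    "bili": {"pred": "buy", "doer_must_be_animate": True, "takes_object": True},
    "tulog": {"pred": "sleep", "doer_must_be_animate": True, "takes_object": False},
}
NOUNS = {
    "bata": {"pred": "child", "animate": True},
    "isda": {"pred": "fish", "animate": True},
    "bato": {"pred": "stone", "animate": False},
}


def evaluate(root, focus, subj, obj=None):
    """Return (is_grammatical, f_structure, feedback) for a parsed sentence."""
    verb = VERBS[root]
    feedback = []
    f_structure = {"PRED": verb["pred"], "FOCUS": focus, "SUBJ": NOUNS[subj]["pred"]}
    # In actor focus the subject is the doer, so it must be able to act.
    if focus == "actor" and verb["doer_must_be_animate"] and not NOUNS[subj]["animate"]:
        feedback.append(f"'{subj}' cannot perform the action '{root}'")
    if obj is not None:
        if verb["takes_object"]:
            f_structure["OBJ"] = NOUNS[obj]["pred"]
        else:
            feedback.append(f"the verb '{root}' does not accept an object")
    return (not feedback, f_structure, feedback)
```

For the paper's example, `evaluate("bili", "actor", "bata", "isda")` is accepted, while replacing the subject with the inanimate `bato` or giving `tulog` an object produces a feedback message instead.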

4. RESULTS AND DISCUSSION<br />

The Filipino verbal sentence is the main subject of this research. The<br />

following rules, adopted from different Balarilang Filipino<br />

books, were used as a basis for determining grammatically correct<br />

and wrong Filipino verbal sentences.<br />

Figure 4. Grammar Rules<br />

This research initially made use of 16 verbs and 38 nouns chosen<br />

randomly from the Handbook of Tagalog Verbs by Teresita V.<br />

Ramos [12]. These were made part of the lexicon. The system<br />

was tested and evaluated using different Filipino verbal and non-verbal<br />

sentences. There were seven (7) grammatically correct<br />

Filipino verbal sentences that were successfully evaluated by the<br />

system. Taking all the possible orderings of the 7 sentences<br />

resulted in thirty-three (33) combinations in all, due to free word<br />

ordering. The system was able to evaluate the sample<br />

sentences. Grammatically correct verbal sentences were<br />

acknowledged with a detailed evaluation as the output of the<br />

system, while grammatically wrong sentences were also<br />

acknowledged and reported with the necessary information on why they are<br />

incorrect.<br />
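
How free word ordering multiplies one sentence into several test cases can be sketched as follows. The sketch assumes the verb stays in first position (as the workflow expects) and that each determiner-noun phrase moves as a unit. It does not attempt to reproduce the count of 33, since the seven source sentences are not listed in the text.

```python
from itertools import permutations


def verb_initial_orderings(verb, phrases):
    """Return every ordering of the phrases with the verb kept first."""
    return [" ".join((verb,) + ordering) for ordering in permutations(phrases)]


orderings = verb_initial_orderings("bumili", ["ang bata", "ng isda"])
```

A sentence with n phrases after the verb yields n! orderings under these assumptions, so two phrases give two orderings and three phrases give six.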

5. RECOMMENDATIONS<br />

The Automated Filipino Verbal Sentence Evaluator has resolved<br />

the issues on syntactic and semantic relations. However, the<br />

issues of lexical ambiguities and deeper semantic interpretations<br />

have not yet been included in this research. With LFG’s<br />

ability to employ semantic relation rules, it is possible to<br />

resolve the issues of lexical ambiguity and deeper semantic<br />

interpretation. However, this requires changes in the semantic<br />

rules and is subject to further investigation.<br />

This study can further be enhanced to embrace a more complex<br />

verbal system of the Filipino language. Other Filipino parts of<br />

speech may also be considered as an additional scope to the study.<br />

In line with this, an automated Filipino essay evaluator could be<br />

developed through such further studies.<br />

Kroeger [7] has said that Philippine-type languages exhibit<br />

structural similarities. This means that it is possible for the system<br />

to be used for other Philippine languages as well, requiring only<br />

additional entries in the lexicon. Moreover, this research has made<br />

a significant contribution to the field of Natural Language<br />

Processing, especially to the research and studies<br />

conducted on the Filipino language.<br />

6. REFERENCES<br />

[1] Covington, M. Discontinuous Dependency Parsing of Free<br />

and Fixed Word Order. Available:<br />

http://www.ai.uga.edu/ftplib/ai_reports/reports.txt, 1994.<br />

[2] Dalrymple, M. A Lexical Functional Grammar. Available:<br />

http://users.ox.ac.uk/~cpgl0015/lfg.pdf, 2001.<br />

[3] Dimalen, E. Algorithm for Constituent Structures of Tagalog.<br />

MS Thesis, De La Salle University Professional Schools, Inc.,<br />

Manila, Philippines, 2003.<br />

[4] Fortmann, C. and Forst, M. An LFG Grammar Checker for<br />

CALL. Available: ftp://www.ims.uniuttgart.de/pub/Users/forst/Fortmann:Forst-ICALL04.pdf<br />

[5] Fries, P. The 31st International Systemic Functional<br />

Congress. Doshisha University, Kyoto, Japan.<br />

Available: http://www1.doshisha.ac.jp/~mtatsuki/ISFC31/pages/abstract_plenary.pdf,<br />

2004.<br />

[6] Koopman, H., Sportiche, D. and Stabler, E. An Introduction<br />

to Syntactic Analysis and Theory. Available:<br />

http://www.linguistics.ucla.edu/people/sportiche/isat.pdf,<br />

2002.<br />

[7] Kroeger, P. Phrase Structure and Grammatical Relations in<br />

Tagalog. Dissertations in Linguistics. Stanford, CA: Center<br />

for the Study of Language and Information. xiv, 240 p., 1993.<br />

[8] Lupyan, G. Modelling Syntactic Devices: An Explanation of<br />

Language Evolution from Connectionist and Memetic<br />

Perspectives.<br />

Available: http://www.isr/uiuc.edu/~amag/langev/paper/lupyan02modeling.html,<br />

2002.<br />

[9] Maegard, B. Machine Translation.<br />

Available: http://www.cs.uregina.ca/Research.Techreports/9509.ps,<br />

2002.<br />

[10] Manguilimotan, E. Syntactic Representation of<br />

Tausug Verbal Sentences. MS Thesis, MSU-Iligan Institute<br />

of Technology, Iligan City, Philippines, 2001.<br />

[11] Oliva, K. The Proper Treatment of Word Order in HPSG. In<br />

Proceedings of the 14th International Conference on<br />

Computational Linguistics, Nantes.<br />

Available: http//www.acl.ldc.upenn.edu/C/C92?c92-1031.pdf,<br />

1992.<br />

[12] Ramos, T. Handbook of Tagalog Verbs. University of Hawaii<br />

Press, 320 pp., 1986.<br />

[13] Schachter, P. and Otanes, F. Tagalog Reference Grammar.<br />

University of California Press, Berkeley, CA, 1972.<br />

[14] Tablante, N. The Predictive Value of Knowledge in<br />

Grammar in the Writing Proficiency of the Freshmen<br />

Engineering Students, 1997.<br />

[15] Wong, S. Lexical Functional Grammar.<br />

Available:<br />

http://www.fi.muni.cz/usr/wong/teaching/mt/notes/node15.html.iso-8859-1,<br />

2001.<br />
