Eckhard Bick - VISL


2.2.5 Tagging categories: word classes and inflexion tags

Defining word classes morphologically

The parser's tag set contains 14 word class categories that combine with 24 tags for inflexion categories, yielding several hundred distinct complex tag lines. Thus, in the tag line 'V PR 3S IND VFIN', for example, the word class 'V' alternates with 12 other word classes, and within the V-class 'PR' (present tense) alternates with 5 other tenses, each of which comes in 6 different person-number combinations, for both 'IND' (indicative) and 'SUBJ' (subjunctive). This way 6 x 6 x 2 = 72 finite verb forms can be described by using only 6 + 6 + 2 = 14 "partial" tags. This analytical character of the tag strings makes them more "transparent", and it also simplifies the disambiguation rules. In contrast to other systems (cf., for example, the CLAWS system, as described in Leech, Garside, Bryant, 1994), a clear distinction is upheld in the tag string between base forms ("words"), word classes and inflexion categories.
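The arithmetic above can be checked with a minimal sketch that composes tag lines from partial tags. Only 'PR', '3S', 'IND', 'SUBJ' and 'VFIN' are attested in the text; the other tense and person-number labels below are assumed for illustration and may not match the parser's actual tag names.

```python
from itertools import product

# Only 'PR', '3S', 'IND', 'SUBJ' and 'VFIN' appear in the text above.
# All other tag names here are assumptions made for this sketch.
TENSES = ["PR", "IMPF", "PS", "MQP", "FUT", "COND"]
PERSON_NUMBER = ["1S", "2S", "3S", "1P", "2P", "3P"]
MOODS = ["IND", "SUBJ"]


def finite_verb_tag_lines():
    """Build every 'V <tense> <person-number> <mood> VFIN' tag line."""
    return [
        f"V {tense} {pn} {mood} VFIN"
        for tense, pn, mood in product(TENSES, PERSON_NUMBER, MOODS)
    ]


tag_lines = finite_verb_tag_lines()

# Number of partial tags needed to describe all combinations.
partial_tag_count = len(TENSES) + len(PERSON_NUMBER) + len(MOODS)

print(len(tag_lines))                    # 72 complex tag lines
print(partial_tag_count)                 # 14 partial tags
print("V PR 3S IND VFIN" in tag_lines)   # True
```

The point of the sketch is only the combinatorics: the number of complex tag lines grows multiplicatively, while the inventory of partial tags grows additively.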

Furthermore, word classes are almost exclusively defined in morphological terms, thus keeping them apart from the syntactic categories [52]. A noun (N), for instance, is defined paradigmatically as the word class that features gender as an (invariant) lexeme category and number as a (variable) word form category. The opposite applies to numerals (NUM), while both gender and number are lexeme categories for proper nouns (PROP), and word form categories for adjectives (ADJ).

Pronouns can be classified along the same lines, yielding a determiner class (DET) with the same (variable) categories as adjectives, and a "specifier" class (SPEC) of "noun-like" pronouns featuring the same (invariant) categories as proper nouns. Personal pronouns (PERS), a third class, have 4 word form categories: number, gender, case and person. All three pronoun classes are distinguishable from the "real" nominal classes by the fact that they do not allow derivation (a typical characteristic of deictics).
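The category inventories described above can be written down as a small lookup table. This is a minimal sketch, not the parser's implementation: the `Inventory` type, the `inv` helper and the `classify` function are hypothetical names, and treating numerals as a class that allows derivation is an assumption of the sketch rather than a claim made in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inventory:
    lexeme: frozenset        # invariant (lexeme) categories
    word_form: frozenset     # variable (word form) categories
    allows_derivation: bool  # False for the three pronoun classes


def inv(lexeme, word_form, allows_derivation):
    """Hypothetical helper that freezes the two category sets."""
    return Inventory(frozenset(lexeme), frozenset(word_form), allows_derivation)


WORD_CLASSES = {
    "N":    inv({"gender"}, {"number"}, True),
    "NUM":  inv({"number"}, {"gender"}, True),   # derivation: assumed
    "PROP": inv({"gender", "number"}, set(), True),
    "ADJ":  inv(set(), {"gender", "number"}, True),
    "DET":  inv(set(), {"gender", "number"}, False),
    "SPEC": inv({"gender", "number"}, set(), False),
    "PERS": inv(set(), {"number", "gender", "case", "person"}, False),
}


def classify(lexeme, word_form, allows_derivation):
    """Return the word class whose inventory matches, or None."""
    target = inv(lexeme, word_form, allows_derivation)
    for name, inventory in WORD_CLASSES.items():
        if inventory == target:
            return name
    return None


print(classify({"gender"}, {"number"}, True))    # N
print(classify(set(), {"gender", "number"}, False))  # DET
```

In this encoding DET and ADJ share the same gender and number inventory, as do SPEC and PROP; the derivation flag is what separates each pronoun class from its nominal counterpart.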

Pronouns like 'o' and 'este', which can appear in both "adjectival" and "noun-like" position, are in my system unambiguous members of the DET-class, as judged by the exclusively morphological criterion of inflexional variability with regard to number and gender. The article class does not receive special treatment either: 'o' is always [53] DET, whether used as "article", "adjectival demonstrative" or "noun-like demonstrative". (Secondary) tags for article and demonstrative use do appear in the tag list, but they are not word class categories, and are therefore only disambiguated at a later stage (the valency level of CG), for use in the MT module.
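The morphological criterion used here, inflexional variability with regard to number and gender, can be illustrated with a minimal sketch. The function name `variable_categories` and the paradigm representation as (form, gender, number) triples are assumptions of this example; the noun 'casa' is added only as a contrast case and does not come from the text.

```python
def variable_categories(paradigm):
    """Return the categories that take more than one value in a paradigm.

    `paradigm` is a list of (form, gender, number) triples for one lexeme.
    """
    genders = {gender for _, gender, _ in paradigm}
    numbers = {number for _, _, number in paradigm}
    varying = set()
    if len(genders) > 1:
        varying.add("gender")
    if len(numbers) > 1:
        varying.add("number")
    return varying


# Paradigms of the two pronouns discussed above.
ESTE = [("este", "M", "S"), ("esta", "F", "S"),
        ("estes", "M", "P"), ("estas", "F", "P")]
O = [("o", "M", "S"), ("a", "F", "S"),
     ("os", "M", "P"), ("as", "F", "P")]

# Contrast case (not from the text): a noun with invariant gender.
CASA = [("casa", "F", "S"), ("casas", "F", "P")]

print(variable_categories(ESTE) == {"gender", "number"})  # True
print(variable_categories(O) == {"gender", "number"})     # True
print(variable_categories(CASA) == {"number"})            # True
```

Both 'o' and 'este' vary in gender and number regardless of the syntactic position they occupy, which is why a purely paradigmatic test assigns them to a single class.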

Among participles, the word class world's enfants terribles, only the past (or perfective) participle (V PCP) is inflexionally productive in Portuguese, and I treat

[52] I owe the urge to define word classes as morphologically as possible to Hans Arndt, who advocates a strict distinction between decontextually defined (primary) tags and distributionally defined syntactic tags in corpus annotation, suggesting category inventory as a means of word class definition in Danish (Arndt, 1992).

[53] That is, if it is not the personal object pronoun 'o', the letter name 'o', or the chemical abbreviation 'O'.
