21.04.2013 Views

Eckhard Bick - VISL


2.2.5 Tagging categories: word classes and inflexion tags

2.2.5.1 Defining word classes morphologically

The parser's tag set contains 14 word class categories, which combine with 24 tags for inflexion categories, yielding several hundred distinct complex tag lines. In the tag line 'V PR 3S IND VFIN', for example, the word class 'V' alternates with 13 other word classes, and within the V class 'PR' (present tense) alternates with 5 other tenses, each of which comes in 6 person-number combinations, for both 'IND' (indicative) and 'SUBJ' (subjunctive). In this way, 6x6x2=72 finite verb forms can be described using only 6+6+2=14 "partial" tags. This analytical character of the tag strings makes them more "transparent" and also simplifies the disambiguation rules. In contrast to other systems (cf., for example, the CLAWS system, as described in Leech, Garside, Bryant, 1994), the tag string maintains a clear distinction between base forms ("words"), word classes and inflexion categories.
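
The arithmetic of combining partial tags can be illustrated with a minimal Python sketch. Only 'PR', '3S', 'IND', 'SUBJ' and 'VFIN' are taken from the text; the other tense and person-number labels are placeholder assumptions chosen for illustration, and the sketch simply follows the text's count of 6 tenses, 6 person-number combinations and 2 moods without checking which combinations actually occur in the language.

```python
from itertools import product

# Only 'PR', '3S', 'IND', 'SUBJ' and 'VFIN' are attested in the text;
# the remaining labels are illustrative placeholders (assumptions).
TENSES = ["PR", "IMPF", "PS", "MQP", "FUT", "COND"]
PERSON_NUMBER = ["1S", "2S", "3S", "1P", "2P", "3P"]
MOODS = ["IND", "SUBJ"]


def finite_verb_tag_lines():
    """Return one tag line per combination of tense, person-number and mood."""
    return [
        f"V {tense} {pn} {mood} VFIN"
        for tense, pn, mood in product(TENSES, PERSON_NUMBER, MOODS)
    ]


tag_lines = finite_verb_tag_lines()
partial_tags = len(TENSES) + len(PERSON_NUMBER) + len(MOODS)

# Number of composed tag lines versus number of partial tags needed
print(len(tag_lines), partial_tags)
```

Because each tag line is composed from independent partial tags, a rule can refer to a single component (for instance, any line containing 'SUBJ') without listing all the full forms.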

Furthermore, word classes are defined almost exclusively in morphological terms, which keeps them apart from the syntactic categories [52]. A noun (N), for instance, is defined paradigmatically as the word class that features gender as an (invariant) lexeme category and number as a (variable) word form category. The opposite applies to numerals (NUM), while both gender and number are lexeme categories for proper nouns (PROP), and word form categories for adjectives (ADJ).

Pronouns can be classified along the same lines, yielding a determiner class (DET) with the same (variable) categories as adjectives, and a "specifier" class (SPEC) of "noun-like" pronouns featuring the same (invariant) categories as proper nouns. Personal pronouns (PERS), a third class, have 4 word form categories: number, gender, case and person. All three pronoun classes can be distinguished from the "real" nominal classes by the fact that they do not allow derivation (a typical characteristic of deictics).
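
The paradigmatic definitions above amount to a small decision table, which the following Python sketch spells out. It is an illustration under stated assumptions, not the parser's implementation: the `Paradigm` record and its field names are hypothetical, only the seven nominal and pronominal classes discussed here are covered, and PERS is recognised simply by case and person being variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paradigm:
    """Hypothetical record of a lexeme's inflexional behaviour."""
    variable: frozenset              # word form categories (vary across forms)
    allows_derivation: bool = True   # False for the deictic (pronoun) classes


def word_class(p: Paradigm) -> str:
    """Map a paradigm to one of the seven classes described in the text."""
    gender_varies = "gender" in p.variable
    number_varies = "number" in p.variable
    if {"case", "person"} <= p.variable:
        return "PERS"                # number, gender, case and person all vary
    if gender_varies and number_varies:
        return "ADJ" if p.allows_derivation else "DET"
    if not gender_varies and not number_varies:
        return "PROP" if p.allows_derivation else "SPEC"
    if number_varies:
        return "N"                   # gender is lexeme-bound, number varies
    return "NUM"                     # number is lexeme-bound, gender varies


# A pronoun such as 'este' varies in gender and number and allows no derivation.
este = Paradigm(frozenset({"gender", "number"}), allows_derivation=False)
print(word_class(este))
```

Note that the function never consults syntactic position, which mirrors the point that word classes are kept apart from syntactic categories.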

Pronouns like 'o' and 'este', which can appear in both "adjectival" and "noun-like" position, are unambiguous members of the DET class in my system, as judged by the exclusively morphological criterion of inflexional variability with regard to number and gender. The article class does not receive special treatment either: 'o' is always [53] DET, whether used as "article", "adjectival demonstrative" or "noun-like demonstrative". (Secondary) tags for article and demonstrative readings do appear in the tag list, but they are not word class categories, and are therefore only disambiguated at a later stage (the valency level of CG), for use in the MT module.

Among participles, the word class world's enfants terribles, only the past (or perfective) participle (V PCP) is inflexionally productive in Portuguese, and I treat

[52] I owe the urge to define word classes as morphologically as possible to Hans Arndt, who advocates a strict distinction between decontextually defined (primary) tags and distributionally defined syntactic tags in corpus annotation, suggesting category inventory as a means of word class definition in Danish (Arndt, 1992).

[53] That is, if it is not the personal object pronoun 'o', the letter name 'o', or the chemical abbreviation 'O'.
