21.04.2013 Views

Eckhard Bick - VISL


My present linguistic solution[28] is to opt for the more analytic description of
compound names and to tag some critical words as both PROP and ADJ or N in the
lexicon. Only underived competing analyses pose a problem, since derivationals
still receive a tag for the PROP alternative in the new system as well, so the list of these names
is quite short: a check on a 1.5-million-word chunk of corpus yielded fewer than 150
different cases (which isn't much compared to the 2% overall frequency of names).

In the appendix section, a list of context-sensitive CG disambiguation rules is
given for words to which the analyser has assigned other PoS
tags alongside the proper noun tag. Apart from specific rules that explicitly target
proper nouns, many other rules may contribute to resolving the ambiguity in an
indirect, cautious way, by eliminating competing PoS readings one by one and leaving
only the desired one.
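The elimination strategy can be illustrated with a minimal Python sketch. It is not the CG rule set from the appendix: the function name `apply_remove_rules`, the cohort representation (a word paired with its set of PoS tags) and the single context rule are all assumptions made for illustration. The one property it is meant to show is the cautious behaviour described above: a rule may discard a competing reading, but never the last remaining one.

```python
def apply_remove_rules(cohorts, rules):
    """Discard readings until no rule applies any more.

    cohorts: list of (word, set_of_tags) pairs, modified in place.
    rules:   list of (tag, condition) pairs; condition(cohorts, i) -> bool.
    A reading is only removed if at least one other reading survives.
    """
    changed = True
    while changed:
        changed = False
        for i, (_word, readings) in enumerate(cohorts):
            for tag, condition in rules:
                if tag in readings and len(readings) > 1 and condition(cohorts, i):
                    readings.discard(tag)
                    changed = True
    return cohorts


def is_capitalised(cohorts, i):
    return cohorts[i][0][:1].isupper()


def left_is_unambiguous_prop(cohorts, i):
    return i > 0 and cohorts[i - 1][1] == {"PROP"}


# Hypothetical rule: drop the N reading of a capitalised word that
# directly follows an unambiguous proper noun.
RULES = [
    ("N", lambda c, i: is_capitalised(c, i) and left_is_unambiguous_prop(c, i)),
]

# "Cruz" is ambiguous between PROP and N in this toy lexicon (an assumption).
sentence = [("Felipe", {"PROP"}), ("Cruz", {"PROP", "N"}), ("Guimarães", {"PROP"})]
apply_remove_rules(sentence, RULES)
```

Because the rule only removes the N reading, the PROP reading of "Cruz" is left standing as the sole survivor rather than being selected directly.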

An important contribution to the proper noun sub-section of CG-rules is the
structural information that follows from the recognition of certain types of name
chains, typical of Portuguese text:

(4a) Felipe Cruz Guimarães<br />

(4b) o presidente Fernando Collor de Mello<br />

a carioca Maria dos Santos<br />

o senhor Aurélio Buarque de Holanda Ferreira<br />

(4c) Hamilton Mello jr.<br />

(4d) o crítico de gastronomia Celso Nucci<br />

(5a) a Guia Quatro Rodas<br />

o Grupo Rui Barreto<br />

(5b) o restaurante Arroz-de-Hausa<br />

(5c) a Grande São Paulo<br />

(5d) Europa Oriental<br />

(6a) a Drake Beam Morin<br />

(6b) o Instituto para Reprodução Humana de Roma<br />

(7a) Massachusetts Institute of Technology<br />

(7b) Guns 'n' Roses<br />

(7c) Michael's Friends<br />

The personal names in (4) can all be described by the pattern:<br />

(4') (N) PROP+ (de/do/da/dos/das PROP+) (jr./sr./I),
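As a rough illustration, pattern (4') can be read as a small matcher over tagged tokens. The Python sketch below is an assumption-laden reading, not the parser's implementation: the function name `match_personal_name`, the (word, tag) token format and all tags other than N and PROP are hypothetical. It accepts an optional head noun, one or more PROP tokens, an optional particle followed by further PROP tokens, and an optional suffix.

```python
PARTICLES = {"de", "do", "da", "dos", "das"}
SUFFIXES = {"jr.", "sr.", "I"}


def match_personal_name(tokens, start=0):
    """Return the end index (exclusive) of a chain matching (4'),
    beginning at `start`, or None if no chain begins there.

    tokens: list of (word, tag) pairs.
    """
    n = len(tokens)
    i = start

    # (N): optional head noun such as "presidente" or "senhor"
    if i < n and tokens[i][1] == "N":
        i += 1

    # PROP+: at least one proper noun is required
    j = i
    while j < n and tokens[j][1] == "PROP":
        j += 1
    if j == i:
        return None
    i = j

    # (de/do/da/dos/das PROP+): the particle only counts if a PROP follows
    if i < n and tokens[i][0] in PARTICLES:
        j = i + 1
        while j < n and tokens[j][1] == "PROP":
            j += 1
        if j > i + 1:
            i = j

    # (jr./sr./I): optional suffix, matched on the word form
    if i < n and tokens[i][0] in SUFFIXES:
        i += 1
    return i


collor = [("presidente", "N"), ("Fernando", "PROP"), ("Collor", "PROP"),
          ("de", "PRP"), ("Mello", "PROP")]
end = match_personal_name(collor)  # covers the whole chain
```

The matcher starts at the head noun, so a preceding article such as "o" or "a" is outside the matched span, just as it is outside the pattern.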

[28] In standard mode, the parser today (1999) draws upon a special filter program written to capture likely name chains in
a heuristic way after preprocessing and before morphological analysis, linking name chain elements in the same way as
recognized polylexicals (by ‘=’ signs). Thus, the capitalised parts of most name chains (plus interfering ‘de’, ‘dos’,
‘von’, ‘van’ etc.) become one-word units in the eyes of PALMORF, ensuring PROP analysis, but hiding most of the
analytical structure of the name unit.
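A minimal Python sketch of such a heuristic filter is given here for illustration only. It is not the filter program itself: the function name `link_name_chains` and the exact linker list are assumptions (the note's own list ends in "etc."), and the sketch ignores complications such as sentence-initial capitalisation. It joins runs of capitalised words, plus interfering particles that are followed by another capitalised word, into one ‘=’-linked unit.

```python
# Assumed, incomplete list of interfering particles.
LINKERS = {"de", "do", "da", "dos", "das", "von", "van"}


def is_capitalised(word):
    return word[:1].isupper()


def link_name_chains(words):
    """Join likely name chains into single '='-linked tokens."""
    out = []
    i, n = 0, len(words)
    while i < n:
        if not is_capitalised(words[i]):
            out.append(words[i])
            i += 1
            continue
        chain = [words[i]]
        j = i + 1
        while j < n:
            if is_capitalised(words[j]):
                chain.append(words[j])
                j += 1
            elif (words[j] in LINKERS and j + 1 < n
                  and is_capitalised(words[j + 1])):
                # a particle only joins the chain if a capitalised word follows
                chain.extend([words[j], words[j + 1]])
                j += 2
            else:
                break
        out.append("=".join(chain))
        i = j
    return out


linked = link_name_chains("o presidente Fernando Collor de Mello".split())
```

Once the chain has been fused into a single token, its internal structure (first names, particle, surname) is no longer visible to later analysis, which is the trade-off the note describes.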
