21.04.2013 Views

Eckhard Bick - VISL


first, - but soon, it would move the bricks in unpredictable ways, it would be the<br />

sentient being, thinking, reacting, surprising you.<br />

This is what has fascinated me ever since I made my school’s Wang play<br />

checkers. With my projects evolving from the unprofessionally naive to the<br />

unprofessionally experimental, I programmed creativity by filtering random input for<br />

patterns and symmetry, I made my own Eliza, I built self-learning teaching tools, and<br />

I tried to make a computer translate. I was thrilled by the idea of a perfect memory in<br />

my digital student, the instantaneous dictionary, by never having to learn a piece of<br />

information twice.<br />

Along the way things became somewhat less unprofessional, and I<br />

accumulated some experience with NLP, constructing machine-readable dictionaries<br />

for Danish, Esperanto and Portuguese, and – in 1986 – a morphological analyser and<br />

MT-program for Danish 2 . Then – in 1994 – I heard a highly contagious lecture by<br />

Fred Karlsson presenting his Constraint Grammar formalism for context-based<br />

disambiguation of morphological and syntactic ambiguities. I was fascinated both by<br />

the robustness of the English Constraint Grammar (Karlsson et al., 1991) and its<br />

word-based notational system of tags integrating both morphology and flat<br />

dependency syntax in a way that allowed easy handling by a computer’s text<br />

processing tools. It was not clear at the time (and still is not) up to which level of<br />

syntactic or even semantic analysis Constraint Grammar can be made to work, and it<br />

had never – at any larger scale – been applied to Romance languages. So I decided to<br />

try it out on Portuguese 3 , working upwards from morphology to syntax and<br />

semantics, in the framework of a Ph.D. project in Computer Linguistics. The goal<br />

was the automatic analysis of free running Portuguese text, i.e. to build a computer<br />

program (a morphological tagger and a syntactic parser) that would take an ordinary<br />

text file - typed, mailed or scanned - as input and produce grammatically analysed<br />

output as unambiguous and error-free as possible. My ultimate motivation, the<br />

raison d’être of my digital child, has always been applicational – encompassing the<br />

production of research corpora 4 , communication and teaching tools, information<br />

handling and, ultimately, machine translation. But in the process of making the<br />

digital toddler walk, I would have to fight and tame the Beast, as my supervisor<br />

Hans Arndt called it, the ever-changing and multi-faceted creation which is human<br />

language. I would have to chart the lexical landscape of Portuguese, to define the<br />

categories and structures I would ask my parser to recognise, and to check<br />

tradition, introspection and grammatical intuition alike against raw and real corpus data.<br />

Many times, this process has turned back on itself, with the dynamics of the “tool<br />

grammar” (i.e. the growing Constraint Grammar rule set) forcing new distinctions or<br />

2 This system - “Danmorf” - was revived in 1999, to become the morphological kernel of the Danish “free text”<br />

section of the VISL-project at Odense University, and can be visited at http://visl.hum.sdu.dk.<br />

3 Romance languages, with the possible exception of French, share much of their syntactic structure, and also most<br />

morphological categories. Even many lexical items, not least pronouns and conjunctions, can often be matched one-on-one<br />

across languages. At the time of writing (1999), I have begun to adapt my Portuguese Constraint Grammar for<br />

Spanish, with encouraging results (http://visl.hum.sdu.dk).<br />

4 The largest annotation task so far, completed in November 1999, has been the annotation of a 90 million word corpus<br />

of Brazilian Portuguese, for a research group at the Catholic University of São Paulo.<br />

