26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...




This procedure becomes extremely labor-intensive for small words that overlap with other common words (Lee 2008).

Morpheus, the Greek morphological parser for the PDL, was developed by Gregory Crane and has been in development since 1990 (Crane 1991).¹⁴⁹ Crane worked with a database of 40,000 stems, 13,000 inflections, and 2,500 irregular forms. By 1991, Morpheus had been used to analyze almost 3 million words of text ranging in date from the eighth century BC to the second century AD. Since then, Morpheus has been an integral part of the online PDL, which has expanded to cover over 8 million words in Greek. Crane argued that the parser was developed not just to address problems in Ancient Greek but also to serve as a possible approach to developing morphological tools for ancient languages.
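To make the roles of the three resource types concrete (stems, inflections, and irregular forms), the following is a minimal sketch of stem-plus-ending lookup. It is not Morpheus's actual algorithm or data: the table contents, the transliterated forms, and the names `STEMS`, `ENDINGS`, `IRREGULAR`, and `analyze` are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    lemma: str
    features: str


# Hypothetical toy tables in Latin transliteration (not Morpheus data).
# Each stem maps to its dictionary form and an inflectional class.
STEMS = {
    "lu": ("luo", "verb"),
    "log": ("logos", "noun"),
}

# Endings are grouped by inflectional class.
ENDINGS = {
    "verb": {
        "o": "pres act ind 1st sg",
        "ei": "pres act ind 3rd sg",
        "omen": "pres act ind 1st pl",
    },
    "noun": {
        "os": "nom sg",
        "ou": "gen sg",
        "oi": "nom pl",
    },
}

# Forms that cannot be derived from stem + ending are listed whole.
IRREGULAR = {
    "esti": [Analysis("eimi", "verb pres act ind 3rd sg")],
}


def analyze(form: str) -> list[Analysis]:
    """Return every analysis licensed by the toy tables for one word form."""
    if form in IRREGULAR:
        return list(IRREGULAR[form])
    analyses = []
    # Try every split of the form into a candidate stem and ending.
    for i in range(1, len(form)):
        stem, ending = form[:i], form[i:]
        if stem not in STEMS:
            continue
        lemma, infl_class = STEMS[stem]
        features = ENDINGS[infl_class].get(ending)
        if features is not None:
            analyses.append(Analysis(lemma, f"{infl_class} {features}"))
    return analyses
```

Calling `analyze("luomen")` tries each split of the form and succeeds on `lu` + `omen`, while `analyze("esti")` is answered directly from the irregular table. A real parser must also handle accents, dialects, and many ambiguous forms, which this sketch ignores.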

More-recent work in automatic morphological analysis of Greek has utilized Morpheus as well as other resources available from Perseus. Dik and Whaling (2009) have discussed their implementation of Greek morphological searching over the Perseus Greek corpus, which made use of two disambiguated Greek corpora, the open-source part-of-speech analyzer TreeTagger,¹⁵⁰ and output from Morpheus. The backbone of their implementation is a SQLite database backend containing tokens and parses for the full corpus that connects the three main components: the Perseus XML files with unique token IDs; TreeTagger, “which accepts token sequences from the database and outputs parses and probability weights, which are stored in their own table”; and PhiloLogic.¹⁵¹ According to Dik and Whaling, their system made use of PhiloLogic because:

… it serves as a highly efficient search and retrieval front end, by indexing the augmented XML files as well as the contents of the linked SQLite tables. PhiloLogic’s highly optimized index architecture allows near-instantaneous results on complex inquiries such as ‘any infinitive forms within 25 words of (dative singulars of) lemma X and string Y’, which would be a challenge for typical relational database systems (Dik and Whaling 2009).
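To illustrate what such a proximity inquiry involves when expressed relationally, here is a minimal sketch using Python's standard `sqlite3` module. The schema (a `tokens` table and a `parses` table holding weighted candidate parses), the transliterated sample data, and the function names are assumptions made for illustration; they are not Dik and Whaling's actual schema, and the query omits the "string Y" condition for brevity.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical token/parse layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tokens (
            id INTEGER PRIMARY KEY,
            doc TEXT,
            position INTEGER,
            form TEXT
        );
        CREATE TABLE parses (
            token_id INTEGER REFERENCES tokens(id),
            lemma TEXT,
            features TEXT,
            weight REAL  -- probability weight assigned by a tagger
        );
        """
    )
    # Made-up tokens in Latin transliteration.
    conn.executemany(
        "INSERT INTO tokens VALUES (?, ?, ?, ?)",
        [
            (1, "demo", 1, "toi"),
            (2, "demo", 2, "anthropoi"),
            (3, "demo", 3, "dei"),
            (4, "demo", 4, "legein"),
            (5, "demo", 40, "graphein"),
        ],
    )
    # Token 2 has two competing parses with different weights.
    conn.executemany(
        "INSERT INTO parses VALUES (?, ?, ?, ?)",
        [
            (1, "ho", "article dat sg", 1.0),
            (2, "anthropos", "noun dat sg", 0.9),
            (2, "anthropos", "noun nom pl", 0.1),
            (3, "dei", "verb pres act ind 3rd sg", 1.0),
            (4, "lego", "verb pres act inf", 1.0),
            (5, "grapho", "verb pres act inf", 1.0),
        ],
    )
    return conn


def find_infinitives_near(conn, lemma, window=25):
    """Infinitives within `window` words of a dative singular of `lemma`."""
    query = """
        SELECT DISTINCT t2.form, t2.position
        FROM tokens AS t1
        JOIN parses AS p1 ON p1.token_id = t1.id
        JOIN tokens AS t2
          ON t2.doc = t1.doc
         AND t2.id != t1.id
         AND ABS(t2.position - t1.position) <= :window
        JOIN parses AS p2 ON p2.token_id = t2.id
        WHERE p1.lemma = :lemma
          AND p1.features LIKE '%dat sg%'
          AND p2.features LIKE '%inf%'
        ORDER BY t2.position
    """
    return conn.execute(query, {"window": window, "lemma": lemma}).fetchall()
```

With the sample data, `find_infinitives_near(build_demo_db(), "anthropos")` returns only the infinitive at position 4, because the one at position 40 lies outside the 25-word window. The self-join on `tokens` compares every matching token with every other token in the same document, which suggests why this kind of query becomes costly on a multi-million-word corpus without a specialized index.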

The results of their work are available at “Perseus Under PhiloLogic,”¹⁵² a website that supports morphological searching of both the Latin and Greek texts of Perseus. Although Dik and Whaling noted that they were continuing to explore the possibilities of natural language searching against the Greek corpus in place of the highly technical search methods supported through PhiloLogic, their system nonetheless supports full morphological searching, string searching, and lemmatized searching, and these features have been integrated into a reading and browsing environment for the texts.

Some other recent research has focused on the use of machine learning and large, unlabeled corpora to perform automatic morphological analysis on classical Greek. Lee (2008) has developed an analyzer of Ancient Greek that “infers the root form of a word” and has made two major innovations over previous systems:

149 Far earlier but also highly significant work on the development of a morphological parser for Ancient Greek was conducted by David Packard in the 1970s (Packard 1973).

150 http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/<br />

151 PhiloLogic (http://www.lib.uchicago.edu/efts/ARTFL/philologic/) is a software tool that has been developed by the Project for American and French Research on the Treasury of the French Language at the University of Chicago, and in “its simplest form serves as a document retrieval or look up mechanism whereby users can search a relational database to retrieve given documents and, in some implementations, portions of texts such as acts, scenes, articles, or head-words.”

152 http://perseus.uchicago.edu/
