Rome Wasn't Digitized in a Day - Council on Library and Information ...
This procedure becomes extremely labor-intensive for small words that overlap with other common words (Lee 2008).
The Greek morphological parser for the PDL, named Morpheus, was developed by Gregory Crane and has been in development since 1990 (Crane 1991).[149] Crane worked with a database of 40,000 stems, 13,000 inflections, and 2,500 irregular forms. By 1991, Morpheus had been used to analyze almost 3 million words in texts ranging in date from the eighth century BC to the second century AD. Since then, Morpheus has been an integral part of the online PDL, which has expanded to cover over 8 million words in Greek. Crane argued that the parser was developed not just to address problems in Ancient Greek but also to serve as a possible approach to developing morphological tools for other ancient languages.
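To make the role of the three resource types concrete (stems, inflections, and irregular forms), the following is a minimal sketch of table-driven analysis. It is not Morpheus itself: the miniature tables, the transliterated forms, and the names `STEMS`, `ENDINGS`, `IRREGULAR`, and `analyze` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str
    features: str


# Hypothetical miniature tables in ASCII transliteration (assumptions,
# not data taken from Morpheus).
STEMS = {"log": "logos", "anthrop": "anthropos"}
ENDINGS = {
    "os": ["nom sg"],
    "ou": ["gen sg"],
    "oi": ["nom pl", "voc pl"],  # one ending, two possible readings
}
IRREGULAR = {"esti": [Analysis("eimi", "pres ind act 3rd sg")]}


def analyze(form: str) -> list[Analysis]:
    """Return every analysis the toy tables license for a surface form."""
    # Irregular forms are looked up whole and bypass stem/ending splitting.
    if form in IRREGULAR:
        return list(IRREGULAR[form])
    results = []
    # Try every split of the form into a non-empty stem and ending.
    for i in range(1, len(form)):
        stem, ending = form[:i], form[i:]
        if stem in STEMS and ending in ENDINGS:
            for features in ENDINGS[ending]:
                results.append(Analysis(STEMS[stem], features))
    return results


if __name__ == "__main__":
    for word in ("logou", "logoi", "esti"):
        print(word, analyze(word))
```

Even in this toy version a single form such as `logoi` yields more than one analysis, which suggests why a parser's output may need to be disambiguated before it can support precise searching.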
More recent work in automatic morphological analysis of Greek has used Morpheus as well as other resources available from Perseus. Dik and Whaling (2009) have discussed their implementation of Greek morphological searching over the Perseus Greek corpus, which made use of two disambiguated Greek corpora, the open-source part-of-speech analyzer TreeTagger,[150] and output from Morpheus. The backbone of their implementation is a SQLite database backend containing tokens and parses for the full corpus that connects the three main components: the Perseus XML files with unique token IDs; TreeTagger, “which accepts token sequences from the database and outputs parses and probability weights, which are stored in their own table”; and PhiloLogic.[151] According to Dik and Whaling, their system made use of PhiloLogic because:
… it serves as a highly efficient search and retrieval front end, by indexing the augmented XML files as well as the contents of the linked SQLite tables. PhiloLogic’s highly optimized index architecture allows near-instantaneous results on complex inquiries such as ‘any infinitive forms within 25 words of (dative singulars of) lemma X and string Y’, which would be a challenge for typical relational database systems (Dik and Whaling 2009).
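The logic of the inquiry quoted above can be sketched against a small SQLite database. This is only an illustration under stated assumptions: the table layout (`tokens`, `parses`), the column names, the sample rows, and the function names are hypothetical and are not taken from Dik and Whaling's system, and the sketch stores the probability weight in the same table as the parse for brevity. It shows what the query asks for, not how PhiloLogic's index answers it quickly at corpus scale.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical token/parse schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tokens (
            token_id INTEGER PRIMARY KEY,
            doc      TEXT,
            position INTEGER,   -- word offset within the document
            form     TEXT       -- surface form (ASCII transliteration)
        );
        CREATE TABLE parses (
            token_id  INTEGER REFERENCES tokens(token_id),
            lemma     TEXT,
            mood      TEXT,
            gram_case TEXT,
            num       TEXT,
            weight    REAL      -- tagger probability (illustrative values)
        );
        """
    )
    # Invented sample rows for illustration only.
    conn.executemany(
        "INSERT INTO tokens VALUES (?, ?, ?, ?)",
        [
            (1, "demo", 1, "boulomai"),
            (2, "demo", 2, "legein"),
            (3, "demo", 3, "anthropoi"),
            (4, "demo", 40, "graphein"),
        ],
    )
    conn.executemany(
        "INSERT INTO parses VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "boulomai", "ind", None, "sg", 0.9),
            (2, "lego", "inf", None, None, 0.9),
            (3, "anthropos", None, "dat", "sg", 0.8),
            (4, "grapho", "inf", None, None, 0.9),
        ],
    )
    return conn


def infinitives_near_dative(conn, lemma, window):
    """Infinitives within `window` words of a dative singular of `lemma`."""
    query = """
        SELECT DISTINCT inf.form, inf.position
        FROM tokens AS inf
        JOIN parses AS inf_p ON inf_p.token_id = inf.token_id
        JOIN tokens AS dat   ON dat.doc = inf.doc
        JOIN parses AS dat_p ON dat_p.token_id = dat.token_id
        WHERE inf_p.mood = 'inf'
          AND dat_p.lemma = ?
          AND dat_p.gram_case = 'dat'
          AND dat_p.num = 'sg'
          AND ABS(inf.position - dat.position) <= ?
        ORDER BY inf.position
    """
    return conn.execute(query, (lemma, window)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(infinitives_near_dative(db, "anthropos", 25))
```

The self-join on `tokens` compares every infinitive with every matching dative in the same document, which is the kind of work that can become expensive in a plain relational query over a full corpus.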
The results of their work are available at “Perseus Under PhiloLogic,”[152] a website that supports morphological searching of both the Latin and Greek texts of Perseus. Although Dik and Whaling noted that they were continuing to explore the possibilities of natural language searching against the Greek corpus in place of the highly technical query methods supported through PhiloLogic, their system nonetheless supports full morphological searching, string searching, and lemmatized searching, and these features have been integrated into a reading and browsing environment for the texts.
Some other recent research has focused on the use of machine learning and large, unlabeled corpora to perform automatic morphological analysis on classical Greek. Lee (2008) has developed an analyzer of Ancient Greek that “infers the root form of a word” and has made two major innovations over previous systems:
[149] Far earlier but also highly significant work on the development of a morphological parser for Ancient Greek was conducted by David Packard in the 1970s (Packard 1973).
[150] http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
[151] PhiloLogic (http://www.lib.uchicago.edu/efts/ARTFL/philologic/) is a software tool that has been developed by the Project for American and French Research on the Treasury of the French Language at the University of Chicago, and in “its simplest form serves as a document retrieval or look up mechanism whereby users can search a relational database to retrieve given documents and, in some implementations, portions of texts such as acts, scenes, articles, or head-words.”
[152] http://perseus.uchicago.edu/