26.12.2014 Views

Rome Wasn't Digitized in a Day - Council on Library and Information ...




Another major methodological issue in the development of PDBs, according to Mathisen, is that they are about “individual people,” and that these people must have unique identities within a database. Yet the identification of individual people within primary sources is no easy task, and even if two sources cite the person with the same name it can be difficult to determine whether it is the same person. Additionally, a single individual may go by different names. “Sorting out who’s who,” Mathisen noted, “either by using a computer algorithm or by human eye-balling, continues to be one of the major problems, if not the major problem, facing the creators of PDBs” (Mathisen 2007). As has been seen throughout this review, the challenges of historical named-entity disambiguation have also been highlighted in terms of historical place names in archaeology (Jeffrey et al. 2009a, Jeffrey et al. 2009b) and classical geography (Elliott and Gillies 2009b), and both personal and place name disambiguation complicated data integration between papyrological and epigraphical databases in the LaQuAT project (Jackson et al. 2009).
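The two difficulties described above (several people sharing one name, and one person attested under several names) can be illustrated with a minimal sketch. The attestations, the variant table, and the helpers `normalize` and `group_candidates` are hypothetical and are not drawn from Mathisen or from any of the cited projects. The code only groups attestations whose normalized names match; whether a group really refers to one person is left to a human editor.

```python
from collections import defaultdict

# Hypothetical attestations: (source label, name as written). Invented data.
ATTESTATIONS = [
    ("Source A", "Alexander"),
    ("Source B", "ALEXANDER"),
    ("Source C", "Alexandros"),
    ("Source D", "Paulus"),
]

# Hypothetical table mapping known spelling variants to one canonical form.
VARIANTS = {"alexandros": "alexander"}


def normalize(name: str) -> str:
    """Lower-case a name and map known variants to a canonical spelling."""
    key = name.strip().lower()
    return VARIANTS.get(key, key)


def group_candidates(attestations):
    """Group source labels by normalized name.

    A shared group only means "possibly the same person"; it does not
    establish identity.
    """
    groups = defaultdict(list)
    for source, name in attestations:
        groups[normalize(name)].append(source)
    return dict(groups)


candidates = group_candidates(ATTESTATIONS)
for name, sources in candidates.items():
    status = "needs review" if len(sources) > 1 else "single attestation"
    print(f"{name}: {sources} ({status})")
```

Name matching of this kind can only propose candidates: the three "alexander" attestations might be one person or three, which is the problem Mathisen describes.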

While hierarchical structures were first explored for PDBs, Mathisen proposed that it was generally agreed that the relational model was the best structural model for such databases. Several important rules for relational PDBs that Mathisen listed included the need to store data in tabular format, the creation of a unique identifier for each primary data record (within PDBs this is typically a person’s name combined with a number, e.g., Alexander-6), and the ability to retrieve data in different logical combinations based on field values. While many PDBs, Mathisen observed, were often “structured based on a single table” that attempted to include all the important information about an individual, such a simple structure limited the types of questions that could be asked of such a database.
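The rules listed above can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not the schema of any actual PDB: the table names `person` and `office`, their columns, and every row of data are invented placeholders. Only the identifier pattern (a name combined with a number, such as Alexander-6) comes from the text.

```python
import sqlite3

# In-memory database; the schema and all rows are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id TEXT PRIMARY KEY,   -- name plus number, e.g. 'Alexander-6'
    name      TEXT NOT NULL
);
CREATE TABLE office (
    person_id TEXT NOT NULL REFERENCES person(person_id),
    title     TEXT NOT NULL,
    place     TEXT
);
""")

# Two distinct individuals who share a name get distinct identifiers.
conn.executemany("INSERT INTO person VALUES (?, ?)", [
    ("Alexander-5", "Alexander"),
    ("Alexander-6", "Alexander"),
])
# One person may hold several offices, so offices live in their own table.
conn.executemany("INSERT INTO office VALUES (?, ?, ?)", [
    ("Alexander-6", "bishop", "Place X"),
    ("Alexander-6", "presbyter", "Place X"),
    ("Alexander-5", "bishop", "Place Y"),
])

# Retrieve data by a logical combination of field values across both tables.
rows = conn.execute("""
    SELECT p.person_id, o.title
    FROM person AS p
    JOIN office AS o ON o.person_id = p.person_id
    WHERE p.name = ? AND o.place = ?
    ORDER BY o.title
""", ("Alexander", "Place X")).fetchall()
print(rows)
```

A single-table design would need either repeated person rows or a fixed number of office columns to hold the same information, which is one way such a structure limits the questions that can be asked.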

The final major methodological issue Mathisen considered was standardization. While the early period of PDB creation saw a number of efforts at developing a “standardized format for entering and storing prosopographical material,” Mathisen doubted that any real standardization would ever occur. Indeed, he argued that since the “data reduction” methods of any prosopographical database were often designed based on the primary source material at hand and how it would be used, attempting to design an all-purpose method would be inefficient. While Mathisen proposed that the use of a relational database structure in itself should make it relatively easy to transfer data between databases, the LaQuAT project found this to be far from the case (Jackson et al. 2009).

Despite these various methodological issues, a number of prosopographical database projects have been created, as seen in the next sections. Mathisen posited that there were two general types of PDBs:[532] (1) a restricted or limited database that typically incorporates individuals from only one “discrete primary or secondary source”; and (2) “inclusive” or open-ended databases that usually include all of the people who lived at a particular time or place and contain material from many heterogeneous sources. All the databases considered in the following sections, with the exception of Prosopographia Imperii Romani, are open-ended databases. Such databases are far more difficult to design, according to Mathisen, since “designers must anticipate both what kinds of information users might want to access and what kinds of information will be provided by the sources from which the database will be constructed.” In addition, such databases are typically never completed as new resources become unearthed or additional sources are mined for prosopographical data. “The greatest future promise of PDBs lies in the construction of more sophisticated and comprehensive databases,” Mathisen concluded, adding that “including a broad range of persons, constructed from a multiplicity of sources and permitting searching on a multiplicity of fields” (Mathisen 2007).

[532] Mathisen lists a third special case: limited databases that have the form of open-ended databases but that “are constructed from existing hard-copy prosopographical catalogue” or card-files, where the limit is imposed not by the source material but by editorial decisions on whom to include. In addition, Mathisen described a number of “biographical catalogues” like the “De Imperatoribus Romanis” (DIR) (http://www.roman-emperors.org/).
