
Sharing Knowledge: Scientific Communication - SSOAR


MPRESS - transition of metadata formats

This is done by a modification of Harvest’s HTML summarizer. But now, under Glimpse, you will not find this object when searching for DC.subject=19D10. This does not create any problems in a controlled environment, but as soon as you start to import SOIF documents from remote sites, your metadata become heterogeneous.

This problem currently occurs in MPRESS with the metadata files we get from the arXiv.org mirror at Augsburg: there the name of a paper’s author is stored as DC.creator and not as DC.creator.PersonalName, as it is inside MPRESS. This heterogeneity is resolved during the import of the SOIF documents into the broker. The gatherer command that manages this import pipes the SOIF objects through a Perl script that substitutes the older DC coding of the author’s name with the newer version.
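The substitution step can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the script used in MPRESS; it assumes that each SOIF attribute line has the form `name{size}:<TAB>value`, and the function name `rename_creator` is hypothetical. Because only the attribute name changes, the byte count in braces stays valid.

```python
import re

# Attribute names taken from the text: the older coding is mapped to the newer one.
OLD_NAME = "DC.creator"
NEW_NAME = "DC.creator.PersonalName"

# Assumption: a SOIF attribute line looks like "name{size}:<TAB>value".
# The pattern only matches when "{" follows the old name directly, so lines
# that already use DC.creator.PersonalName are left untouched.
_ATTR = re.compile(r"^" + re.escape(OLD_NAME) + r"(\{\d+\}:)")


def rename_creator(lines):
    """Yield SOIF lines with the older author attribute renamed."""
    for line in lines:
        yield _ATTR.sub(NEW_NAME + r"\1", line)


# Illustrative record; the URL and the author name are made up.
sample = [
    "@FILE { http://example.org/preprint1\n",
    "DC.creator{8}:\tJ. Smith\n",
    "}\n",
]
converted = list(rename_creator(sample))
```

Used as a filter, such a function would be applied to the stream of SOIF objects between the remote gatherer and the broker. A value that spans several lines and happens to contain a line starting with `DC.creator{` would be rewritten by mistake, so a production filter would have to honour the size field.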

Analogous to the SCHEME problem, the handling of sub-qualifiers in DC coded in RDF is difficult at this stage.

After collecting the documents and generating SOIF objects for them, each gathering agent exports the SOIF objects via a daemon. The broker component then collects the SOIF objects from several daemons and computes an index of these preprints, or more exactly an index of their SOIF objects. The gatherers that feed MPRESS are distributed over the world, which makes upgrading the software or the formats in use difficult. The user of MPRESS sees only the broker, which is an interface for searching this information pool.
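The division of labour between gatherers and broker can be sketched as follows. This is a toy model, not the Harvest broker, which delegates indexing to Glimpse; the record layout (a dict with `url` and `attrs`) and the functions `build_index` and `search` are assumptions made for illustration. The example also shows why heterogeneous attribute names matter: a record that still uses the older DC.creator is missed by a search on DC.creator.PersonalName.

```python
from collections import defaultdict


def build_index(gatherer_outputs):
    """Merge records from several gatherers into one inverted index.

    gatherer_outputs: iterable of record lists, one list per gatherer.
    Each record is a dict with a "url" and an "attrs" dict (name -> value).
    Returns a mapping (attribute name, lowercased word) -> set of URLs.
    """
    index = defaultdict(set)
    for records in gatherer_outputs:
        for rec in records:
            for attr, value in rec["attrs"].items():
                for word in value.lower().split():
                    index[(attr, word)].add(rec["url"])
    return index


def search(index, attr, word):
    """Return the sorted URLs whose attribute `attr` contains `word`."""
    return sorted(index.get((attr, word.lower()), set()))


# Two hypothetical gatherers; the second still uses the older attribute name.
gatherer_a = [{"url": "http://example.org/a1",
               "attrs": {"DC.creator.PersonalName": "Jane Smith"}}]
gatherer_b = [{"url": "http://example.org/b1",
               "attrs": {"DC.creator": "John Smith"}}]

index = build_index([gatherer_a, gatherer_b])
hits_new = search(index, "DC.creator.PersonalName", "Smith")
hits_old = search(index, "DC.creator", "Smith")
```

Here `hits_new` contains only the record from the first gatherer, which is the situation the renaming step during import is meant to avoid.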

Not all stored metadata elements can be used and evaluated by the user. Currently searches for authors, titles and keywords (and dates in France) are supported, but if you know the correct syntax of the other metadata items, you can also use the full text search field to query for them.

Besides the search functionality, an important feature of MPRESS is the browse function. Here the user gets a table of the several levels of the MSC and the number of preprints classified in the respective categories. The user can then browse through the next level of the MSC or search in the respective subset of preprints. To achieve this functionality in the broker, one has to refine the Perl scripts of the Harvest distribution that prepare the query to the Glimpse database and arrange its answers for display. Specifically, this is the BrokerQuery.pl script in Harvest 1.4 or the nph-search script in Harvest 1.5.
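The browse table described above could be computed roughly as in the following sketch. It is not the MPRESS code: it works on an in-memory list of records instead of querying the broker, it assumes five-character MSC codes such as 19D10 whose first two and first three characters name the two upper levels, and the names `msc_prefixes`, `count_by_level` and `render_level` are hypothetical.

```python
from collections import Counter


def msc_prefixes(code):
    """Return the three MSC levels of a code, e.g. 19D10 -> 19, 19D, 19D10."""
    return [code[:2], code[:3], code]


def count_by_level(records):
    """Count preprints per MSC category on every level.

    Each record is a dict with an "msc" list of codes. A preprint is counted
    once per category, even if several of its codes share that category.
    """
    counts = Counter()
    for rec in records:
        seen = set()
        for code in rec["msc"]:
            seen.update(msc_prefixes(code))
        counts.update(seen)
    return counts


def render_level(counts, parent=""):
    """Render a tab-separated table of the categories directly below `parent`."""
    width = {0: 2, 2: 3, 3: 5}[len(parent)]
    rows = sorted((c, n) for c, n in counts.items()
                  if len(c) == width and c.startswith(parent))
    return "\n".join(f"{c}\t{n}" for c, n in rows)


# Made-up sample data: three preprints with their MSC codes.
records = [
    {"msc": ["19D10", "19D55"]},
    {"msc": ["19D10"]},
    {"msc": ["11G05"]},
]
counts = count_by_level(records)
top_level = render_level(counts)
below_19 = render_level(counts, "19")
below_19d = render_level(counts, "19D")
```

Calling `render_level` with a category as `parent` gives the table for the next level, which mirrors the step from one browse page to the next.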

The browse functionality in particular is not part of the Harvest software. The pages that appear during the browsing process are generated pseudo-automatically: they reside in the file system of the WWW server, but they are regenerated once a day to stay up to date.

This is done by a cron job, a Perl script that queries the broker for the different MSC categories. Analogously, the MSC-specific search interfaces are generated on the fly to send an MSC-enriched query to the broker that filters out the papers without the correct MSC codes. That means the browsing function is
