Sharing Knowledge: Scientific Communication - SSOAR
MPRESS - transition of metadata formats
This is done by a modification of Harvest's HTML summarizer. But now under
Glimpse you won't get this object when searching for DC.subject=19D10.
This does not create any problems in a controlled environment, but when you
start to import SOIF documents from remote sites, your metadata become
heterogeneous.
This problem currently occurs in MPRESS with the metadata files we get
from the arXiv.org mirror at Augsburg: the name of the author of a paper is stored
there as DC.creator and not as DC.creator.PersonalName as inside
MPRESS. This heterogeneity is resolved during the import of the
SOIF documents into the broker. The gatherer command that manages this import
pipes the SOIF objects through a Perl script that replaces the older DC
coding of the author's name with the newer version.
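The renaming step can be illustrated with a short sketch. It is written in Python rather than Perl and is not the script used in MPRESS. It assumes that SOIF attributes appear at the start of a line in the form name{size}: followed by the value, and it only rewrites the attribute name. The function name normalize_creator is hypothetical.

```python
import re

# Matches an attribute header such as "DC.creator{8}:" at the start of a line.
# "DC.creator.PersonalName{...}:" is not matched, because "{" must follow
# "creator" directly. Assumption: no attribute value contains a line that
# itself starts with "DC.creator{<digits>}:".
OLD_CREATOR = re.compile(r"^DC\.creator(\{\d+\}:)", re.MULTILINE)


def normalize_creator(soif_text: str) -> str:
    """Rename DC.creator to DC.creator.PersonalName; values stay untouched."""
    return OLD_CREATOR.sub(r"DC.creator.PersonalName\1", soif_text)
```

Because only the name changes, the size given in braces remains valid for the value, and applying the function a second time leaves the text unchanged.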
As with the SCHEME problem, the handling of subqualifiers in DC
coded in RDF is difficult at this stage.
After collecting the documents and generating SOIF objects for them, each
gathering agent exports the SOIF objects via a daemon. The broker component then
collects the SOIF objects from several daemons and computes an index of
these preprints, or more exactly an index of their SOIF objects. The gatherers
that feed MPRESS are distributed around the world, which makes upgrading
the software or the formats in use difficult. The user of MPRESS sees only the
broker, which is an interface for searching this information pool.
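To make the broker's role concrete, the following sketch merges SOIF text from several gatherers into a toy inverted index. It is not how Glimpse builds its index. It assumes that each object starts with a line like @FILE { URL, that each attribute has the form name{size}: followed by a tab and exactly size characters of value, and that character counts equal byte counts (plain ASCII). The names parse_soif and build_index are hypothetical.

```python
import re
from collections import defaultdict

HEADER = re.compile(r"@(\w+)[ \t]*\{[ \t]*(\S+)[ \t]*\n")
ATTR = re.compile(r"([^\s{}]+)\{(\d+)\}:\t")


def parse_soif(text):
    """Yield (url, {attribute: value}) for each SOIF object found in text."""
    pos = 0
    while True:
        header = HEADER.search(text, pos)
        if header is None:
            return
        url, attrs, pos = header.group(2), {}, header.end()
        while True:
            attr = ATTR.match(text, pos)
            if attr is None:
                break  # reached the closing brace of this object
            size = int(attr.group(2))
            attrs[attr.group(1)] = text[attr.end():attr.end() + size]
            pos = attr.end() + size + 1  # skip the newline after the value
        yield url, attrs


def build_index(gatherer_outputs):
    """Map (attribute, lowercased word) to the set of URLs containing it."""
    index = defaultdict(set)
    for text in gatherer_outputs:
        for url, attrs in parse_soif(text):
            for name, value in attrs.items():
                for word in value.lower().split():
                    index[(name, word)].add(url)
    return index
```

Keying the index on the attribute name as well as the word shows why heterogeneous attribute names matter: an author stored under DC.creator would not be found by a lookup on DC.creator.PersonalName.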
Not all stored metadata elements can be used and evaluated by the user. Currently
searches for authors, titles and keywords (and dates in France) are supported,
but if you know the correct syntax of the other metadata items, you can also
use the full text search field to query for them.
An important feature of MPRESS, besides the search functionality, is the browse
function. Here the user gets a table of the levels of the MSC and the
number of preprints classified in the respective categories. The user can then browse
through the next level of the MSC or search in the respective subset of preprints.
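The counts behind such a browse table can be sketched as follows. The sketch assumes five-character MSC codes such as 19D10, whose first two, three and five characters identify the three levels, and it counts a preprint once per category even if it carries several codes from that category. The function name browse_table and the dictionary layout are hypothetical.

```python
from collections import Counter

# Length of a category code on the level directly below a parent of given length.
NEXT_LENGTH = {0: 2, 2: 3, 3: 5}


def browse_table(preprints, parent=""):
    """Count preprints per MSC category on the level directly below `parent`.

    `preprints` maps a preprint identifier to its list of full MSC codes.
    """
    width = NEXT_LENGTH[len(parent)]
    counts = Counter()
    for codes in preprints.values():
        # A set ensures each preprint is counted once per category.
        categories = {code[:width] for code in codes if code.startswith(parent)}
        counts.update(categories)
    return dict(sorted(counts.items()))
```

Calling browse_table(preprints) gives the top-level table, and passing a category such as "19" or "19D" as parent gives the table for the next level down.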
To achieve this functionality in the broker, one has to refine the Perl scripts
of the Harvest distribution that prepare the query to the
Glimpse database and arrange the answers from the Glimpse database to
be displayed nicely. Specifically, this is the BrokerQuery.pl script in harvest1.4 or
the nph-search script in harvest1.5.
The browse functionality in particular is not part of the Harvest software. The
pages that appear during the browsing process are generated pseudo-automatically:
the pages reside in the file system of the WWW server, but they are regenerated
once a day to stay up to date.
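A minimal sketch of this kind of static page generation follows. It assumes the category counts are already available as a dictionary and that each category links to a page named after its code. The page layout, the file naming and the function names are assumptions for illustration, not the MPRESS implementation.

```python
from html import escape
from pathlib import Path


def render_browse_page(title, counts):
    """Return an HTML page with one table row per category and its count."""
    rows = "\n".join(
        f'<tr><td><a href="{escape(cat)}.html">{escape(cat)}</a></td><td>{n}</td></tr>'
        for cat, n in sorted(counts.items())
    )
    return (
        f"<html><head><title>{escape(title)}</title></head><body>\n"
        f"<h1>{escape(title)}</h1>\n<table>\n{rows}\n</table>\n</body></html>\n"
    )


def write_browse_page(directory, name, title, counts):
    """Write the page to a temporary file, then move it into place."""
    path = Path(directory) / f"{name}.html"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(render_browse_page(title, counts), encoding="utf-8")
    tmp.replace(path)  # readers never see a half-written page
    return path
```

Writing to a temporary file first means that the WWW server keeps serving the previous day's page until the new one is complete.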
This is done by a cron job, a Perl script that queries the broker for the
different MSC categories. Analogously, the MSC-specific search interfaces are
generated on the fly to send an MSC-enriched query to the broker that filters out
the papers without the correct MSC codes. That means the browsing function is