28.06.2013 Views

Papers in PDF format


Providing Enterprise Wide and Localized Searching at a Large, Decentralized Institution

David J. Wang, djw@gdb.org
George Ciervo, ciervo@gdb.org

Division of Biomedical Information Sciences
Johns Hopkins University School of Medicine,
Baltimore, MD 21205 USA

http://infonet.welch.jhu.edu/miss_smt.html

Introduction

Like many research institutions, the Johns Hopkins University is highly decentralized. This lack of centralization pervades the culture of Johns Hopkins and is evident in nearly every facet of the university's business. Not surprisingly, many of the university's WWW servers also lack a central point of management that would facilitate effective access. As a result, it is nearly impossible to build a hierarchical navigation structure with links pointing to all the "most appropriate" locations. Even if such a structure were possible, the continued development of pages by divisions, departments, labs, offices, and other groups would quickly render any such navigation obsolete. In this environment, an institutional site-wide search is imperative.

Furthermore, though many of these sites are small, some are reasonably large in scale. Search functions within these larger individual sites would provide an additional and often necessary means of navigation. Unfortunately, many of these sites lack staff with the technical skills needed to implement such a search. During the design of the Hopkins-Wide Search, it was therefore decided that a mechanism allowing Hopkins’ web administrators to make their sites searchable easily would add substantial functionality.

Providing a Uniform Search Interface for All Web Documents

After examining several possible searching solutions, Verity's Topic Internet Server was selected and installed. The server consists of a search engine, a remote indexer, and an end-user interface. The remote indexer is used to index multiple Johns Hopkins web sites. Once a site has been indexed, it is maintained as a separate "collection", allowing a user to search one or multiple collections. A form is provided [URL1] where webmasters can both register their site and obtain HTML code for inclusion in their pages. The registration request is forwarded to the InfoNet Development Group, and the site is included in the index within twenty-four hours. Once indexed, the site is immediately searchable. The code provided to the webmaster is a form that makes searching that site possible. All queries run against the index maintained by InfoNet. Thus, each site may be queried as part of the master index [URL2], which includes ALL the indexed sites, or as an individual collection [URL3]. Once the search has been executed, users are presented with a list of the top 25 results. They have the option of paging forward or backward through their results. Users can also refine their search by requiring that additional words be present, by weighting certain words, or by requiring the absence of certain words. Finally, users searching in individual sites have the option to continue their localized site search or to expand their search to all the indexed documents.
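
The behavior described above (per-site collections, a master index spanning all of them, pages of 25 results, and refinement by required, weighted, or excluded words) can be illustrated with a minimal sketch. This is not Verity's Topic Internet Server API, and it does not reproduce that product's ranking. The names `Query`, `score`, and `search`, the in-memory dictionary of collections, and the word-count scoring are all assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field

PAGE_SIZE = 25  # the paper reports that results are shown 25 at a time


@dataclass
class Query:
    """Hypothetical query model; all words are expected in lowercase."""
    terms: dict[str, float]                           # word -> weight
    required: set[str] = field(default_factory=set)   # words that must appear
    excluded: set[str] = field(default_factory=set)   # words that must not appear


def score(text: str, query: Query) -> float:
    """Weighted word-count score (an assumption), or 0.0 if filtered out."""
    words = text.lower().split()
    present = set(words)
    if not query.required <= present or query.excluded & present:
        return 0.0
    return sum(weight * words.count(term) for term, weight in query.terms.items())


def search(collections, query, names=None, page=0):
    """Search the named collections, or every collection when names is None.

    collections maps a collection name to a {url: text} dictionary.
    Returns one page of (score, url) pairs, best match first.
    """
    selected = collections if names is None else {n: collections[n] for n in names}
    hits = [
        (s, url)
        for docs in selected.values()
        for url, text in docs.items()
        if (s := score(text, query)) > 0
    ]
    # Highest score first; ties are broken by URL so that paging is stable.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    start = page * PAGE_SIZE
    return hits[start:start + PAGE_SIZE]
```

Under these assumptions, `search(collections, query)` plays the role of a master-index query, `search(collections, query, names=["some_site"])` restricts the query to one site's collection, and increasing `page` corresponds to paging forward. Expanding a localized search to all indexed documents amounts to repeating the call with `names=None`.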

Expanding the Hopkins Wide Search

A review of the server logs after the search went online showed that a number of queries were directory-type requests, i.e., users looking for personnel and contact information. This revealed the need for a "one-stop search". Since there is no single, definitive online Hopkins directory, it was vital that this search not be limited to web documents but also include other web-accessible information. As a
