Metadata standards at the Library of Congress

Rebecca Guenther

AMPAS Digital Motion Picture Metadata Symposium

June 11, 2009


Overview of presentation

Functions of metadata

Types of metadata

• Descriptive

• Administrative (includes technical)

• Preservation

• Structural

Metadata standards maintained at LC

Implementing metadata


Functions of metadata

Discovery

Management

Control IP rights

Identification

Certify authenticity

Mark content structure

Indicate status

Describe processes


Metadata standards for digital materials

Digital objects are complex and composed of multiple files, e.g. scanned digital books

http://lcweb2.loc.gov/diglib/ihas/loc.natlib.ihas.200033350/contactsheet.html

Using established standards facilitates exchange

Complex objects require more metadata than analog objects for their management and use

• Descriptive

• Administrative

• Technical

• Digital provenance/events

• Rights/Terms and conditions

• Structural

There is a need to integrate metadata for physical and digital objects


Types of metadata

Descriptive

Administrative

• Technical

• Digital provenance

• Rights/Access

Preservation (supported by the above)

Structural

Meta-metadata


Descriptive Metadata

Title, author, human-readable description of a resource

Subject or topical information

Genre and format of the resource

Relationships with other resources (version, parent/child, etc.)


More about descriptive metadata

Most standardized and best understood type of metadata

Increasing number of descriptive metadata standards for different needs and communities

Important for resource discovery

May support various user tasks

Has different aspects: content rules, controlled values, formats/schemas, syntax

May be at various levels of granularity (e.g. the book, the article, the page, the physical item)


The current descriptive metadata environment

Multiplicity of descriptive metadata standards

Cultural heritage institutions have a long tradition of exchanging metadata in a standardized form

MARC 21 provides an exchange standard

Years of record exchange have resulted in cost savings

Using a well-defined and understood standard allows for future migration to another system

Supporting standards can be leveraged in the broader metadata community

Many moving images have existing descriptive metadata in library catalogs and systems that can be reused


Descriptive metadata standards maintained at LC

Descriptive formats

• MARC 21

• MODS

• EAD

Controlled vocabulary standards at LC

• Library of Congress Subject Headings

• Thesaurus of Graphic Materials

• ISO 639 Language codes

• MARC Country and Geographic Area codes

• MARC Relator codes

• Library of Congress Classification


What is MARC 21?

A syntax defined by an international standard and developed in the late 1960s

As a syntax it has 2 expressions:

• Classic MARC (MARC 2709)

• MARCXML

A data element set defined by content designation and semantics

Institutions do not store “MARC 21”, as it is a communications format

Many data elements are defined by external content rules

Billions of bibliographic records world-wide
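
As a rough illustration of the MARCXML expression and of content designation (tags, indicators, subfield codes), the following Python sketch builds a one-field record. The namespace and the element names (record, leader, datafield, subfield) are those of the MARCXML schema; the leader value, the indicator values, and the helper names are assumptions made for this example and do not come from a real catalog record.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace
ET.register_namespace("marc", MARC_NS)


def marc_tag(name: str) -> str:
    """Qualify an element name with the MARCXML namespace."""
    return f"{{{MARC_NS}}}{name}"


def make_record(title: str) -> ET.Element:
    """Build a minimal MARCXML record: a leader plus a 245 $a title."""
    record = ET.Element(marc_tag("record"))
    # Illustrative 24-character leader (an assumption, not from a real record).
    ET.SubElement(record, marc_tag("leader")).text = "00000ngm a2200000 a 4500"
    # Content designation: field 245 is the title statement, $a the title.
    field = ET.SubElement(
        record, marc_tag("datafield"), {"tag": "245", "ind1": "0", "ind2": "0"}
    )
    ET.SubElement(field, marc_tag("subfield"), {"code": "a"}).text = title
    return record


def get_title(record: ET.Element) -> str:
    """Read the 245 $a value back out of a record, or return an empty string."""
    for field in record.findall(marc_tag("datafield")):
        if field.get("tag") != "245":
            continue
        for sub in field.findall(marc_tag("subfield")):
            if sub.get("code") == "a":
                return sub.text or ""
    return ""


if __name__ == "__main__":
    rec = make_record("Great conversations: the conductors: Zubin Mehta")
    print(ET.tostring(rec, encoding="unicode"))
```

The same data elements could equally be serialized in the classic MARC 2709 structure; only the syntax differs, not the semantics.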


What is MODS?

Metadata Object Description Schema

An XML descriptive metadata standard

A derivative of MARC

Uses language-based tags

Contains a subset of MARC data elements

Rich, but not as rich as full MARC

More compatible with existing library data than DC

Does not assume the use of any specific rules for description

Element set is particularly applicable to digital resources
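
To make the "language-based tags" concrete, here is a minimal Python sketch that assembles a small MODS record. The element names (titleInfo, title, name, namePart, role, roleTerm, typeOfResource, originInfo, dateIssued) and the namespace are from MODS version 3; the function name and the choice of fields are assumptions for illustration, and the output is not validated against the schema. The sample values are taken from the video record shown later in this presentation.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # MODS version 3 namespace
ET.register_namespace("mods", MODS_NS)


def mods_tag(name: str) -> str:
    """Qualify an element name with the MODS namespace."""
    return f"{{{MODS_NS}}}{name}"


def build_mods(title, names, resource_type, date_issued):
    """Build a small MODS record.

    names is a list of (name, role) pairs, e.g. ("Rosen, Peter", "Director").
    """
    mods = ET.Element(mods_tag("mods"))
    title_info = ET.SubElement(mods, mods_tag("titleInfo"))
    ET.SubElement(title_info, mods_tag("title")).text = title
    for name_part, role in names:
        name = ET.SubElement(mods, mods_tag("name"), {"type": "personal"})
        ET.SubElement(name, mods_tag("namePart")).text = name_part
        role_el = ET.SubElement(name, mods_tag("role"))
        ET.SubElement(role_el, mods_tag("roleTerm"), {"type": "text"}).text = role
    ET.SubElement(mods, mods_tag("typeOfResource")).text = resource_type
    origin = ET.SubElement(mods, mods_tag("originInfo"))
    ET.SubElement(origin, mods_tag("dateIssued")).text = date_issued
    return mods


if __name__ == "__main__":
    record = build_mods(
        title="Great conversations: the conductors: Zubin Mehta",
        names=[("Mehta, Zubin", "Interviewee"), ("Rosen, Peter", "Director")],
        resource_type="moving image",
        date_issued="2005",
    )
    print(ET.tostring(record, encoding="unicode"))
```

Compared with a numeric MARC tag such as 245, the element name titleInfo is readable without a lookup table, which is the main point of the language-based tagging.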


Administrative Metadata

Metadata to manage the object

Technical metadata: technical characteristics of the object

Digital provenance metadata: actions that have been performed on the object

Rights metadata: information about access and use of the object

Often at a lower level of granularity than descriptive metadata


Rights/Access Metadata

Where is the resource? Is it in a place open to me?

Are there restrictions on the use of the resource?

What can I do with this resource?

May include machine-actionable data used by DRM systems


Preservation Metadata

Designed to ensure that the information the resource contains remains accessible to users over a long period of time

Oriented toward finished products, but also applicable to works in process

Records details about format migration and data refreshment

Allows a variety of approaches to the problem of maintaining resources over time

Falls into administrative metadata

Much is extracted from the object and could be carried along during the production process or machine generated


Preservation metadata includes:

Provenance: Who has had custody/ownership of the digital object?

Authenticity: Is the digital object what it purports to be?

Preservation Activity: What has been done to preserve it?

Technical Environment: What is needed to render and use it?

Rights Management: What IPR must be observed?

Diagram labels: Content; Preservation Metadata; 10 years on; 50 years on; Forever!

Makes digital objects self-documenting across time


PREMIS Data Dictionary for Preservation Metadata

A data dictionary for metadata to support the long-term preservation of digital objects

A piece of the necessary infrastructure for implementing reliable, sustainable preservation programs

A supporting XML schema for implementation in a variety of contexts

A maintenance activity hosted at LC including an Implementers’ Group and Editorial Committee


Technical metadata in PREMIS

Object ID

Preservation level

Object characteristics (format, size, creating application, etc.)

Storage

Environment (hardware and software)

Digital signatures

Relationships

Linking identifiers


Preservation actions

PREMIS event information

• Event ID

• Event type

• Event date/time

• Event outcomes

• Linking identifiers

Need to document actions on objects for long-term preservation regardless of preservation strategy
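
The event fields listed above can be sketched as a simple fixity check that records its own outcome. This is a minimal Python illustration, not PREMIS XML: the dictionary keys are camel-case renderings of the slide's field names, and the function names, the identifiers, and the "pass"/"fail" outcome values are assumptions made for the example.

```python
import hashlib
from datetime import datetime, timezone


def md5_digest(data: bytes) -> str:
    """Return the MD5 message digest of the data as a hex string."""
    return hashlib.md5(data).hexdigest()


def fixity_check_event(event_id: str, object_id: str,
                       data: bytes, expected_digest: str) -> dict:
    """Compare a stored digest with a fresh one and describe the action as an event."""
    actual = md5_digest(data)
    return {
        "eventIdentifier": event_id,
        "eventType": "fixity check",
        # Timestamp of when the check ran, in ISO 8601 form.
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        # Hypothetical outcome vocabulary chosen for this sketch.
        "eventOutcome": "pass" if actual == expected_digest else "fail",
        # Link back to the object the event was performed on.
        "linkingObjectIdentifier": object_id,
    }


if __name__ == "__main__":
    content = b"example bytes standing in for a video file"
    stored = md5_digest(content)  # digest recorded when the object was ingested
    event = fixity_check_event("E002.1", "seg01/video/0001.mpg", content, stored)
    for key, value in event.items():
        print(f"{key}: {value}")
```

Keeping each such event alongside the object is what lets a later custodian see what was done to it, whichever preservation strategy is in use.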


Technical metadata for images

Metadata for Images in XML (MIX)

An XML Schema designed for expressing technical metadata for digital still images

Based on the NISO Z39.87 Data Dictionary – Technical Metadata for Digital Still Images

Format-specific metadata for images, e.g. bit depth, color space, camera settings, etc.

Best developed of the format-specific technical metadata standards

LC is the maintenance agency


Technical metadata for audio and video

Not as well developed as other technical metadata

The complexity of the file formats means expertise is required to develop these

LC developed XML technical metadata schemas in 2003/2004 for the LC Audiovisual Prototype Project; widely implemented because of the lack of other schemas

Audio and video technical metadata schemas under development by expert organizations

Moving Image Collections (MIC) project is also experimenting with these


Audio/video object and provenance metadata

Audio object metadata: AES X098B (Audio Object Schema) and AES X098C (Process History Schema)

Many definitions could come from SMPTE-RP-210 registry of terms

Digital provenance metadata, similar to PREMIS event

MIC is adapting AES X098B for video


Structural Metadata

Ties the components of a complex or compound resource together and makes the whole usable

Enables flexible and local approaches to presentation and navigation

Various approaches to sharing structural metadata exist

A number of container standards have been developed by different communities


Metadata Encoding & Transmission Standard (METS)

Developed by the Digital Library Federation, maintained by the Library of Congress

Records the (possibly hierarchical) structure of digital objects, the names and locations of the files that comprise those objects, and the associated metadata

Used to package metadata with a digital object (or link to the digital object) in XML syntax

• For retrieving, storing, preserving, and serving the resource

• For interchange of digital objects with metadata

• As an information package in a digital repository (may be a unit of storage or a transmission format)

http://www.loc.gov/standards/mets/


The structure of a METS file

METS

• fileSec: file inventory

• dmdSec: descriptive metadata

• amdSec: administrative metadata

• behaviorSec: behaviour metadata

• structMap: structural map
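
The sections named above can be sketched as a skeleton METS document. In this minimal Python example the section element names, the METS namespace, and the xlink namespace are those of the METS schema; the IDs, the USE value, the example URL, and the function names are assumptions for illustration. The dmdSec and amdSec are left empty, so the result shows the overall shape only and is not a schema-valid METS file. The optional behaviorSec is omitted.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def mets_tag(name: str) -> str:
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{name}"


def build_mets_skeleton(file_url: str, mimetype: str) -> ET.Element:
    """Build a METS skeleton with one file and a one-division structural map."""
    mets = ET.Element(mets_tag("mets"))
    # Descriptive and administrative sections (left empty in this sketch).
    ET.SubElement(mets, mets_tag("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(mets, mets_tag("amdSec"), {"ID": "AMD1"})
    # File inventory: one file group holding one file and its location.
    file_sec = ET.SubElement(mets, mets_tag("fileSec"))
    file_grp = ET.SubElement(file_sec, mets_tag("fileGrp"), {"USE": "service"})
    file_el = ET.SubElement(
        file_grp, mets_tag("file"), {"ID": "FILE1", "MIMETYPE": mimetype}
    )
    ET.SubElement(
        file_el,
        mets_tag("FLocat"),
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": file_url},
    )
    # Structural map: a division that points at the file and at both
    # metadata sections by ID, tying the pieces together.
    struct_map = ET.SubElement(mets, mets_tag("structMap"), {"TYPE": "logical"})
    div = ET.SubElement(
        struct_map, mets_tag("div"),
        {"TYPE": "video", "DMDID": "DMD1", "ADMID": "AMD1"},
    )
    ET.SubElement(div, mets_tag("fptr"), {"FILEID": "FILE1"})
    return mets


if __name__ == "__main__":
    doc = build_mets_skeleton("http://example.org/seg01/video/0001.mpg", "video/mpeg")
    print(ET.tostring(doc, encoding="unicode"))
```

The ID references from the structural map to the file and metadata sections are what let one document describe a multi-file object as a single usable whole.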


Other container formats

Support transmission and archiving

Bundle together metadata and content

Some other examples:

• Material eXchange Format (MXF)

• MPEG 21 DIDL

• AES 31-3 Standard for Network and File Transport of Audio - Audio-File Transfer and Exchange


Meta-metadata

Metadata about the metadata

• Who created this information?

• When was it created?

• When were links last checked?

• Other update transactions?

May be a component of some metadata schemes

Allows for managing the metadata, not just the resource described


Implementing metadata

What metadata can be carried along with the object over its lifecycle that can be used post-production?

• Descriptive

• Technical

• Process history

• Rights

Metadata extraction

• Much technical metadata may be extracted from file headers

• Some metadata may be generated on a batch of objects
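
As a small illustration of extracting technical metadata from a file header, the following Python sketch reads the header of a WAV file with the standard library's wave module. The WAV format is chosen only because Python can parse it without extra libraries; the dictionary keys and function names are assumptions for this example, and the silent test file is generated in memory so the sketch runs on its own.

```python
import io
import wave


def make_silent_wav(seconds: int = 1, sample_rate: int = 44100,
                    channels: int = 1, sample_width: int = 2) -> bytes:
    """Create an in-memory WAV file of silence to stand in for a real object."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)  # bytes per sample
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00" * (seconds * sample_rate * channels * sample_width))
    return buf.getvalue()


def extract_wav_metadata(data: bytes) -> dict:
    """Read technical characteristics from the WAV header."""
    with wave.open(io.BytesIO(data), "rb") as reader:
        frames = reader.getnframes()
        rate = reader.getframerate()
        return {
            "channels": reader.getnchannels(),
            "bitsPerSample": reader.getsampwidth() * 8,
            "sampleRate": rate,
            "durationSeconds": frames / rate,
            "sizeBytes": len(data),
        }


if __name__ == "__main__":
    # The same function could be mapped over a whole batch of files.
    for key, value in extract_wav_metadata(make_silent_wav()).items():
        print(f"{key}: {value}")
```

Because nothing here needs human input, this kind of metadata can be generated automatically during production and carried along with the object.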


Example of metadata for LC video object

http://lcweb2.loc.gov/diglib/ihas/loc.natlib.ihas.200031108/default.html


Descriptive metadata

Title: Great conversations: the conductors: Zubin Mehta

Interviewee(s): Mehta, Zubin

Producer(s): Rosen, Peter

Editor(s): Warshaw, Hilan

Host: Istomin, Eugene

Director: Rosen, Peter

Place of Creation: New York

Publisher(s): Peter Rosen Productions, Inc.

Date issued: 2005

Form: videorecording

Physical Description: 1 digibeta videotape; duration: 65 min., 10 sec.

Permissions note: Copyright Library of Congress. This program was made possible through the courtesy of Eugene Istomin. Zubin Mehta's appearance is courtesy of himself.

Type of Material: moving image

Contents note: In a series of three one-on-one discussions, the Russian-born Mstislav Rostropovich, Indian-born Zubin Mehta, and James Conlon, a native of New York City, converse about leadership and inspiration on the podium, their views about the influence of European classical music on American music, and the influence of American popular music on other cultures.

Creator note(s): In opening credits: "Library of Congress, Washington, D.C., presents."

Bibliographic history note: Library of Congress extended version.

Additional credits: for the Music Division, Library of Congress: Jon Newsom, chief; Jan Lauridsen, assist. Chief; Ruth Foss, program specialist.

Repository: Music Division


Sample PREMIS metadata for one video file

Object metadata

Object identifier: hdl.loc.gov/greatconv/200031108/seg01/video/0001.mpg

Object characteristics

• Message digest algorithm: md5

• Message digest: ceb3dbc5dacd3883d0985174ef5df7db

• Size: 310800388

• Format name: video/mpeg

• Format version:

• Creating application: VideoLAN

• Creating application: ffmpeg

Relationship (structural): (identifiers of video objects for other segments)

Event metadata

Event identifier: E001.1

Event type: validation

Event date/time: 2006-06-06T00:00:00.005

Event outcome: successful; well-formed and valid

Linking agent identifier: (identifier of software program that validated)


Technical metadata for video

24

1

Microsoft

MPEG-4 (fast motion)

lossy

Yes

Service

4x3

8 min 37 sec 647 ms

704

480

30

is seekable


Technical metadata for audio

8

1

MPEG 2

lossy

112 kb/s

MPEG 2

44100

8 min 37 sec 647 ms

Is seekable

1


Closing thoughts

Different kinds of metadata are needed for use/presentation and preservation of digital moving image resources

Whatever can be carried along with the resource in various stages of its lifecycle should be saved for the future

Collaboration between those responsible for producing and those caring for and making available these resources benefits everyone

Preservation is a shared problem that requires a shared solution

Standards development needs to be cooperative
