
Tools for mass digitisation and long-term preservation in cultural heritage institutions

Cezary Mazurek, Tomasz Parkoła, Marcin Werla

SEEDI conference, 17-18 May 2012


Agenda

• Introduction
• Mass digitisation
• Long-term preservation
• Tools supporting digitisation activities
• dLibra and dMuseion – digital library/repository/museum
• dArceo – long-term preservation services
• dLab – digitisation workflow management tool
• Summary


Introduction

Mass digitisation is a large-scale automated process of capturing an analogue signal in digital form, including enhancements such as OCR and transcription.

Long-term preservation ensures that digital information remains accessible today, tomorrow, in a year, in 10 years, etc.


Supporting tools

Developed by Poznań Supercomputing and Networking Center (Polish Academy of Sciences)

• dLibra – digital library framework
• dMuseion – digital museum framework
• dArceo – long-term preservation services
• dLab – digitisation workflow management


dLibra

Developed by Poznań Supercomputing and Networking Center (Polish Academy of Sciences)

• Has been developed by PSNC since 1999
• The first Polish software for building digital libraries
• Key element in stimulating the growth of digital libraries in Poland


dLibra – deployments

Number of deployments per year (chart data):

Year                       2002  2003  2004  2005  2006  2007  2008  2009  2010  2011
Institutional deployments     0     0     1     1     5     7    13    22    26    28
Regional deployments          1     1     1     5    10    11    17    25    28    32


dLibra

Polish Digital Libraries

Approx. no. of digital objects: 1 000 000
Number of digital libraries: over 80
Participating institutions: several hundred

All resources are available via the Polish national aggregator, the Digital Libraries Federation: http://fbc.pionier.net.pl

Regional digital libraries
Institutional digital libraries


dLibra – regional digital library


dLibra – institutional digital library


dMuseion

• Dedicated software tool for digital museums
  – Developed in cooperation with the National Museum in Warsaw
  – The aim is to make museum resources available on the Internet and to provide an easy-to-use software package for building digital museums
• Why a dedicated solution?
  – Different types of resources than in libraries/repositories and archives (paintings, sculptures, etc.)
  – Themed collections of museum resources
  – Terminology


dMuseion – main page


dMuseion – digital object metadata


dMuseion – 3D object


dArceo

• Has been developed by PSNC since 2011
• Based on the prototype services developed within the SYNAT project, financed by the Polish National Center for Research and Development
• Designed to preserve master, optimised and even presentation files, with a primary focus on:
  – Textual data (e.g. PDF/A)
  – Images (e.g. TIFF, JPEG2000)
  – Audiovisual (e.g. MPEG-4)


dArceo

Basic functions and characteristics (1)

• Can utilise various types of data storage
  – Local hard drive (RAID is suggested), disk array, etc.
  – (S)FTP, e.g. archiving services of PLATON U4 (R&D project)
• Internal representation uses well-known standards and formats (e.g. METS, PREMIS)
• Dedicated built-in mechanism for data monitoring
  – Loss risk calculation based on PRONOM/UDFR


dArceo

Basic functions and characteristics (2)

• The most important functionality
  – Data migration for the needs of long-term preservation (OAIS transformation approach)
  – Data conversion, e.g. for the needs of digital libraries and online availability of resources
  – Advanced data delivery, e.g. a dedicated tool for visualising large-size data, a transcription tool
• Migration/conversion plans can be defined
  – A migration or conversion can have several steps (pipelining of the services)
  – Semantic technologies are applied to orchestrate the data manipulation services
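The idea of a multi-step migration or conversion plan can be sketched as a chain of conversion steps, each accepting one format and producing another. This is a minimal illustration only, not dArceo's API: the names `DigitalFile`, `Step`, `make_step` and `run_plan` are hypothetical, the steps merely relabel the file instead of converting any data, and the TIFF to JPEG2000 to PDF/A chain is an assumed example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DigitalFile:
    """A file tracked by name and format only (hypothetical model)."""
    name: str
    fmt: str


@dataclass(frozen=True)
class Step:
    """One conversion service: accepts `source_fmt`, produces `target_fmt`."""
    source_fmt: str
    target_fmt: str
    convert: Callable[[DigitalFile], DigitalFile]


def make_step(source_fmt: str, target_fmt: str, extension: str) -> Step:
    """Build a placeholder step that only renames the file and relabels its format."""
    def convert(f: DigitalFile) -> DigitalFile:
        stem = f.name.rsplit(".", 1)[0]
        return DigitalFile(name=f"{stem}.{extension}", fmt=target_fmt)
    return Step(source_fmt, target_fmt, convert)


def run_plan(plan: List[Step], f: DigitalFile) -> DigitalFile:
    """Apply the steps in order, rejecting a step whose input format does not match."""
    for step in plan:
        if f.fmt != step.source_fmt:
            raise ValueError(f"step expects {step.source_fmt}, got {f.fmt}")
        f = step.convert(f)
    return f


# Assumed example plan: master TIFF -> JPEG2000 -> PDF/A presentation file.
plan = [
    make_step("TIFF", "JPEG2000", "jp2"),
    make_step("JPEG2000", "PDF/A", "pdf"),
]
result = run_plan(plan, DigitalFile("page_001.tif", "TIFF"))
print(result)
```

Checking each step's input format before running it means a plan whose steps do not fit together fails early instead of producing a mislabelled file.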


dArceo

Basic functions and characteristics (3)

• Capability of sharing migration, conversion and advanced delivery services
  – By means of synchronisation and P2P-like communication
• Data safety and security
  – Authentication, authorisation and accounting (AAA) of users and external services
  – Data safety should be assured by the data storage (e.g. redundancy, distant locations)


dLab

Basic functions and characteristics

• General
  – Supports digitisation activities
  – Management of the digitisation workflow
  – Monitoring capability
• Terminology
  – Digitisation task
    • The basic element in the system, related to a particular item, e.g. a book or an issue
    • Covers all the activities necessary to finish the digitisation of that item
  – Activity within a digitisation task
    • Represents certain work to be done, e.g. preparing optimised files
    • The work is done by a human (user) or a machine (tool)
• Flexibility
  – A set of activities within the scope of a task
  – Order constraints
  – Pluggable architecture
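A task made of activities with order constraints can be modelled as a dependency graph and ordered with a topological sort. The sketch below is an assumption-laden illustration, not dLab's data model: `DigitisationTask`, `add_activity` and `execution_order` are hypothetical names, and the constraints between the example activities are invented for the example. It relies on `graphlib` from the Python standard library (3.9+).

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set


@dataclass
class DigitisationTask:
    """Hypothetical model: one task per item, holding activities and their order constraints."""
    element: str
    # Maps each activity name to the activities that must finish before it starts.
    depends_on: Dict[str, Set[str]] = field(default_factory=dict)

    def add_activity(self, name: str, after: Iterable[str] = ()) -> None:
        """Register an activity together with its prerequisites."""
        self.depends_on[name] = set(after)

    def execution_order(self) -> List[str]:
        """Return one order satisfying every constraint; a cyclic constraint raises an error."""
        return list(TopologicalSorter(self.depends_on).static_order())


# Assumed constraints, chosen only to illustrate the idea.
task = DigitisationTask("book")
task.add_activity("select object to digitise")
task.add_activity("prepare master files", after=["select object to digitise"])
task.add_activity("create PDF", after=["prepare master files"])
task.add_activity("archive master files", after=["prepare master files"])
task.add_activity("verify", after=["create PDF"])
task.add_activity("submit PDF to digital library", after=["verify"])

order = task.execution_order()
print(order)
```

Activities with no constraint between them (here, creating the PDF and archiving the master files) may appear in either order, which is where a workflow tool has room to schedule work for people and tools independently.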


dLab – digitisation task

[Diagram: Task A, its activities and the roles involved]
Roles: Editor, Scanner operator, Tool, QA
Activities: Select object to digitise, Prepare master files, Create PDF, Archive master files, Verify, Submit PDF to digital library


dLab – external tools

[Diagram: dLab and its plugins for external tools]
dLab: dLab UI; activities A, B, C, …; working space (e.g. disk array)
Plugins for external tools: dLibra plugin, dArceo plugin, FR plugin, DE plugin
External tools: dLibra, dArceo, FineReader, Document Express
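A pluggable architecture of this kind is often built around a common plugin interface and a registry that looks plugins up by name. The following is a generic sketch under that assumption and does not reflect dLab's actual plugin API: `ToolPlugin`, `PluginRegistry`, the two example plugins and the workspace path are all hypothetical, and the plugins return a status string instead of calling any external tool.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ToolPlugin(ABC):
    """Hypothetical adapter between a workflow activity and one external tool."""

    name: str

    @abstractmethod
    def run(self, workspace: str) -> str:
        """Do the tool's work on files in `workspace` and return a status line."""


class OcrPlugin(ToolPlugin):
    name = "ocr"

    def run(self, workspace: str) -> str:
        # A real plugin would invoke the OCR engine here.
        return f"ran OCR on files in {workspace}"


class ArchivePlugin(ToolPlugin):
    name = "archive"

    def run(self, workspace: str) -> str:
        # A real plugin would call the preservation service here.
        return f"archived master files from {workspace}"


class PluginRegistry:
    """Looks plugins up by name so new tools can be added without changing callers."""

    def __init__(self) -> None:
        self._plugins: Dict[str, ToolPlugin] = {}

    def register(self, plugin: ToolPlugin) -> None:
        self._plugins[plugin.name] = plugin

    def run_all(self, names: List[str], workspace: str) -> List[str]:
        """Run the named plugins in the given order and collect their status lines."""
        return [self._plugins[n].run(workspace) for n in names]


registry = PluginRegistry()
registry.register(OcrPlugin())
registry.register(ArchivePlugin())
log = registry.run_all(["ocr", "archive"], "/data/task-a")
print(log)
```

Supporting a further external tool then means writing one more `ToolPlugin` subclass and registering it; the code that runs activities stays unchanged.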


Summary (1)

• Digitisation is an important mission of all cultural heritage institutions
• Long-term preservation is a key element of each digital library or museum
• Mass digitisation needs strong management support
• Poznań Supercomputing and Networking Center acts as a supporting institution
  – R&D center (PAS) in the IT area, including digital libraries
  – Cooperates with various national and international bodies to stimulate the growth and advancement of the information society


Summary (2)

Comprehensive software solution for cultural heritage institutions

• dLab: digitisation process
• dArceo: master files, long-term preservation
• dLibra or dMuseion: presentation files, online availability of resources


Questions?

Tomasz Parkoła
tparkola@man.poznan.pl
http://dl.psnc.pl


Thank you!
