
Deliverable 5.2 - the School of Engineering and Design - Brunel ...


ICT Project 3D VIVANT – Deliverable 5.2
Contract no.: 248420
Search & Retrieval Mechanisms & Tools

Project Number: 248420
Project title: 3D VIVANT
Deliverable Type: Public
CEC Deliverable Number: IST-248420/Brunel/WP05/PU/R/Del-5-2
Contractual Delivery Date: 31st January 2013
Actual Delivery Date: 4th March 2013
Title of the Deliverable: Search and Retrieval Mechanisms and Tools
Workpackage: WP5
Nature of the Deliverable: Report
Organisations:
1 Brunel University
2 Centre for Research and Technology Hellas – Informatics and Telematics Institute
3 Institut für Rundfunktechnik GmbH
4 Holografika
5 RAI research centre
6 Rundfunk Berlin-Brandenburg
7 Instituto de Telecomunicações
8 European Broadcast Union
9 Arnold & Richter Cine Technik
Authors: Theodoros Semertzidis, Vasilis Lovatsis, Michael G. Strintzis, Petros Daras
Circulation List: Partners and Public on Internet
Keywords: search, content-based retrieval, holoscopic imaging

4/03/2013


Version Control
Change Log                                   Version   Date
1st draft (Iordanis Biperis - CERTH)         0.1       12/10/2010
2nd draft (Thodoris Semertzidis - CERTH)     0.2       10/8/2012
3rd draft (Vasilis Lovatsis - CERTH)         0.3       16/8/2012
4th draft (Thodoris Semertzidis - CERTH)     0.4       17/12/2012
5th version (Vasilis Lovatsis - CERTH)       0.5       04/3/2013


TABLE OF CONTENTS
1 INTRODUCTION
1.1 EXECUTIVE SUMMARY
1.2 DESIGN OBJECTIVES AND DOCUMENT STRUCTURE
2 FRAMEWORK ARCHITECTURE AND GRAPHICAL USER INTERFACE
3 DATABASE PREPARATION
3.1 DATABASE STRUCTURE
3.2 DESCRIPTORS EXTRACTION
3.3 CODEBOOK GENERATION AND BAG-OF-WORDS
3.4 MANIFOLD LEARNING AND UNIFIED SEARCH SPACE
3.5 INDEX CREATION
4 SEARCH FOR SIMILAR CONTENT
4.1 SINGLE FRAME SEARCH
4.2 BATCH SEARCH TARGETING HYPERLINKER ENVIRONMENT
5 IMPLEMENTATION DETAILS AND TOOLS
6 CONCLUSIONS
7 BIBLIOGRAPHY


LIST OF FIGURES
Figure 1: Search and Retrieval framework architecture
Figure 2: 3DVIVANT S&R framework starting GUI
Figure 3: The menu of Search & Retrieval software
Figure 4: Dynamic behaviour of the search and retrieval framework
Figure 5: Database filesystem structure
Figure 6: Descriptors Extraction User Interface
Figure 7: Descriptor extraction is a lengthy process. The console prints logs on the progress of the process
Figure 8: File format for the local features file
Figure 9: The manifold learning process produces a single multimodal descriptor per object
Figure 10: kd-tree indexing structure
Figure 11: Query formulator window
Figure 12: Segmentation mask is loaded and the user may click the object to use as query
Figure 13: Results window with the retrieved similar objects
Figure 14: Examining the 3D model from the relevant results
Figure 15: XML schema for the output data, targeting the hyperlinker environment


GLOSSARY
2D     Two-dimensional
3D     Three-dimensional
GUI    Graphical user interface
BoW    Bag of words
S&R    Search and Retrieval


1 INTRODUCTION
1.1 EXECUTIVE SUMMARY
The past decade has witnessed exponential growth in digital multimedia production and communication. Today a huge number of images and thousands of hours of video are created every day by professionals and home users; the establishment of holoscopic imaging technology, advanced by 3D VIVANT, is expected to lead to an explosion in digital content production. With content being produced at such an increasing rate, the development of effective and efficient content-based retrieval tools becomes of utmost importance.

3D VIVANT aims to provide tools that allow the retrieval of similar objects from holoscopic content databases. Given the inherent problems of current text-based search engines, which rely on subjective manual or automatic annotation, 3D VIVANT opted for a search-by-example approach.

The search and retrieval framework can be used by both professionals and home users. However, in the hyperlinker scenario only professional users will have access to integral video editing and thus to the search and retrieval framework. For this reason, the Graphical User Interfaces were designed with the professional user in mind. For a home user, the GUI design would differ by offering fewer configuration options and by emphasizing ease of use rather than advanced control of the framework.

The search and retrieval framework will enable efficient editing of hyperlinked integral video and content reuse in different scenarios. Moreover, the tools and methodologies can be used by home/amateur users to search inside large multimedia databases.

To bridge the gap between conventional 2D and 3D technologies and holoscopic imaging, 3D VIVANT develops content-based retrieval mechanisms that support multimodal queries, i.e. the user is able to pose holoscopic content queries, and the framework answers these queries with similar multimedia content from various modalities, such as 2D images, full 3D models, or range scans with or without texture information.

In order to reduce integration constraints, enable modular usage of the software systems produced in 3D VIVANT, allow easy adaptation of the framework to the Content Editing Terminal, and enable the use of different PCs for editing and pre-processing the content, the software was developed as an autonomous system. In this approach the S&R framework takes the original integral video sequence and metadata and produces an XML file to be used in the hyperlinker framework. This approach enables integral video pre-processing in advance and distribution of the computational workload across different PCs. The hyperlinker environment is presented in deliverable D5.4 [2]; however, a short description of how the S&R framework is used with the hyperlinker is given later in this document.

Finally, it is worth noting that the software is based on open source libraries that can be compiled on various operating systems, and it can be easily customized.


1.2 DESIGN OBJECTIVES AND DOCUMENT STRUCTURE
This document provides a detailed description of the search and retrieval framework that was developed in the context of Task 5.1 under WP5. It should be emphasized that this document presents the framework, i.e. the environment, mechanisms, tools and User Interfaces, while the actual search and retrieval algorithms are described in deliverable D5.1 [1]. The search and retrieval framework is a software platform that provides the tools and user interfaces to: a) prepare the integral video sequences so that they are searchable, b) search inside a multimedia database for similar content, and c) extract XML files to be used in the hyperlinker environment.

This deliverable first presents the overall architecture of the search and retrieval framework and then divides the presentation of the framework into two fundamental sections: a) the database preparation and b) the actual search inside the database. The Graphical User Interfaces (GUI) for each functionality of the system are presented alongside the discussion of the corresponding components.

The core of the system is the low-level feature extraction process, during which multimedia descriptors are extracted for each multimedia object in the database. A manifold learning algorithm then combines descriptors from different modalities to unify the search space and provide multimodal search capabilities.

The descriptor extraction procedure is followed by a feature matching step, whose aim is to establish a similarity measure between any two objects. This step is highly dependent on the low-level feature extraction module, and together they form the basis of the search and retrieval framework.

Finally, the search engine's graphical user interface is presented extensively, with examples that demonstrate each step of the search and retrieval procedures.

The rest of the document is structured as follows: Section 2 describes the framework's architecture, basic modules and menu items. Section 3 discusses the structure of the database and the actions for preparing the database for search queries. Section 4 presents the procedures for searching inside the database using two different approaches, while Section 5 presents the implementation details and programming tools. Finally, Section 6 draws the conclusions of the document.


2 FRAMEWORK ARCHITECTURE AND GRAPHICAL USER INTERFACE
The search and retrieval framework was developed with modular, easily expandable source code so that it can be improved further as new requirements arise. With this in mind, most of the framework's components are thread-based classes that work independently of each other and interconnect and exchange data through the application's root process. This approach allowed us to implement and test several configurations before settling on the final one. A rough schematic diagram of the framework's architecture is presented in Figure 1.

Figure 1: Search and Retrieval framework architecture (blocks: Graphical User Interface; Descriptor extractor threads; Database thread; Search process and results presentation thread; Logger thread)

The descriptor extractor threads are triggered to run one of the supported descriptor extraction algorithms on the selected database. When the descriptor extraction process finishes, the threads terminate; a new thread is started each time a new database is to be processed.

The database thread manages access to the filesystem database and, primarily, keeps the index of the database in memory so that it is ready for search queries.
The database thread lives for the entire uptime of the application and is therefore considered a permanent thread.

The search process thread is started each time a new search query is posed to the system. After processing the query, requesting similar content from the database, and building the ranked list of results, the thread terminates.

Finally, the logger thread records all the actions that occur from the start to the end of the application's lifecycle and displays them in a window of the UI.

The Search and Retrieval Framework was developed as a multiple-document interface (MDI) environment in order to support multiple functionalities in parallel. The startup UI is presented in Figure 2. At the top of the UI there is a toolbar menu with all the available functionalities, as well as some icon buttons for the most common features. From left to right, the toolbar has a "Search" menu, a database preparation menu, an options menu, a "Window" menu for tiling and navigating windows, and a "Help" menu that leads the user to the "About" window with information about the software.
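The thread-based design described at the start of this section can be illustrated with a minimal Python sketch. It assumes that threads communicate with the root process through message queues; the document does not state the actual mechanism. Only two of the roles are modelled: the short-lived descriptor extractor threads (one per selected descriptor, terminating when done) and the permanent logger thread. The database and search threads would follow the same pattern. All class and function names are hypothetical.

```python
import queue
import threading


class LoggerThread(threading.Thread):
    """Permanent thread: collects log messages until the root process sends None."""

    def __init__(self):
        super().__init__(daemon=True)
        self.inbox = queue.Queue()
        self.lines = []

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:  # shutdown sentinel from the root process
                break
            self.lines.append(msg)


class ExtractorThread(threading.Thread):
    """Short-lived thread: runs one descriptor over one database, then terminates."""

    def __init__(self, descriptor, files, log_inbox, results):
        super().__init__()
        self.descriptor = descriptor
        self.files = files
        self.log_inbox = log_inbox
        self.results = results

    def run(self):
        for name in self.files:
            # Placeholder: a real extractor would write a features file here.
            self.results.put((self.descriptor, name))
        self.log_inbox.put(f"{self.descriptor}: {len(self.files)} files processed")


def run_extraction(descriptors, files):
    """Root process: starts the threads, waits for them and gathers their output."""
    logger = LoggerThread()
    logger.start()
    results = queue.Queue()
    workers = [ExtractorThread(d, files, logger.inbox, results) for d in descriptors]
    for w in workers:
        w.start()  # one thread per selected descriptor, all running in parallel
    for w in workers:
        w.join()  # extractor threads terminate once their database is processed
    logger.inbox.put(None)
    logger.join()
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected, logger.lines


if __name__ == "__main__":
    res, log = run_extraction(["CSIFT", "GB"], ["a.png", "b.png"])
    print(len(res), sorted(log))  # 4 ['CSIFT: 2 files processed', 'GB: 2 files processed']
```

Queues keep the worker threads decoupled: no thread touches another thread's state directly, which matches the stated goal of testing different configurations by swapping components.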


Figure 2: 3DVIVANT S&R framework starting GUI

Figure 3 presents the expanded menus with all the available functionalities. Figure 3(a) shows the "Search" menu, from which the user can start a new search process, open and examine the results of a search process, and save the results to a file. Figure 3(b) presents the two database preparation functionalities: the descriptor extraction process, which is discussed in detail in Section 3.2, and the creation or update of the index that enables fast search inside the database. In Figure 3(c) the options menu presents the two available options: the first opens the settings UI to change the configuration of the software, and the second opens the terminal window where log messages appear during each process. The "Window" menu in Figure 3(d) enables the positioning of, and navigation through, the open windows. Finally, the "Help" menu provides an "About" window with information on the software and an acknowledgement to the 3DVIVANT project.


Figure 3: The menu of Search & Retrieval software, panels (a) to (e)

A configuration file is available for configuring the software. It holds information fundamental to the operation of the application and should be changed with care. The configuration file, vivantEngine.ini, is located in the application folder and can be accessed both from the filesystem and from the application interface. Figure 4 presents the settings window and the corresponding vivantEngine.ini file.

Figure 4: (a) settings window (b) vivantEngine.ini file containing the same information as the settings window


The configuration file contains the following information:
- Latest window position
- Window size
- Database path
- Java path
- Binary files path
- Selected default descriptor
- BatchMode flag

Of the above settings, the database path is the most critical, since it points to the database to which all operations are applied. The Java path parameter is used if CEDD descriptors are selected, since their implementation is provided as a jar file. The binary files path points to other helper applications used for processing the data. Finally, the batch mode flag is used to process batch data and extract the XML files for the hyperlinker.
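A configuration file of this kind could be read with Python's standard configparser module, as in the minimal sketch below. The section name, key names and values are hypothetical: the document lists only the kinds of settings stored in vivantEngine.ini, not their exact spelling or layout.

```python
import configparser

# Hypothetical contents: the real section/key names in vivantEngine.ini are not documented here.
SAMPLE_INI = """
[General]
window_pos = 100,80
window_size = 1280,800
database_path = D:/vivant_db
java_path = C:/Program Files/Java/bin/java.exe
binary_path = ./bin
default_descriptor = CSIFT
batch_mode = false
"""


def load_settings(text):
    """Parse the INI text into a dict with typed values."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    g = cfg["General"]
    return {
        "window_pos": tuple(int(v) for v in g["window_pos"].split(",")),
        "window_size": tuple(int(v) for v in g["window_size"].split(",")),
        "database_path": g["database_path"],  # most critical: all operations target it
        "java_path": g["java_path"],  # only needed for the jar-based CEDD descriptor
        "binary_path": g["binary_path"],  # helper applications
        "default_descriptor": g["default_descriptor"],
        "batch_mode": g.getboolean("batch_mode"),  # True -> produce XML for the hyperlinker
    }


if __name__ == "__main__":
    print(load_settings(SAMPLE_INI)["database_path"])  # D:/vivant_db
```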


3 DATABASE PREPARATION
A fundamental step in making a multimedia database searchable is to extract descriptor vectors that represent each multimedia object, and this also holds for the 3DVIVANT S&R framework. This section describes the database structure, the descriptor extraction process and the creation of the final indexing structure that enables multimedia similarity search. This process is also known as the offline process, as depicted in Figure 5.

Figure 5: Dynamic behaviour of the search and retrieval framework (offline: Content Repository, Pre-Processing, Low-level Feature Extraction, Descriptors; online: Browsing / Selecting existing object, Submitting new object, Pre-Processing, Low-level Feature Extraction, Feature Retrieval, Feature Matching, Results)

The steps of the offline process are discussed in the following subsections. Section 3.1 refers to the database structure, Section 3.2 describes the descriptor extraction process and the user's options, Section 3.3 discusses the codebook generation process and the extracted histograms (bag of words), Section 3.4 presents the manifold learning step that builds the unified search space, and finally Section 3.5 describes the creation of indexes for fast similarity search inside the database.

3.1 DATABASE STRUCTURE
In 3DVIVANT the database is actually a filesystem structure that contains all the necessary data and metadata.
The structure organizes the data into a) the raw multimedia files, b) the feature files, c) the bag-of-words files, and d) the index files. The filesystem structure is presented in Figure 6. In the "data" directory of the filesystem, the raw data are separated by modality so that the appropriate descriptor extraction algorithm is applied to each modality. Moreover, the proposed structure makes it easier to reference the multimedia objects in the retrieval process.

The decision to implement the S&R database as a filesystem structure is supported by several arguments. First, a filesystem database is very easy to create, even for a novice user. Second, giving the user direct access to the actual content provides many advantages in a video editing scenario such as the "hyperlinker scenario". Moreover, the S&R framework is also intended for novice/home users who need to search inside their multimedia content, which most of the time is stored inside a folder on their hard disk drive.

Figure 6: Database filesystem structure

3.2 DESCRIPTORS EXTRACTION
The selected descriptor extraction algorithms run through the appropriate modality folder in the "data" directory (viewpoint images, depth maps, curvature images, 2D snapshots etc.) and extract one features file for each multimedia file. For each supported descriptor there is a directory with the same name under the "descr" folder. Moreover, under each descriptor folder there are two subfolders (see Figure 6): one for the extracted features ("features" folder) and one for the bag-of-words vectors ("words" folder) that will be generated for each features file using a generated codebook (see Section 3.3).

The Search and Retrieval Framework supports the following descriptor extraction algorithms for extracting local features of the multimedia objects:
1. CSIFT: Colour SIFT
2. OSIFT: Opponent SIFT, a Colour SIFT variant that uses a different colour model
3. GB: Geometric Blur
4. SSIM: Self Similarity
5. PI: Projection Images, applicable only to depth data

The User Interface for extracting descriptors (see Figure 7) lets the user choose whether the system generates descriptors for one descriptor algorithm or for all of them. Based on the user's selection, the system starts one or more threads that extract descriptors from the same raw data (under the "data" directory) in parallel. From this step until the final descriptors are produced, the system works without any user intervention. Since extracting the local descriptors is a lengthy process, a progress bar informs the user about the progress of the computations (Figure 8). In addition, the console window prints log messages that give more information on the running processes.

Figure 7: Descriptors Extraction User Interface
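The directory layout of Sections 3.1 and 3.2 can be sketched in Python with pathlib. Only the folder names "data", "descr", "features" and "words" and the descriptor names come from the text; the "index" folder name, the modality folder names and the ".feat"/".bow" file suffixes are hypothetical. The sketch also does not model per-modality restrictions such as PI applying only to depth data.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical names: the text names only "data", "descr", "features" and "words";
# the "index" folder, the modality folders and the file suffixes are assumptions.
MODALITIES = ["viewpoints", "depth", "curvature", "snapshots2d"]
DESCRIPTORS = ["CSIFT", "OSIFT", "GB", "SSIM", "PI"]


def create_layout(root: Path) -> None:
    """Create an empty database skeleton under `root`."""
    for modality in MODALITIES:
        (root / "data" / modality).mkdir(parents=True, exist_ok=True)
    for name in DESCRIPTORS:
        for sub in ("features", "words"):
            (root / "descr" / name / sub).mkdir(parents=True, exist_ok=True)
    (root / "index").mkdir(parents=True, exist_ok=True)


def output_paths(root: Path, descriptor: str, raw_file: Path) -> tuple[Path, Path]:
    """Map one raw file under data/ to its features file and its bag-of-words file."""
    rel = raw_file.relative_to(root / "data")  # e.g. depth/obj001.png
    base = root / "descr" / descriptor
    return (base / "features" / rel).with_suffix(".feat"), (base / "words" / rel).with_suffix(".bow")


def pending_files(root: Path, descriptor: str) -> list[Path]:
    """Raw files that have no features file yet for `descriptor`."""
    return [raw for raw in sorted((root / "data").rglob("*"))
            if raw.is_file() and not output_paths(root, descriptor, raw)[0].exists()]
```

Because every derived file path is computed from the raw file's path, an extractor thread can resume an interrupted run by processing only `pending_files`.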


Figure 8: Descriptor extraction is a lengthy process. The console prints logs on the progress of the process

As stated above, each descriptor extraction algorithm generates its own local feature vectors and stores them under the correspondingly named folder in the "descr" directory. To unify data storage and simplify Input/Output, we chose a single format for all local features files. The format includes the algorithm name, the dimensionality of the feature vectors, the number of local features in the file, and then one vector per line along with the positioning information for each local feature vector. The file format is presented in Figure 9.

Figure 9: File format for the local features file

When the descriptor extraction processes finish, the bag-of-words histograms must be computed. Calculating these histograms requires a codebook, as discussed in Section 3.3.
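A reader and writer for a features file of this kind might look like the minimal Python sketch below. The exact layout is an assumption, since Figure 9 is not reproduced here: three header lines (algorithm name, vector dimension, feature count) followed by one line per local feature, holding an (x, y) position and then the D vector components.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LocalFeatures:
    algorithm: str
    dim: int
    positions: list[tuple[float, float]]  # assumed (x, y) position of each local feature
    vectors: list[list[float]]


def write_features(f: LocalFeatures) -> str:
    """Header lines (name, dimension, count), then one 'x y v1 ... vD' line per feature."""
    lines = [f.algorithm, str(f.dim), str(len(f.vectors))]
    for (x, y), vec in zip(f.positions, f.vectors):
        lines.append(" ".join(str(v) for v in (x, y, *vec)))
    return "\n".join(lines) + "\n"


def read_features(text: str) -> LocalFeatures:
    """Parse the assumed layout, checking it against the header."""
    lines = text.strip().splitlines()
    algorithm, dim, count = lines[0].strip(), int(lines[1]), int(lines[2])
    positions, vectors = [], []
    for line in lines[3:3 + count]:
        values = [float(t) for t in line.split()]
        if len(values) != 2 + dim:
            raise ValueError(f"expected {2 + dim} values per line, got {len(values)}")
        positions.append((values[0], values[1]))
        vectors.append(values[2:])
    if len(vectors) != count:
        raise ValueError(f"header announces {count} features, found {len(vectors)}")
    return LocalFeatures(algorithm, dim, positions, vectors)
```

Putting the dimension and count in the header lets one reader serve every descriptor and detect truncated files early, which is the point of the shared format.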


3.3 CODEBOOK GENERATION AND BAG-OF-WORDS
Since most of the well-performing descriptor extraction algorithms extract local features, many feature vectors correspond to a single multimedia object. There may be hundreds or thousands of local features per multimedia object, which makes it very difficult to use them efficiently in the matching step. A common approach to this problem is the bag-of-words (BoW) method. With BoW, a codebook of the most dominant code words (quantized descriptor vectors) is generated. The codebook is generated with a k-means algorithm, typically with k between 1000 and 5000 (the number of code words, and hence the dimensionality of the resulting histograms), by sampling the local features available in the database. Each set of local features is then checked against the codebook to generate a histogram of code word occurrences for each features file (which corresponds to one multimedia object). The final BoW histogram is in effect a new descriptor that represents the multimedia object with a single descriptor vector. These final BoW descriptor vectors are stored in the database under the "words" directories.

3.4 MANIFOLD LEARNING AND UNIFIED SEARCH SPACE
The next step after generating the bag-of-words vectors is the creation of the unified search space, as described in detail in Deliverable D5.1.
For this, the manifold learning process combines the different modalities (viewpoints, depth, curvature) into a unified vector space. The final outcome of the process is a single descriptor vector per multimedia object (see Figure 10) that corresponds to the unified multimodal search space.

Figure 10: The manifold learning process produces a single multimodal descriptor per object

This final descriptor vector per multimedia object is then indexed to prepare the database for similarity search queries.
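The codebook and bag-of-words steps of Section 3.3 can be sketched in standard C++ as follows. This is an illustrative sketch, not the framework's implementation: it assumes plain Lloyd k-means seeded with the first k samples, Euclidean distance and hard assignment of each local feature to its nearest code word, and the names kmeansCodebook, nearestCodeword and bowHistogram are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Vec = std::vector<float>;

// Squared Euclidean distance between two vectors of equal length.
float sqDist(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float t = a[i] - b[i];
        d += t * t;
    }
    return d;
}

// Index of the code word closest to v (hard assignment).
std::size_t nearestCodeword(const std::vector<Vec>& codebook, const Vec& v) {
    std::size_t best = 0;
    float bestD = std::numeric_limits<float>::max();
    for (std::size_t c = 0; c < codebook.size(); ++c) {
        const float d = sqDist(codebook[c], v);
        if (d < bestD) { bestD = d; best = c; }
    }
    return best;
}

// Plain Lloyd k-means over sampled local features; seeds with the first k samples.
std::vector<Vec> kmeansCodebook(const std::vector<Vec>& samples, std::size_t k, int iterations) {
    assert(k > 0 && samples.size() >= k);
    std::vector<Vec> centers(samples.begin(), samples.begin() + static_cast<std::ptrdiff_t>(k));
    const std::size_t dim = samples[0].size();
    for (int it = 0; it < iterations; ++it) {
        std::vector<Vec> sums(k, Vec(dim, 0.0f));
        std::vector<std::size_t> counts(k, 0);
        for (const Vec& s : samples) {  // assignment step
            const std::size_t c = nearestCodeword(centers, s);
            for (std::size_t i = 0; i < dim; ++i) sums[c][i] += s[i];
            ++counts[c];
        }
        for (std::size_t c = 0; c < k; ++c) {  // update step; empty clusters keep their center
            if (counts[c] == 0) continue;
            for (std::size_t i = 0; i < dim; ++i)
                centers[c][i] = sums[c][i] / static_cast<float>(counts[c]);
        }
    }
    return centers;
}

// BoW descriptor of one object: histogram of code word occurrences (raw counts).
Vec bowHistogram(const std::vector<Vec>& codebook, const std::vector<Vec>& localFeatures) {
    Vec hist(codebook.size(), 0.0f);
    for (const Vec& f : localFeatures) hist[nearestCodeword(codebook, f)] += 1.0f;
    return hist;
}
```

Whatever the number of local features an object has, its histogram has exactly k bins, which is what turns a variable-size set of local features into one fixed-length descriptor. Whether the counts are then normalised is not stated in the text; that would be a separate design choice.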


3.5 INDEX CREATION
Search environments commonly use indexing structures to enable faster similarity search. For the 3DVIVANT search and retrieval framework, the kd-tree indexing structure (see Figure 11) was selected. The kd-tree builds a tree data structure (or a number of trees, i.e. a kd-tree forest) in which the most similar descriptor vectors lie close together. Using such a structure avoids an exhaustive linear search, so the framework answers similarity search queries faster.

Figure 11: kd-tree indexing structure (diagram: a list of descriptor vectors a: [v_a,1 v_a,2 ... v_a,n], b: [v_b,1 v_b,2 ... v_b,n], ..., a k-D space partition, and the resulting tree with nodes 1 to 6 and leaves a to g)

When the database is small, or when there are no queries from outside the database, the kd-tree indexer may be omitted. In that case, the engine can pre-compute the search results for each record in the database and store these results for future queries.
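To illustrate how a kd-tree avoids an exhaustive linear scan, here is a minimal single-tree sketch with exact nearest-neighbour search. It is a simplification under stated assumptions: the framework uses a kd-tree (or kd-tree forest) whose construction parameters are not described here, and the names KdNode and KdTree are hypothetical. This tree splits at the median of a cycling coordinate and skips any subtree whose splitting plane is farther away than the best match found so far.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

using Vec = std::vector<float>;

// One tree node: a database record that splits the space on one coordinate.
struct KdNode {
    std::size_t index = 0;  // record index in the database
    std::size_t axis = 0;   // splitting dimension
    std::unique_ptr<KdNode> left, right;
};

class KdTree {
public:
    explicit KdTree(std::vector<Vec> data) : data_(std::move(data)) {
        std::vector<std::size_t> ids(data_.size());
        for (std::size_t i = 0; i < ids.size(); ++i) ids[i] = i;
        root_ = build(ids, 0, ids.size(), 0);
    }

    // Index of the stored vector closest to q (exact Euclidean nearest neighbour).
    std::size_t nearest(const Vec& q) const {
        assert(root_ && q.size() == data_[0].size());
        std::size_t best = root_->index;
        float bestD = std::numeric_limits<float>::max();
        search(root_.get(), q, best, bestD);
        return best;
    }

private:
    std::vector<Vec> data_;
    std::unique_ptr<KdNode> root_;

    static float sqDist(const Vec& a, const Vec& b) {
        float d = 0.0f;
        for (std::size_t i = 0; i < a.size(); ++i) {
            const float t = a[i] - b[i];
            d += t * t;
        }
        return d;
    }

    // Recursively splits ids[lo, hi) at the median of the current axis.
    std::unique_ptr<KdNode> build(std::vector<std::size_t>& ids, std::size_t lo,
                                  std::size_t hi, std::size_t depth) {
        if (lo >= hi) return nullptr;
        const std::size_t axis = depth % data_[ids[lo]].size();
        const std::size_t mid = lo + (hi - lo) / 2;
        std::nth_element(ids.begin() + lo, ids.begin() + mid, ids.begin() + hi,
                         [&](std::size_t a, std::size_t b) { return data_[a][axis] < data_[b][axis]; });
        auto node = std::make_unique<KdNode>();
        node->index = ids[mid];
        node->axis = axis;
        node->left = build(ids, lo, mid, depth + 1);
        node->right = build(ids, mid + 1, hi, depth + 1);
        return node;
    }

    // Visits the near side first; visits the far side only if the splitting
    // plane is closer than the best match found so far.
    void search(const KdNode* n, const Vec& q, std::size_t& best, float& bestD) const {
        if (!n) return;
        const float d = sqDist(data_[n->index], q);
        if (d < bestD) { bestD = d; best = n->index; }
        const float diff = q[n->axis] - data_[n->index][n->axis];
        const KdNode* nearSide = diff < 0 ? n->left.get() : n->right.get();
        const KdNode* farSide = diff < 0 ? n->right.get() : n->left.get();
        search(nearSide, q, best, bestD);
        if (diff * diff < bestD) search(farSide, q, best, bestD);
    }
};
```

For the small-database alternative mentioned above, the equivalent is simply to run a ranked search once per database record and store each result list, trading memory for zero query-time work.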


4 SEARCH FOR SIMILAR CONTENT
In the online phase of the framework, the database should already be prepared and waiting for queries. The framework supports two querying methods. The first is the typical method, in which a user poses a query multimedia object (a single query, see Section 4.1) and the system replies with the results list. The second is a "batch search" method, in which a user pre-computes the results for a set of queries in order to use them later or outside the framework (see Section 4.2). This second approach is used in the hyperlinker demo process, which will be presented in detail in Deliverable D5.4.

4.1 SINGLE FRAME SEARCH
In the single frame search functionality, the user chooses to perform a new query from the menu or the toolbar icon. A new search thread is created and a new window is presented in the UI (Figure 12). The window provides an "open" button that opens the platform's native "file open" dialog. The user selects the integral frame file to be used as the query and loads it into the window. The "load mask" button loads the corresponding segmentation mask from the database.
Loading the segmentation mask opens the mask image file and either reads the bounding boxes and clickable areas of each object in the frame from the corresponding XML file or computes them on the fly. After this process, the user can click the exact object (see Figure 13) in the loaded integral image and use it as a query. In this case, the query is only the selected object and not the entire image. If the user does not load the mask and just clicks the search button, the whole integral image is used as the query image.

Figure 12: Query formulator window
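The on-the-fly computation of bounding boxes and clickable areas from a segmentation mask in Section 4.1 can be sketched as follows. The mask representation is an assumption: a row-major label image in which each pixel stores an object id and 0 marks the background. LabelMask, BBox, computeBoundingBoxes and objectAtClick are hypothetical names. In this sketch, an object's clickable area is its own mask pixels, and its bounding box is the tightest rectangle around them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Inclusive pixel bounds of one segmented object.
struct BBox {
    int minX, minY, maxX, maxY;
};

// Hypothetical label mask: row-major, one object id per pixel, 0 = background.
struct LabelMask {
    int width = 0;
    int height = 0;
    std::vector<int> labels;
    int at(int x, int y) const { return labels[static_cast<std::size_t>(y) * width + x]; }
};

// One pass over the mask, growing a bounding box per object id.
std::map<int, BBox> computeBoundingBoxes(const LabelMask& m) {
    assert(m.labels.size() == static_cast<std::size_t>(m.width) * m.height);
    std::map<int, BBox> boxes;
    for (int y = 0; y < m.height; ++y) {
        for (int x = 0; x < m.width; ++x) {
            const int id = m.at(x, y);
            if (id == 0) continue;  // background is not clickable
            auto it = boxes.find(id);
            if (it == boxes.end()) {
                boxes[id] = BBox{x, y, x, y};
            } else {
                BBox& b = it->second;
                b.minX = std::min(b.minX, x);
                b.minY = std::min(b.minY, y);
                b.maxX = std::max(b.maxX, x);
                b.maxY = std::max(b.maxY, y);
            }
        }
    }
    return boxes;
}

// Object id under a click, or 0 when the click hits background or lies outside the frame.
int objectAtClick(const LabelMask& m, int x, int y) {
    if (x < 0 || y < 0 || x >= m.width || y >= m.height) return 0;
    return m.at(x, y);
}
```

Hit-testing against the mask pixels rather than the boxes means that overlapping bounding boxes cannot make a click select the wrong object; the boxes are still useful for cropping the selected object into a query.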


Figure 13: The segmentation mask is loaded and the user may click the object to use as a query

After the query is submitted with either of the two methods, the search process starts. The descriptor extraction process runs and the descriptors are generated. The manifold learning algorithm is then called to project the set of selected descriptors into the multimodal space. The resulting descriptor vector is used to query the index for similar content.

The final results of the process appear in a new window (see Figure 14) that lists the objects from the most to the least similar, using both a central viewpoint image and the object in its original modality (e.g. integral image or 3D model).
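Ordering the results from the most to the least similar can be illustrated with an exhaustive Euclidean ranking over the final multimodal descriptors. This is a sketch under assumptions: the framework queries the kd-tree index rather than scanning every record, and the distance measure, the topK cut-off and the names Hit and rankBySimilarity are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;

// One entry of the ranked list: database record index and its distance to the query.
struct Hit {
    std::size_t index;
    float distance;
};

// Exhaustive ranking: L2 distance from the query to every database descriptor,
// sorted from most to least similar, keeping at most topK hits.
std::vector<Hit> rankBySimilarity(const std::vector<Vec>& db, const Vec& query, std::size_t topK) {
    std::vector<Hit> hits;
    hits.reserve(db.size());
    for (std::size_t i = 0; i < db.size(); ++i) {
        assert(db[i].size() == query.size());
        float sq = 0.0f;
        for (std::size_t j = 0; j < query.size(); ++j) {
            const float t = db[i][j] - query[j];
            sq += t * t;
        }
        hits.push_back({i, std::sqrt(sq)});
    }
    // stable_sort keeps database order among equally distant records.
    std::stable_sort(hits.begin(), hits.end(),
                     [](const Hit& a, const Hit& b) { return a.distance < b.distance; });
    if (hits.size() > topK) hits.resize(topK);
    return hits;
}
```

A linear scan like this is also a convenient reference when testing an index: for the same query, an exact index should return the same nearest distances.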


Figure 14: Results window with the retrieved similar objects

If one of the retrieved results is a 3D model, the user can click and select the object for detailed examination. In this case, a new window pops up and the 3D model is loaded in a 3D rendering environment. The environment provides easy 3D model manipulation using only mouse gestures. Figure 15 depicts this 3D rendering window, with the 3D manipulator UI appearing as red lines on top of the 3D model.

Figure 15: Examining the 3D model from the relevant results


4.2 BATCH SEARCH TARGETING HYPERLINKER ENVIRONMENT
The batch search method targets the hyperlinker environment. The goal of this feature is to pre-process an integral video sequence (the raw data database) and generate the whole set of ranked lists for the available clickable objects in each frame. The output of the process is one XML file for each integral video frame. The XML file, together with the corresponding video frame, is used as input to the hyperlinking environment tool, enabling the tool to present search results for the clickable objects. These results are presented to the user/editor, who may select possible hyperlinks from the object to another multimedia object or simply to a web URI.

In the batch search approach, the user should feed the engine with all the necessary raw data (integral images, central viewpoints etc.), the segmentation masks, and the corresponding XML files that accompany the segmentation masks for each frame of a video sequence.

The batch process is triggered by setting batchMode=yes in the configuration file and then clicking the "build index" button. When the sequence is processed in batch mode, the engine generates an all-to-all search matrix: it uses every database record, one at a time, as a query object to find similar objects inside the database.
For each of them, a ranked list is generated that references the similar objects of the database.

The difference in this mode is that the user does not need to select and search each integral frame by hand. The automated process is faster and less error-prone.

Since the batch mode is used in the hyperlinker demo workflow, the ranked lists are written to the appropriate XML files to be used as input to the hyperlinker environment. The XML files contain the bounding boxes of the segmented objects in each frame and, where available, the search results for each bounding box. The schema serves as a general metadata exchange schema between the hyperlinker environment and peripheral tools, so the search results are optional for each bounding box. Figure 16 presents the XML schema of the generated files.


Figure 16: XML schema for the output data, targeting the hyperlinker environment
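A minimal sketch of serialising one frame's metadata in batch mode (Section 4.2) is given here. The actual schema is the one in Figure 16, which is not reproduced in this text, so every element and attribute name (frame, object, results, result, rank, uri, score) and the ResultItem/ObjectBox types are hypothetical. What the sketch does take from the text is that every bounding box is always written, while its ranked result list is optional.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory model of one frame's metadata.
struct ResultItem {
    std::string uri;  // linked multimedia object or web URI
    float score;      // similarity score
};
struct ObjectBox {
    int id, x, y, width, height;
    std::vector<ResultItem> results;  // optional: may be empty
};

// Escapes the five XML special characters for use in attribute values.
std::string xmlEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default: out += c;
        }
    }
    return out;
}

// Serialises one frame: every bounding box, with a ranked result list only when available.
std::string frameToXml(const std::string& frameFile, const std::vector<ObjectBox>& boxes) {
    std::ostringstream xml;
    xml << "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    xml << "<frame file=\"" << xmlEscape(frameFile) << "\">\n";
    for (const ObjectBox& b : boxes) {
        assert(b.width >= 0 && b.height >= 0);
        xml << "  <object id=\"" << b.id << "\" x=\"" << b.x << "\" y=\"" << b.y
            << "\" width=\"" << b.width << "\" height=\"" << b.height << "\"";
        if (b.results.empty()) {  // box without search results
            xml << "/>\n";
            continue;
        }
        xml << ">\n    <results>\n";
        for (std::size_t r = 0; r < b.results.size(); ++r)
            xml << "      <result rank=\"" << r + 1 << "\" uri=\"" << xmlEscape(b.results[r].uri)
                << "\" score=\"" << b.results[r].score << "\"/>\n";
        xml << "    </results>\n  </object>\n";
    }
    xml << "</frame>\n";
    return xml.str();
}
```

In a Qt-based tool, a dedicated XML writer class would normally replace the manual string building and escaping shown here; the hand-written version only keeps the sketch dependency-free.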


5 IMPLEMENTATION DETAILS AND TOOLS
The search and retrieval framework integrates various tools and technologies in order to provide the required functionality.

The software is implemented in C++ using the Qt 4.7.3 framework [3], the OpenCV (Open Source Computer Vision) library [4] and the GLC library for 3D rendering [5]. The software was compiled with the MS Visual Studio 2008 compiler and tested on the Windows XP and Windows 7 platforms.

Qt framework
Qt is a cross-platform application and UI framework that provides a broad set of tools for fast and efficient coding. The most important libraries are:
- Qt Core, which contains all the classes needed for a core application. It is the foundation of all Qt-based applications. Its key functions are file I/O, object handling, multithreading and concurrency, plugin and settings management and, finally, inter-object communication through signals and slots.
- Qt GUI, which contains the functionality needed to develop advanced UI applications. An important feature of the module is that it uses the native graphics API of each platform it supports, taking full advantage of the system resources.

Apart from these basic library modules, Qt also includes the following libraries:
1. 2D graphics canvas
2. OpenGL, which was used with the GLC library
3. WebKit, which was used for rendering the hyperlinked frames from the integral sequences
4. Scripting
5. Multimedia
6. Networking
7. XML
8. Database
9. Unit testing
10. Declarative UIs for touch-enabled and embedded devices

OpenCV library
OpenCV is a well-known, open-source computer vision library with a large community of developers and users. OpenCV provides a set of image I/O, image processing, computer vision and machine intelligence algorithms that are well documented and thoroughly tested by the user community.

GLC library
The GLC library provides input/output functionality for most of the well-known 3D data formats, such as Collada V1.4, 3DXML ASCII V3 and V4, OBJ, 3DS, STL (ASCII and binary), OFF and COFF. Moreover, GLC provides easy mechanisms for OpenGL-based 3D rendering inside a Qt window environment. A very useful feature of the library is its built-in 3D view manipulator, which gives the user an intuitive UI.


6 CONCLUSIONS
This document presents the 3D VIVANT Search and Retrieval framework: the actual software and the user interfaces that incorporate the search and retrieval algorithms described in Deliverable D5.1. The deliverable begins with a report on the overall user interfaces and the menu functionalities. The following sections then describe the offline and online processes of the multimedia similarity search concept and present in detail the parts of the UI for each function. The deliverable describes in depth the database structure that the S&R engine uses and the file formats of all types of I/O files used. Finally, Section 5 discusses the software tools and implementation details.

The 3DVIVANT search and retrieval framework provides the tools and functionality to efficiently create a multimodal multimedia database and to perform multimodal queries. The multimodal approach enables searching for similar content across modalities, e.g. searching for 3D models using integral images as queries.


7 BIBLIOGRAPHY
[1] 3D VIVANT Deliverable D5.1
[2] 3D VIVANT Deliverable D5.4
[3] Qt framework, http://qt.nokia.com/
[4] OpenCV, http://opencv.willowgarage.com/wiki/
[5] GLC library, http://www.glc-lib.net/
