
Deliverable 5.2 - the School of Engineering and Design - Brunel ...



ICT Project 3D VIVANT - Deliverable 5.2
Contract no.: 248420
Search & Retrieval Mechanisms & Tools

Project Number: 248420
Project title: 3D VIVANT
Deliverable Type: Public
CEC Deliverable Number: IST-248420/Brunel/WP05/PU/R/Del-5-2
Contractual Delivery Date: 31st January 2013
Actual Delivery Date: 4th March 2013
Title of the Deliverable: Search and Retrieval Mechanisms and Tools
Workpackage: WP5
Nature of the Deliverable: Report
Organisations:
1 Brunel University
2 Centre for Research and Technology Hellas - Informatics and Telematics Institute
3 Institut für Rundfunktechnik GmbH
4 Holografika
5 RAI research centre
6 Rundfunk Berlin-Brandenburg
7 Instituto de Telecomunicações
8 European Broadcast Union
9 Arnold & Richter Cine Technik
Authors: Theodoros Semertzidis, Vasilis Lovatsis, Michael G. Strintzis, Petros Daras
Circulation List: Partners and Public on Internet
Keywords: search, content-based retrieval, holoscopic imaging


Version Control

Change Log                                   Version   Date
1st draft (Iordanis Biperis - CERTH)         0.1       12/10/2010
2nd draft (Thodoris Semertzidis - CERTH)     0.2       10/8/2012
3rd draft (Vasilis Lovatsis - CERTH)         0.3       16/8/2012
4th draft (Thodoris Semertzidis - CERTH)     0.4       17/12/2012
5th version (Vasilis Lovatsis - CERTH)       0.5       04/3/2013


TABLE OF CONTENTS
1 INTRODUCTION
1.1 EXECUTIVE SUMMARY
1.2 DESIGN OBJECTIVES AND DOCUMENT STRUCTURE
2 FRAMEWORK ARCHITECTURE AND GRAPHICAL USER INTERFACE
3 DATABASE PREPARATION
3.1 DATABASE STRUCTURE
3.2 DESCRIPTORS EXTRACTION
3.3 CODEBOOK GENERATION AND BAG-OF-WORDS
3.4 MANIFOLD LEARNING AND UNIFIED SEARCH SPACE
3.5 INDEX CREATION
4 SEARCH FOR SIMILAR CONTENT
4.1 SINGLE FRAME SEARCH
4.2 BATCH SEARCH TARGETING HYPERLINKER ENVIRONMENT
5 IMPLEMENTATION DETAILS AND TOOLS
6 CONCLUSIONS
7 BIBLIOGRAPHY


LIST OF FIGURES
Figure 1: Search and Retrieval framework architecture
Figure 2: 3DVIVANT S&R framework starting GUI
Figure 3: The menu of Search & Retrieval software
Figure 4: Dynamic behaviour of the search and retrieval framework.
Figure 5: Database filesystem structure
Figure 6: Descriptors Extraction User Interface
Figure 7: descriptors extraction is a lengthy process. The console prints logs on the progress of the process
Figure 8: file format for the local features file
Figure 9: the manifold learning process produces a single multimodal descriptor per object
Figure 10: kd-tree indexing structure
Figure 11: Query formulator window
Figure 12: segmentation mask is loaded and the user may click the object to use as query
Figure 13: results window with the retrieved similar objects.
Figure 14: examine the 3D model from the relevant results
Figure 15: XML schema for the output data, targeting the hyperlinker environment


GLOSSARY
2D    Two-dimensional
3D    Three-dimensional
GUI   Graphical user interface
BoW   Bag of words
S&R   Search and Retrieval


1 INTRODUCTION
1.1 EXECUTIVE SUMMARY
The past decade has witnessed exponential growth in digital multimedia production and communication. A huge number of images and thousands of hours of video are already created each day by professionals and home users, and the establishment of holoscopic imaging technology, advanced by 3D VIVANT, will lead to a further explosion in digital content production. With such an increasing rate of content production, the development of effective and efficient content-based retrieval tools becomes of utmost importance.

3D VIVANT aims to provide tools that allow the retrieval of similar objects from holoscopic content databases. Given the inherent problems of current text-based search engines, which rely on subjective manual or automatic annotation, 3D VIVANT opted for a search-by-example approach.

The search and retrieval framework can be used by both professionals and home users. However, in the hyperlinker scenario only professional users will have access to integral video editing and thus also to the search and retrieval framework. For this reason, the Graphical User Interfaces were designed with the professional user in mind. For a home user, the GUI design would differ by offering fewer configuration options and emphasizing ease of use rather than advanced control of the framework.

The search and retrieval framework will enable efficient editing of hyperlinked integral video and content reuse in different scenarios. Moreover, the tools and methodologies can be used by the home / amateur user to search inside large multimedia databases.

To bridge the gap between conventional 2D and 3D technology and holoscopic imaging, 3D VIVANT develops content-based retrieval mechanisms that support multimodal queries, i.e. the user is able to pose holoscopic content queries and the framework will answer these queries with similar multimedia content from various modalities, such as 2D images, full 3D models or range scans with or without texture information.

The software was developed as an autonomous system in order to reduce the integration constraints, enable modular usage of the software systems produced in 3D VIVANT, allow easy adaptation of the framework to the Content Editing Terminal, and enable the use of different PCs for editing and pre-processing the content. In this approach the S&R framework takes the original integral video sequence and metadata and extracts an XML file to be used in the hyperlinker framework. The selected approach enables integral video pre-processing in advance and distribution of the computational workload across different PCs. The hyperlinker environment is presented in deliverable D5.4[2]; a short presentation of the usage of the S&R framework with the hyperlinker is given later in this document.

Finally, it is worth noting that the software is based on open source libraries that can be compiled on various operating systems, and that it can be easily customized.


1.2 DESIGN OBJECTIVES AND DOCUMENT STRUCTURE
This document provides a detailed description of the search and retrieval framework that was developed in the context of Task 5.1 under WP5. It is emphasized that this document presents the framework, i.e. the environment, mechanisms, tools and User Interfaces, while the actual search and retrieval algorithms are described in deliverable D5.1[1]. The search and retrieval framework is a software platform that provides the tools and user interfaces to: a) prepare the integral video sequences to be searchable, b) search inside a multimedia database for similar content, and c) extract XML files to be used in the hyperlinker environment.

This deliverable first presents the overall architecture of the search and retrieval framework and then separates the presentation of the framework into two fundamental sections: a) the database preparation and b) the actual search inside the database. The Graphical User Interfaces (GUI) for each functionality of the system are presented alongside the discussion of the components.

The core of the system is the low-level feature extraction process, during which multimedia descriptors are extracted for each multimedia object in the database. Then a manifold learning algorithm combines descriptors from different modalities to unify the search space and provide multimodal search capabilities.

The descriptor extraction procedure is followed by a feature matching step, whose aim is to establish a similarity measure between any two objects. This step is highly dependent on the low-level feature extraction module, and together they form the basis of the search and retrieval framework.

Finally, the search engine’s graphical user interface is presented extensively, with examples that demonstrate each step of the search and retrieval procedures.

The rest of the document is structured as follows: Section 2 describes the framework’s architecture, basic modules and menu items. Section 3 discusses the structure of the database and the actions for preparing the database for search queries. Section 4 presents the procedures for searching inside the database using two different approaches, while section 5 presents the implementation details and programming tools. Finally, section 6 draws the conclusions of the document.


2 FRAMEWORK ARCHITECTURE AND GRAPHICAL USER INTERFACE
The search and retrieval framework was developed with modular, easily expandable source code, so that it can be improved as new requirements arise. With this in mind, most components of the framework are thread-based classes that work independently of the other classes and interconnect and exchange data through the application’s root process. This approach enabled us to build and test several configurations before reaching the final decision. A rough schematic diagram of the framework’s architecture is presented in Figure 1.

[Figure 1: Search and Retrieval framework architecture. Blocks shown: Graphical User Interface; descriptor extractor threads; database thread; search process and results presentation thread; logger thread.]

- The descriptor extractor threads are triggered to run one of the supported descriptor extraction algorithms on the selected database. After the descriptor extraction process is finished the threads terminate. A new thread is started each time a new database should be processed.
- The database thread keeps track of access to the filesystem database and, most importantly, keeps the index of the database in memory so that it is ready for search queries. The database thread lives for the entire uptime of the application and is therefore considered a permanent thread.
- The search process thread is started each time a new search query is posed to the system. After processing the query, asking the database for similar content, and building the ranked list of results, the thread terminates.
- Finally, the logger thread keeps track of all the actions that occur from the start to the end of the application’s lifecycle and displays them in a window of the UI.

The Search and Retrieval Framework was developed as a multi-document interface (MDI) environment in order to support multiple functionalities in parallel. The startup UI is presented in Figure 2. At the top of the UI there is a toolbar menu with all the available functionalities, as well as some icon buttons for the most common features. From left to right, the toolbar has a search menu, a database preparation menu, an options menu, a “window” tiling and navigation menu, and a “help” menu that takes the user to the “about” window for information about the software.
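The thread layout described in this section can be illustrated with a small sketch. It is not the framework's actual code: the class names, the message tuples and the `run_extraction` helper are all hypothetical, and the descriptor computation is replaced by a placeholder. It only shows short-lived extractor threads reporting to a root process, which relays their messages to a long-lived logger thread.

```python
import queue
import threading


class LoggerThread(threading.Thread):
    """Long-lived thread that collects log messages (hypothetical class)."""

    def __init__(self, inbox):
        super().__init__(daemon=True)
        self.inbox = inbox
        self.lines = []

    def run(self):
        while True:
            message = self.inbox.get()
            if message is None:  # sentinel: the application is shutting down
                break
            self.lines.append(message)


class ExtractorThread(threading.Thread):
    """Short-lived worker: processes one database, reports, then terminates."""

    def __init__(self, descriptor, files, outbox):
        super().__init__()
        self.descriptor = descriptor
        self.files = files
        self.outbox = outbox

    def run(self):
        for name in self.files:
            # Placeholder for the real descriptor computation.
            self.outbox.put(("log", f"{self.descriptor}: processed {name}"))
        self.outbox.put(("done", self.descriptor))


def run_extraction(descriptors, files):
    """Root process: starts the workers and relays their messages."""
    root_inbox = queue.Queue()
    log_inbox = queue.Queue()
    logger = LoggerThread(log_inbox)
    logger.start()
    workers = [ExtractorThread(d, files, root_inbox) for d in descriptors]
    for worker in workers:
        worker.start()
    finished = []
    while len(finished) < len(workers):
        kind, payload = root_inbox.get()
        if kind == "log":
            log_inbox.put(payload)  # components never talk to each other directly
        else:
            finished.append(payload)
    for worker in workers:
        worker.join()
    log_inbox.put(None)
    logger.join()
    return sorted(finished), logger.lines
```

Routing every message through the root keeps the worker classes unaware of one another, which is one plausible way to obtain the independence between components that the section describes.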


Figure 2: 3DVIVANT S&R framework starting GUI

Figure 3 presents the expanded menus with all the available functionalities. Figure 3(a) presents the “Search” menu, from where the user can start a new search process, open and examine the results of a search process, and save the results in a file. Figure 3(b) presents the two functionalities for preparing the database. The first is the descriptor extraction process, which is discussed in detail in section 3.2, and the second is the creation or update of the index that enables fast search inside the database. In Figure 3(c) the options menu presents the two available options. The first opens the settings UI, where the configuration of the software can be changed, and the second opens the terminal window where log messages appear during each process. The “window” menu in Figure 3(d) enables positioning of, and navigation through, the open windows. Finally, the “help” menu provides an “about” window with information on the software and an acknowledgement of the 3DVIVANT project.


Figure 3: The menu of Search & Retrieval software

A configuration file is available for configuring the software. The configuration file keeps information that is fundamental to the operation of the application and should be changed with care. It can be accessed both from the filesystem and from the application interface. The vivantEngine.ini file is in the application folder. Figure 4 presents the settings window and the corresponding vivantEngine.ini file.

Figure 4: (a) settings window (b) vivantEngine.ini file containing the same information as the settings window


The configuration file contains the following information:
- Latest window position
- Window size
- Database path
- Java path
- Binary files path
- Selected default descriptor
- BatchMode flag

Of the above settings, the database path is the most critical one, since it points to the database to which all operations will be applied. The Java path parameter is used if CEDD descriptors are selected, since that implementation is provided as a jar file. The binary files path points to other helper applications for the processing of the data. Finally, the batch mode flag is used to process batch data and extract the XML files for the hyperlinker.
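As an illustration of how such settings could be read, the sketch below parses an INI text with Python's standard `configparser`. The section name `General` and every key name and value in `SAMPLE_INI` are assumptions made for the example; the actual layout of vivantEngine.ini is shown only in Figure 4 and is not reproduced here.

```python
import configparser

# Hypothetical file content: the section and key names are illustrative only.
SAMPLE_INI = """
[General]
windowPosition = 100,80
windowSize = 1280,800
databasePath = /data/vivant_db
javaPath = /usr/bin/java
binPath = /opt/vivant/bin
defaultDescriptor = CSIFT
batchMode = false
"""


def load_settings(text):
    """Parse the INI text into a plain dictionary of typed values."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the key names case-sensitive
    parser.read_string(text)
    section = parser["General"]
    x, y = (int(v) for v in section["windowPosition"].split(","))
    width, height = (int(v) for v in section["windowSize"].split(","))
    return {
        "window_position": (x, y),
        "window_size": (width, height),
        "database_path": section["databasePath"],
        "java_path": section["javaPath"],
        "bin_path": section["binPath"],
        "default_descriptor": section["defaultDescriptor"],
        "batch_mode": section.getboolean("batchMode"),
    }
```

Calling `load_settings(SAMPLE_INI)` returns a dictionary whose `database_path` entry would be the value every later operation depends on.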


3 DATABASE PREPARATION
A fundamental step in making a multimedia database searchable is to extract descriptor vectors that represent each multimedia object. This also holds for the 3DVIVANT S&R framework. This section describes the database structure, the descriptors extraction process and the creation of the final indexing structure that enables multimedia similarity search. This process is also known as the offline process, as depicted in Figure 5.

[Figure 5: Dynamic behaviour of the search and retrieval framework. Offline blocks: Content Repository, Pre-Processing, Low-level Feature Extraction, Descriptors. Online blocks: Browsing / Selecting existing object, Feature Retrieval, Feature Matching, Results, Submitting new object, Pre-Processing, Low-level Feature Extraction.]

The steps of the offline process are discussed in the following subsections. Section 3.1 refers to the database structure, section 3.2 describes the descriptors extraction process and the user’s options, section 3.3 discusses the codebook generation process and the extracted histograms (bag of words), section 3.4 covers the manifold learning step that builds the unified search space, and finally section 3.5 describes the creation of indexes for fast similarity search inside the database.

3.1 DATABASE STRUCTURE
In 3DVIVANT the database is actually a filesystem structure that contains all the necessary data and metadata.
The structure organizes the data into a) the raw multimedia files, b) the features files, c) the bag-of-words files and d) the index files. The filesystem structure is presented in Figure 6. In the “data” directory of the filesystem, the raw data are separated by modality so that the appropriate descriptor extraction algorithms can be applied to each modality. Moreover, the proposed structure makes it easier to reference the multimedia objects in the retrieval process.

The decision to make the S&R database a filesystem structure is supported by several arguments. First, a filesystem database is very easy to create, even for a novice user. Second, giving the user direct access to the actual content provides many advantages in a video editing scenario such as the “hyperlinker scenario”. Moreover, the S&R framework is also intended for novice/home users who need to search inside their multimedia content, which most of the time is stored inside a folder on their hard disk drive.

Figure 6: Database filesystem structure

3.2 DESCRIPTORS EXTRACTION
The selected descriptor extraction algorithms run through the “data” directory in the appropriate modality (viewpoint images, depth maps, curvature images, 2D snapshots etc.) and extract one features file for each multimedia file. For each supported descriptor there is a directory with the same name under the “descr” folder. Moreover, under each descriptor folder there are two subfolders (see Figure 6): one for the extracted features (“features” folder) and one for the bag-of-words vectors (“words” folder) that will be generated for each features file, using a generated codebook (see section 3.3).

The Search and Retrieval Framework supports the following descriptor extraction algorithms for the extraction of local features of the multimedia objects:
1. CSIFT: Colour SIFT
2. OSIFT: Opponent SIFT. A Colour SIFT variant where a different colour model is used.
3. GB: Geometric Blur
4. SSIM: Self Similarity
5. PI: Projection Images. Applicable only to depth data

The User Interface for extracting descriptors (see Figure 7) enables the user to select whether the system will generate descriptors for one descriptor algorithm or for all of them. Based on the user’s selection, the system triggers one or multiple threads that extract descriptors for the same raw data (under the “data” directory) in parallel. From this step until the final descriptors are complete, the system works without the need for user intervention.
The extraction of the local descriptors is a lengthy process, and thus a progress bar informs the user of the progress of the computations (Figure 8). Moreover, the console window prints log messages to give more information on the running processes.

Figure 7: Descriptors Extraction User Interface
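The directory layout described in sections 3.1 and 3.2 can be sketched as follows. The folder names “data”, “descr”, “features” and “words” come from the text; the modality folder names, the “index” folder name and the `.txt` extension are assumptions made for this example, since the actual layout is given only in Figure 6.

```python
from pathlib import Path

# Assumed folder names for the modalities; the text lists the modalities
# but not the exact directory names.
MODALITIES = ["viewpoints", "depth", "curvature", "snapshots2d"]
DESCRIPTORS = ["CSIFT", "OSIFT", "GB", "SSIM", "PI"]


def skeleton_dirs():
    """Relative directories of an empty database, as POSIX-style strings."""
    dirs = [f"data/{modality}" for modality in MODALITIES]
    for descriptor in DESCRIPTORS:
        dirs.append(f"descr/{descriptor}/features")
        dirs.append(f"descr/{descriptor}/words")
    dirs.append("index")  # assumed name for the index files folder
    return dirs


def create_database_skeleton(root):
    """Create the empty directory tree under root and return its path."""
    root = Path(root)
    for relative in skeleton_dirs():
        (root / relative).mkdir(parents=True, exist_ok=True)
    return root


def features_path(root, descriptor, media_file):
    """Location of the features file for one raw multimedia file."""
    stem = Path(media_file).stem
    return Path(root) / "descr" / descriptor / "features" / f"{stem}.txt"
```

Because a features file is addressed by descriptor name and media file name alone, a retrieved result can be mapped back to its raw file by name, which is the kind of easy referencing the section attributes to the filesystem design.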


Figure 8: Descriptor extraction is a lengthy process. The console prints logs on the progress of the process

As stated above, each descriptor extraction algorithm generates its own local feature vectors and stores them under the “descr” directory, in the folder with the corresponding name. To unify data storage and simplify the input/output processes, we chose a single format for all the local features files. The format includes the algorithm name, the dimensions of the feature vector, the number of local features in the file, and then one vector per line along with the positioning information for each local feature vector. The file format is presented in Figure 9.

Figure 9: file format for the local features file

When the descriptor extraction processes finish, the bag-of-words histogram should be computed. To calculate this final histogram a codebook is needed. Section 3.3 discusses this process.
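A reader and writer for a local features file of the kind described above might look like the sketch below. The exact layout is defined in Figure 9 and is not reproduced in the text, so the layout used here is an assumption: three header lines (algorithm name, vector dimension, feature count) followed by one line per feature holding an x and y position and then the vector values, separated by spaces.

```python
def write_features(name, dim, features):
    """Serialise features given as a list of ((x, y), vector) pairs."""
    lines = [name, str(dim), str(len(features))]
    for (x, y), vector in features:
        if len(vector) != dim:
            raise ValueError("vector length does not match the declared dimension")
        lines.append(" ".join(str(value) for value in (x, y, *vector)))
    return "\n".join(lines) + "\n"


def parse_features(text):
    """Parse the assumed layout back into (name, dim, features)."""
    lines = text.strip().splitlines()
    name = lines[0].strip()
    dim = int(lines[1])
    count = int(lines[2])
    rows = lines[3:]
    if len(rows) != count:
        raise ValueError("feature count in the header does not match the file body")
    features = []
    for row in rows:
        values = [float(value) for value in row.split()]
        if len(values) != dim + 2:
            raise ValueError("unexpected number of values on a feature line")
        features.append(((values[0], values[1]), values[2:]))
    return name, dim, features
```

Storing the dimension and count in the header lets one reader handle every descriptor algorithm, which matches the stated goal of a single format for all local features files.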


3.3 CODEBOOK GENERATION AND BAG-OF-WORDS

Since most of the well-performing descriptor extraction algorithms extract local features, many feature vectors correspond to a single multimedia object. Each multimedia object may have hundreds or thousands of local features, which makes them very difficult to use efficiently in the matching step. A common approach to this problem is the bag-of-words (BoW) method. With BoW, a codebook of the most dominant code words (quantized descriptor vectors) is generated. The codebook is generated by running a k-means algorithm, with a typical k of 1000 to 5000 code words, on a sample of the local features available in the database.

Each set of local features is then checked against the codebook to generate a histogram of code word occurrences for each features file (which corresponds to one multimedia object). The final BoW histogram is in effect a new descriptor that represents the multimedia object with only one descriptor vector. These final BoW descriptor vectors are stored in the database under the “words” directories.

3.4 MANIFOLD LEARNING AND UNIFIED SEARCH SPACE

The next step after the generation of the bag-of-words is the creation of the unified search space, as described in detail in Deliverable D5.1. For this, the manifold learning process starts and combines the different modalities (viewpoints, depth, curvature) to build a unified vector space. The final outcome of the process is a single descriptor vector per multimedia object (see Figure 10) that corresponds to the unified multimodal search space.

Figure 10: The manifold learning process produces a single multimodal descriptor per object

This final single descriptor vector per multimedia object is then indexed to prepare the database for similarity search queries.
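As an illustration of the bag-of-words step from section 3.3, the sketch below builds a tiny codebook with a plain Lloyd's k-means and then turns one object's local features into a histogram of code word occurrences. It is a toy under stated assumptions: the function names, the random initialisation, the fixed iteration count and the two-dimensional sample data are all hypothetical, and a k of 2 is used only to keep the example readable (the framework uses a k in the thousands).

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vector):
    """Index of the code word closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], vector))


def build_codebook(samples, k, iterations=20, seed=0):
    """Plain Lloyd's k-means over a sample of local feature vectors."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(samples, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in samples:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centre if a cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def bow_histogram(codebook, features):
    """Count how many local features fall on each code word."""
    histogram = [0] * len(codebook)
    for v in features:
        histogram[nearest(codebook, v)] += 1
    return histogram


# Two well-separated groups of sampled local features (made-up data).
samples = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
codebook = build_codebook(samples, k=2)

# One multimedia object with four local features becomes one k-dimensional vector.
histogram = bow_histogram(codebook, [[0, 0], [1, 1], [0, 1], [10, 10]])
```

Whatever the number of local features an object has, its histogram always has exactly k entries, which is what makes objects directly comparable in the later matching step.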


3.5 INDEX CREATION

It is very common in a search environment to use indexing structures to enable faster similarity search. For the 3DVIVANT search and retrieval framework, the kd-tree indexing structure (see Figure 11) was selected. The kd-tree builds a tree data structure (or a number of trees, a kd-tree forest) in which the most similar descriptor vectors are close together. With such a structure the exhaustive linear search is avoided, so the framework answers similarity search queries faster.

Figure 11: kd-tree indexing structure

When the database is small, or when no queries come from outside the database, the kd-tree indexer may be left unused. In that case the engine can pre-compute the search results for each record in the database and store these results for future queries.
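The idea behind the kd-tree index can be sketched in a few lines of Python. This is a single, exact kd-tree written for illustration, not the framework's indexer (which may use a forest of trees); the dictionary-based node layout, the function names and the sample vectors are assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, depth=0):
    """Recursively split (vector, label) pairs on the median of one axis per level."""
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest_neighbour(node, query, best=None):
    """Return (squared distance, label) of the stored vector closest to `query`."""
    if node is None:
        return best
    vector, label = node["point"]
    d = squared_distance(vector, query)
    if best is None or d < best[0]:
        best = (d, label)
    diff = query[node["axis"]] - vector[node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest_neighbour(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff ** 2 < best[0]:
        best = nearest_neighbour(far, query, best)
    return best


# Made-up two-dimensional descriptor vectors with record labels.
database = [([2, 3], "a"), ([5, 4], "b"), ([9, 6], "c"),
            ([4, 7], "d"), ([8, 1], "e"), ([7, 2], "f")]
tree = build_kdtree(database)
result = nearest_neighbour(tree, [9, 2])
```

The pruning test on the splitting plane is what avoids the exhaustive linear scan: whole subtrees are skipped when they cannot contain a vector closer than the best match found so far.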


4 SEARCH FOR SIMILAR CONTENT

In the online phase of the framework, the database should already be prepared and waiting for queries. The framework supports two kinds of querying methods. The first is the typical method in which a user poses a query multimedia object (a single query, see section 4.1) and the system replies with the results list. The second is a “batch search” method in which a user pre-computes the results for a set of queries, to use them in the future or outside the framework (section 4.2). This second approach is used in the hyperlinker demo process, which will be presented in detail in Deliverable D5.4.

4.1 SINGLE FRAME SEARCH

In the single frame search functionality, the user chooses to perform a new query from the menu or the toolbar icon. A new search thread is generated and a new window is presented in the UI (Figure 12). The window provides an “open” button that opens the platform’s native “file open” dialog. The user selects the integral frame file to use as the query and loads it into the window. The “load mask” button loads the corresponding segmentation mask from the database. Loading the segmentation mask opens the mask image file and either reads the bounding boxes and clickable areas for each object in the frame from the corresponding XML file or computes them on the fly. After this process the user can click the exact object (see Figure 13) in the loaded integral image and use it as a query. In this case the query is only the selected object and not the entire query image. If the user chooses not to load the mask and just clicks the search button, the whole integral image is used as the query image.

Figure 12: Query formulator window
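Computing the bounding boxes and clickable areas on the fly, as described in section 4.1, could look like the following Python sketch. It assumes the segmentation mask is a label image in which every pixel holds an integer object id and 0 marks the background; the actual mask encoding used by the framework is not specified here, and the function names are hypothetical.

```python
def bounding_boxes(mask):
    """Return {label: (x_min, y_min, x_max, y_max)} for every non-zero label."""
    boxes = {}
    for y, row in enumerate(mask):
        for x, label in enumerate(row):
            if label == 0:
                continue  # background pixel, not part of any object
            if label in boxes:
                x0, y0, x1, y1 = boxes[label]
                boxes[label] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
            else:
                boxes[label] = (x, y, x, y)
    return boxes


def object_at(mask, x, y):
    """Label under a click, or None for background or out-of-bounds clicks."""
    if 0 <= y < len(mask) and 0 <= x < len(mask[y]):
        return mask[y][x] or None
    return None


# A tiny made-up mask with two objects (labels 1 and 2).
mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 2],
    [0, 0, 0, 2, 2],
]
boxes = bounding_boxes(mask)
clicked = object_at(mask, 3, 2)
```

Testing the click against the mask itself rather than against the bounding box means a click in an empty corner of a box does not select the object, which matches the idea of clicking "the exact object".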


Figure 13: Segmentation mask is loaded and the user may click the object to use as query

After the query is submitted with either of the two methods, the search process is started. The descriptor extractor process is started and the descriptors are generated. Then the manifold learning algorithm is called to project the set of selected descriptors to the multimodal space. This final descriptor vector is used to query the index for similar content.

The final results of the process appear in a new window (see Figure 14) that displays a list of objects from the most to the least similar one, using both a central viewpoint image and the original modality object (e.g. integral image or 3D model etc.).


Figure 14: Results window with the retrieved similar objects.

If one of the retrieved results is a 3D model, the user can click and select the object for a detailed examination. In this case a new window pops up and the 3D model is loaded in a 3D rendering environment. The environment provides easy 3D model manipulation using only mouse gestures. Figure 15 depicts this 3D rendering window, with the 3D manipulator UI appearing as a red line on top of the 3D model.

Figure 15: Examining the 3D model from the relevant results


4.2 BATCH SEARCH TARGETING HYPERLINKER ENVIRONMENT

The batch search method targets the hyperlinker environment. The goal of this feature is to pre-process an integral video sequence (the raw data database) and generate the whole set of ranked lists for the available clickable objects in each frame. The output of the process is one XML file for each integral video frame. The XML file, along with the corresponding video frame, is used as input to the Hyperlinking environment tool, to enable the tool to present search results for the clickable objects. These results are presented to the user / editor to select possible hyperlinks from the object to another multimedia object or just a web URI.

In the batch search approach the user should feed the engine with all the necessary raw data (integral images, central viewpoints etc.), the segmentation masks, and the appropriate XML files provided with the segmentation masks for each frame of a video sequence.

The batch process is triggered by setting batchMode=yes in the configuration file and then clicking the “build index” button. When the sequence is processed in batch mode, the engine generates an all-to-all search matrix. The engine uses all the database records, one at a time, as a query object to find similar objects inside the database. For each one of them a ranked list is generated that references the similar objects of the database.

The difference in this mode is that the user does not need to select and perform a search for each integral frame by hand. The automated process is faster and less error-prone.

Since the batch mode is used in the hyperlinker demo workflow, the appropriate XML file for each ranked list is generated to be used as input to the hyperlinker environment. The XML files contain information on the bounding boxes of the segmented objects for each frame and the search results of each bounding box, if available. The schema serves as a general metadata exchange schema between the hyperlinker environment and peripheral tools, so the search results are optional for each bounding box. Figure 16 presents the XML schema of the generated files.
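The all-to-all search and the per-frame XML output can be sketched as follows. This is an illustrative brute-force version in Python, not the engine's implementation. The real schema is the one in Figure 16, which is not reproduced here, so every element and attribute name below (`frame`, `boundingBox`, `result`, `rank`, `ref`) is hypothetical, as are the record ids, the vectors and the function names.

```python
import xml.etree.ElementTree as ET


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def all_to_all(records):
    """Use every record as a query and rank all other records by similarity."""
    ranked = {}
    for query_id, query_vec in records.items():
        others = [(squared_distance(query_vec, vec), rid)
                  for rid, vec in records.items() if rid != query_id]
        ranked[query_id] = [rid for _, rid in sorted(others)]
    return ranked


def frame_to_xml(frame_name, objects, ranked):
    """Serialise one frame; element and attribute names are hypothetical."""
    frame = ET.Element("frame", name=frame_name)
    for object_id, (x0, y0, x1, y1) in objects.items():
        box = ET.SubElement(frame, "boundingBox", id=object_id,
                            x0=str(x0), y0=str(y0), x1=str(x1), y1=str(y1))
        # Search results are optional: a box without a ranked list stays empty.
        for rank, result_id in enumerate(ranked.get(object_id, []), start=1):
            ET.SubElement(box, "result", rank=str(rank), ref=result_id)
    return ET.tostring(frame, encoding="unicode")


# Made-up final descriptor vectors, one per clickable object.
records = {
    "f1_obj1": [0.0, 0.0],
    "f1_obj2": [1.0, 0.0],
    "f2_obj1": [5.0, 5.0],
}
ranked = all_to_all(records)
xml_text = frame_to_xml("frame_0001",
                        {"f1_obj1": (1, 0, 2, 1), "unindexed": (3, 1, 4, 2)},
                        ranked)
```

For a database of n records the brute-force loop performs on the order of n² comparisons, which is acceptable for an offline batch run over one video sequence and avoids the need for an index altogether.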


Figure 16: XML schema for the output data, targeting the hyperlinker environment


5 IMPLEMENTATION DETAILS AND TOOLS

The search and retrieval framework integrates various tools and technologies in order to fulfil the required functionalities.

The software is implemented in C++ using the Qt 4.7.3 framework [3], the OpenCV (Open Computer Vision) library [4] and the GLC library for 3D rendering [5]. The software was compiled with the MS Visual Studio 2008 compiler and tested on Windows XP and Windows 7 platforms.

Qt framework

Qt is a cross-platform application and UI framework that provides a broad set of tools for fast and efficient coding. The most important libraries are:

Qt Core, which contains all the necessary classes for a core application. It is the foundation of all Qt-based applications. The key functions are: file I/O, object handling, multithreading and concurrency, plugin and settings management, and signals and slots for inter-object communication.

Qt GUI, which contains the functionality needed to develop advanced UI applications. An important feature of the module is that it uses the native graphics API of each platform it supports, taking full advantage of the system resources.

Apart from these basic library modules, Qt also includes the following libraries:

1. 2D graphics Canvas
2. OpenGL, which was used with the GLC library
3. WebKit, which was used for the rendering of the hyperlinked frames from the integral sequences
4. Scripting
5. Multimedia
6. Networking
7. XML
8. Database
9. Unit Testing
10. Declarative UIs for touch-enabled and embedded devices

OpenCV library

OpenCV is a well-known, open source computer vision library with a large community of developers and users. OpenCV provides a set of image I/O, image processing, computer vision and machine intelligence algorithms that are well documented and thoroughly tested by the user community.

GLC library

The GLC library provides input/output functionality for most of the well-known 3D data formats, such as Collada V1.4, 3DXML ASCII V3 and V4, OBJ, 3DS, STL (ASCII and binary), OFF and COFF. Moreover, GLC provides easy mechanisms for OpenGL-based 3D rendering inside a Qt window environment. A very useful feature of the library is the built-in 3D view manipulator, which gives the user an intuitive UI.


6 CONCLUSIONS

This document presents the 3D VIVANT Search and Retrieval framework. It presents the actual software and the User Interfaces that incorporate the search and retrieval algorithms described in deliverable D5.1.

The deliverable begins with a report on the overall User Interfaces and the menu functionalities. The following sections then describe the offline and online processes of the multimedia similarity search concept and present in detail the parts of the UI for each function. The deliverable describes in depth the database structure that the S&R engine uses and the file formats for all the types of I/O files used. Finally, section 5 discusses the software tools and implementation details.

The 3DVIVANT search and retrieval framework provides the tools and functionality to efficiently create a multimodal multimedia database and perform multimodal queries. The multimodal approach enables the search for similar content using any available modality, e.g. searching for 3D models while using integral images as queries.


7 BIBLIOGRAPHY

[1] 3DVivant deliverable D5.1
[2] 3DVivant deliverable D5.4
[3] http://qt.nokia.com/
[4] http://opencv.willowgarage.com/wiki/
[5] http://www.glc-lib.net/
