
Addressing database mismatch in forensic speaker ... - ATVS


dencies or wire-tapped. This is exactly the same case as for a recovered wire-tap.

• Speech recorded using a microphone. Although the recording conditions are much more controlled than in the hidden microphone case for questioned recordings, variability in the room layout, environmental conditions and microphone types makes it extremely difficult to predict which kind of speech will be needed for a representative background database.

Although database mismatch is commonly seen as a problem in the community and among forensic experts, to the authors' knowledge, rigorous experimental studies of this important issue in forensic speaker recognition have mainly focused on telephonic databases. With the release of the NIST multi-microphone databases in SRE 2005 and 2006 [1], a richer dataset has become available for research on the topic. In fact, realistic databases for forensic speaker recognition are important for two main reasons: first, more data will be available for research on database mismatch; and second, forensic laboratories will be able to improve the robustness of their speaker recognition systems in casework conditions.

Driven by these needs, this paper presents the Ahumada III database, a publicly available real-casework corpus in Spanish, which has been acquired by the Acoustics and Image Processing Department of the Spanish Guardia Civil.
In its current release, Ahumada III Release 1 (Ah3R1), it includes speech data from real forensic cases recovered using one of the typical recording systems of the Guardia Civil, namely analog magnetic tapes containing GSM tappings. Moreover, the database is being extended with more material from this platform and also from SITEL, a Spanish nationwide digital tapping system. Ah3R1 includes variability in conditions such as noise, environmental characteristics, emotional state, country and region of origin and dialect of speakers, etc. Future releases will significantly increase the amount of data presenting strong variability from real cases.

This work also explores the database mismatch problem using the presented Ahumada III database and a corpus generated from multi-microphone data from NIST SRE 2006, namely the NIST4M database (NIST MultiMic MisMatch). This paper is organized as follows. First, the Ahumada III database is described in the context of Guardia Civil operative procedures for forensic speaker recognition. The experimental section then presents results which illustrate the impact of database mismatch on system performance in two different ways: database mismatch using NIST multi-microphone corpora, and robustness in real-casework conditions using Ahumada III.
Results show the importance of the database mismatch problem and encourage research in the field and data collection. Finally, conclusions are drawn.

2. Addressing database mismatch in real casework: the Ahumada III database

In the last years, the Spanish Guardia Civil has made a significant effort in the application of automatic speaker recognition systems to forensic voice evidence [2]. Following a Bayesian framework, and pursuing transparency in their reports, much of this effort has concentrated on assessing their system for the sake of testability, a demanding requirement in the new paradigm of forensic identification sciences [6]. Most voice evidence arriving at Guardia Civil laboratories has one of two possible origins. First, digitized analog magnetic recordings from GSM mobile calls are typical in cases between 1995 and 2004. Of the recordings of this type received in the last ten years, those authorized (case by case) by the corresponding judge after a trial have been added to a database registered in the Spanish Ministerio del Interior, known as Base de Datos de Registros Acústicos (BDRA) (see footnote 1). Second, a nationwide digital interception system (SITEL) has been used since 2005 by the two Spanish State Police forces.
This system records digital wiretaps and is directly connected to all mobile telephone operators.

With the purpose of robustness in real-case conditions and proper assessment of systems, the Guardia Civil has recorded several speech databases in the last decade. Back in 1998, the Ahumada I corpus was collected [7], containing 100 male speakers in telephone and microphone recordings. A subcorpus of Ahumada I spontaneous telephone speech was used in the NIST Speaker Recognition Evaluations in both 2000 and 2001. A complementary database known as Gaudi, with 100 female speakers, was recorded in 2001 (see footnote 2). From 2004 to 2006, the Baeza database was recorded, including GSM and microphone spontaneous conversational speech sessions recorded at the same time, from ¡¢£ males and ¡¢ females. Baeza has proved, so far, to be the most relevant reference population to be used in real cases, as it was recorded through the same Spanish GSM network as in actual cases. In 2008, all 100 male speakers from Ahumada I are still available and are to be recorded again through GSM and SITEL. In this way, Ahumada II (as it is to be known) will constitute a major contribution to the analysis of long-term stability and degradation of speaker features after ten years. As most new cases come through SITEL recordings and are, for the time being, to be evaluated with Baeza as reference population, also in 2008 almost 100 speakers from Baeza (different from the Ahumada speakers) are to be recorded again through SITEL.
This database will be known as Ahumada IV and will be used for system assessment.

2.1. The Ahumada III database

Ahumada III consists of authorized conversational speech from real cases, both from BDRA and SITEL. The database is expected to be very large in number of speakers and variety of conditions addressed, both in terms of number of available calls and amount of data. However, as conditions are not uniform, and speech recordings have to be authorized one by one, different releases of the database will be made available progressively.

Ahumada III Release 1 (Ah3R1) consists of ¢£ speakers from a number of real cases with GSM BDRA calls across Spain, with a variety of countries of origin of speakers, emotional and acoustic conditions, and dialects in the case of Spanish speech. The only low-variability dimension is gender, as all of them are male speakers. All ¢£ speakers in Ah3R1 have two minutes of speech available from a single phone call to be used as unquestioned (control) recording, with the purpose of model training or voice characterization. Additionally, ten speech segments for £ speakers and five segments for £ speakers are included for testing purposes, each one from a different call. Such fragments contain between ¡ and ¢¤ seconds of speech, with an average duration of £ seconds. An evaluation protocol, equivalent to that of NIST SRE [1], is available and ready to be used.
In this way, experienced NIST SRE par-

1 Registered as public scientific file number 1981420003 of the Spanish Guardia Civil, Orden Ministerial INT/3764/2004 de 11 de noviembre.
2 A subcorpus of 25 Ahumada male speakers and an equivalent subcorpus of 25 Gaudi female speakers are freely available at http://atvs.ii.uam.es/databases.jsp.
