11.07.2015 Views

acknowledgements for ansi/nist-itl 1-2011 - NIST Visual Image ...



ANSI/NIST-ITL 1-2011 - UPDATE 2013 DRAFT VERSION

[2013v>]

8.11 Record Type-11: Forensic and investigatory voice record

The Type-11 record is focused upon the analysis of voice signals for forensic and investigatory purposes. Voice analysis is often divided into two general areas:

· Speech Recognition
· Speaker Recognition

Both of these areas may play a part in forensic and investigatory analyses. Speech recognition involves the interpretation of vocalizations for their linguistic content. Speaker recognition involves determining who is performing the vocalizations.

The human voice - generally carrying both speech and non-speech sounds - propagates varying distances through air (principally) or another medium to reach acoustic transducers (usually microphones, when recorded) of varying amplitude and phase response. For purposes of the Type-11 record, a “speaker” is any person producing “vocalizations” from the throat or oral cavity, which may be voiced (activating the vocal cords) or unvoiced (such as aspirations, whispers, tongue clicks and other similar sounds). An automated interlocutor is considered to be a “speaker” for the purposes of this record type, since the intent is to directly mimic human speech, although such a speaker will not be the primary subject of an ANSI/NIST-ITL transaction.

When voice sounds carry speech, that speech usually occurs within a social context involving more than one speaker. Consequently, a recorded speech signal may contain the voices of multiple speakers. Thus, the Type-11 record accommodates recordings with multiple speakers; can designate whether any of the speakers are already identified; can convey the number of individual speakers; and can convey when the same person is speaking at multiple points during the recording. It can also convey the transcribed linguistic content of each speaker, if it can be deciphered.

An ANSI/NIST-ITL transaction is typically focused upon the identification of one individual. However, in order to effectively perform that identification (or verification of identity), it may be necessary to include information about other persons in the transaction. With voice recordings, it may be necessary to include in a transaction ‘known’ clips of certain persons who are possibly speaking in the recording under investigation, in order to separate out the speech of the known individuals and concentrate on the identification of the remaining speakers. Thus, there may be a difference between the ‘subject of the transaction’ and the ‘subject of the record.’

Multiple Type-11 records may be contained in a single transaction. The type of action desired by the submitter of the transaction (to be performed by the receiver of the transaction) is specified in a Type-1 record in the TOT field.

There are factors that had to be considered in developing this record type. Some of the most significant ones include:

· Voice signals generally contain both speech and non-speech elements, either of which might be useful in speaker recognition applications.
· Unlike other modalities, voice signals are collected in time - not spatial -
