Lecture 12_winter_2012.pdf

Digital Speech Processing 

Processing— 

Lecture 12 

Homomorphic 

p 

Speech Processing 

1

General Discrete Discrete-Time Time Model of 

S Speech h Production 

P d i 

pL[ n] = p[ n] ∗hV[ n] 

−− Voiced Speech 

hV[ n] = AV ⋅g[ n] ∗v[ n] ∗r[ 

n] 

p L [ n ] = u [ n ] ∗ h U [ n ] −− Unvoiced Speech 

h [ n] = A ⋅v[ n] ∗r[ 

n] 

U N 

2

Basic Speech Model 

• short segment of speech can be modeled as 

having been generated by exciting an LTI system 

either by a quasi-periodic impulse train, or a 

random noise signal 

• speech analysis => estimate parameters of the 

speech model, measure their variations (and 

perhaps even their statistical variabilites variabilites-for for 

quantization) with time 

• speech = excitation * system response 

=> want to deconvolve speech into excitation and 

system response => > do this using homomorphic 

filtering methods 

3

Superposition Superposition Principle 

Principle 

+ + 

L{ 

} 

x[ n] 

y[ n] 

= L{ 

x[ 

n]} 

x n] 

+ x [ n] 

L { x n] 

} + L { x [ n] 

} 

1[ 

2 

xn [ ] = ax[ n] + bx[ n] 

1 2 

1[ 

2 

yn [ ] = L { xn [ ]} = a L { x [ n ]} + b L { x [ n ]} 

1 2 

4

• 

Generalized Superposition for Convolution 

* 

{ } 

xn [ ] 

H 

y [ n ] = H { x [ n ] } 

x n] 

∗ x [ n] 

H { x n] 

} ∗H 

{ x [ n] 

} 

1[ 

2 

for LTI systems we have the result 

y [ n ] = x [ n ] ∗ h [ n ] = ∑ x [ k ] h [ n−k ] 

∞ 

k =−∞ 

* 

1[ 

2 

• "generalized" superposition => addition replaced by convolution 

xn [ ] = x x1 [ n ] ∗ x x2 [ n ] 

yn [ ] = H { x[ n]} = H { x1[ n]} ∗H 

{ x2[ 

n]} 

• homomorphic system ffor 

or convolution 

5

Homomorphic Homomorphic Filter 

Filter 

• homomorphic filter => homomorphic system ⎡ 

⎣ 

⎤ 

⎦ that H 

homomorphic filter homomorphic system ⎡ 

⎣ 

⎤ 

⎦ that 

passes the desired signal unaltered, while removing the 

undesired signal 

H 

g 

x( n) = x [ n] ∗x 

[ n] - with x [ n] 

the "undesired" signal 

• 

1 2 1 

xn = x 1 n ∗ 2 

H { [ ]} H { [ ]} 

H 

H 

H { x [ n]} 

{ x [ n]} → δ ( n) - removal of x [ n] 

1 1 

{ x [ n]} → x [ n] 

2 2 

H { xn [ ]} = δ [ n] ∗ x2[ n] = x2[ n] 

for linear systems this is analogous to additive noise removal 

6

Canonic Form for Homomorphic 

Deconvolution 

* + + + + 

D { } L { } 

−1 

D { } 

∗ 

x[ n] xn ˆ[ ] 

yn ˆ[ ] 

yn [ ] 

x n x n 

ˆ ˆ 

ˆ ˆ 

1 [ ] ∗ 2 [ ] x 1 [ n ] + x 2 [ n ] y 1 [ n ] + y 2 [ n ] 

1 ∗ 2 

• 

any homomorphic system can be represented as a cascade 

of three systems, systems e.g., e g for convolution 

∗ 

* 

y [ n ] y [ n ] 

1. system takes inputs combined by convolution and transforms 

them into additive outputs p 

2. system is a conventional linear system 

3. inverse of first system--takes additive inputs and transforms 

th them iinto t convolutional l ti l outputs 

t t 

7

Canonic Form for Homomorphic Convolution 

* 

+ + + + 

D { } { } 

−1{ 

} 

x[ n] 

∗ 

L 

D 

[ ] xn ˆ[ [ ] y ˆ[ [ n ] ∗ y [ n ] 

x n x n 

ˆ ˆ 

ˆ ˆ 

1[ ] ∗ 2[ 

] x1[ n] + x2[ n] 

y1[ n] + y2[ n] 

1 ∗ 2 

xn [ ] = x1[ n] ∗x2[ 

n] 

- convolutional relation 

xn ˆ[ ] = D { [ ]} ˆ ˆ 

∗ xn = x1[ n] + x2[ n] 

- additive relation 

yn ˆ[ ] = L{ 

xˆ [ n] + xˆ [ n]} = yˆ [ n] + yˆ [ n] 

- conventional linear system 

1 2 1 2 

−1 

∗ ˆ1 ˆ2 

1 2 

* 

y[ n] y [ n] 

y[ n] = D { y [ n] + y [ n]} = y [ n] ∗y 

[ n] - inverse of convolutional relation 

=> design converted back to linear system, L 

D∗ 

⎡⎣ ⎤⎦ 

- fixed (called the characteristic system for homomorphic deconvolution) 

−1 

D∗ ⎡⎣ ⎤⎦ 

- fixed (characteristic system for inverse homomorphic deconvolution) 

8

Properties of Characteristic 

Systems 

xn ˆ[ ] = D { xn [ ]} = D { x[ n] ∗x[ 

n]} 

∗ ∗ 

1 2 

= D ∗ { x 1 [ n ]} + D ∗ { x 2 [ n ]} 

= xˆ [ n] + xˆ [ n] 

1 2 

D { yn ˆ[ ]} = D { yˆ [ n] + yˆ [ n]} 

−1 −1 

∗ ∗ 

1 2 

= D { yˆ [ n]} ∗D 

{ yˆ [ n]} 

−1 −1 

∗ 1 ∗ 2 

= y [ n ] ∗ y [ n ] = y 

[ n ] 

1 2 

9

Discrete Discrete-Time Time Fourier 

Transform Representations 

10

Canonic Form for Deconvolution Using DTFTs 

• need dtto find fida system t that th tconverts t convolution l ti tto addition dditi 

xn [ ] = x[ n] ∗x[ 

n] 

1 2 

jω = 1 

jω ⋅ 2 

jω 

Xe ( 

• since 

) X( e ) X( e ) 

D ˆ ˆ ˆ 

∗{ 

xn [ ]} = x1[ n] + x2[ n] = xn [ ] 

⎡ jω ( ) ⎤ ˆ jω ˆ jω ˆ jω 

D∗ 

Xe = X1( e ) + X2( e ) = Xe ( ) 

⎣ ⎦ 

=> use log function wh ich converts products to sums 

ˆ jω ( ) log ⎡ jω ( ) ⎤ log ⎡ jω jω 

Xe = Xe = X1( e ) ⋅X2( 

e ) ⎤ 

⎣ ⎦ ⎣ ⎦ 

ω ω 

log ⎡ j ˆ ω ˆ ω 

= 1( ) ⎤+ log ⎡ j 

2( ) ⎤ j j 

X e X e = X1( e ) + X2( e ) 

⎣ ⎦ ⎣ ⎦ 

ˆ jω ( ) ⎡ ˆ jω ˆ jω ˆ ˆ 

1( ) 2( ) ⎤ jω jω 

Ye = L X e + X e = Y1( e ) + Y2( e ) 

⎣ ⎦ 

jω ( ) exp ˆ jω = ˆ j 

Ye ⎡ ω 

Y1( e ) + Y2( e ) ⎤ jω jω 

= Y1( e ) ⋅Y2( 

e 

) 

⎣ ⎦

Characteristic System for 

Deconvolution Using DTFTs 

∞ 

jω − jωn ∑ 

Xe ( ) = xne [ ] 

n=−∞ 

ˆ jω jω jω jω 

X ( e ) = log g ⎡ 

⎣ ⎣X ( e ) ⎦ 

⎤ = log g X( ( e ) + j arg g ⎣ 

⎡Xe ( ) 

⎦ 

⎤ 

π 

1 

ˆ[ ] ˆ jω jωn xn = ( ) ω 

2 ∫ Xe e d 

π −ππ 

12

Inverse Characteristic System for Deconvolution Using 

DTFTs 

∞ 

jω − jωn Ye ˆ( ) = yne ˆ[ 

] 

∑ 

n=−∞ 

j jωω ( ) ˆ j jωω 

Y Y( e ) = exp ⎡ ⎡Y( ( ) 

⎤ 

⎣ 

Y e 

⎦ 

π 

1 

ω ω 

[ ] = ∫ 

j j n 

yn [ ] Ye ( ) e d dωω 

2 

∫ π −π 

13

Issues with Logarithms 

Logarithms 

• it is essential that the logarithm obey the equation 

⎡ j j ⎤ ⎡ j ⎤ ⎡ j 

log ⎡ jωjω ⎤ 

1( ) 2( ) ⎤ log ⎡ jω 1( ) ⎤ log ⎡ jω 

X e ⋅ X e = X e + X2( e ) ⎤ 

⎣ ⎦ ⎣ ⎦ ⎣ ⎦ 

jω jω 

• this is trivial if X ( e ) and X ( e ) are real -- however usually 

1 2 

1( jω ) and 2( 

jω 

) 

X e X e 

• 

are complex 

on the unit circle the complex log can be written in the form: 

jω jω 

⎡ jω 

jarg ⎡ jω 

j X ( e ) ⎤ 

⎣ ⎦ 

Xe ( ) = | Xe ( )| e 

log ⎡ jω ( ) ⎤ ˆ jω ω ω 

( ) log ⎡ j 

= = | ( ) | ⎤+ arg ⎡ j 

Xe Xe Xe j Xe ( ) ⎤ 

⎣ ⎦ ⎣ ⎦ ⎣ ⎦ 

• no problems with log magnitude term; uniqueness 

problems 

arise in defining the imaginary part of the log; can show that 

the imaginary part (the phase angle of the z-transform) z transform) needs 

to be a continuous odd function of ω 

14

Problems with arg Function 

{ ( 

j jω ) } ≠ { 1( 

j jω 

) } 

jω 

+ ARG{ X2( e ) } 

ARG X e ARG X e 

jω jω 

{ Xe } = { X1e } 

jω 

+ arg { X X2 ( e 

) } 

arg ( ) arg ( ) 

15

Complex Cepstrum Properties 

i Given a complex logarithm that satisfies the phase continuity 

condition, , we have: 

π 

1 

jω jω jωn xn ˆ[ ] = (log | X( e ) | + jarg{ X( e )}) e dω 

2π 

∫ 

−π 

j jω j jω 

i If xn [ ] real, real then log|X ( e ) | is an even function of ωω 

and arg{ X ( e )} 

is an odd function of ω. 

This means that the real and imaginary parts of 

the complex log have the appropriate symmetry for xn ˆ[ ] to be a real 

sequence, and d xn ˆ[ ˆ[ ] can bbe represented t d as: 

xn ˆ[ ] = cn [ ] + dn [ ] 

jω 

where cn [ ] is the inverse DTFT of log |X ( e )| and the even part of xn ˆ[ 

], and 

jω 

dn [ ] is the inverse DTFT of arg{ Xe ( )} and the odd part of xn ˆ[ 

]: 

xn ˆ[ ] + xˆ[ −n] xn ˆ[ ] −xˆ[ −n] 

cn [ ] = ; dn [ ] = 

2 2 

16

Complex Complex and and Real Real Cepstrum 

define the inverse Fourier transform of ˆ jω 

• 

Xe ( ) as 

ππ 

1 

ˆ[ ˆ ω ω 

] = ∫ 

j j n 

xn Xe ( ) e dω 

2 

π 

−π 

• where xn ˆ[ 

[ ] called the "complex complex cepstrum" cepstrum since a complex 

logarithm is involved in the computation 

• can also define a "real 

cepstrum" using just the real part of 

the logarithm, giving 

π 

1 

[ ] = ∫ Re ⎡ ˆ j 

( ) ⎤ j n 

cn ∫ Xe e d 

2 2π 

⎣ ⎦ 

= 

1 

2 2π 

−π 

π 

∫ 

π 

−π 

ω ω ω 

jω jωn log | Xe ( ) | e dω 

• can show that cn [ ] is the even part of xn 

ˆ[ 

] 

17

Terminology gy 

• Spectrum – Fourier transform of signal autocorrelation 

• C Cepstrum t – iinverse FFourier i ttransform f of f log l spectrum t 

• Analysis – determining the spectrum of a signal 

• Alanysis – determining the cepstr cepstrumm of a signal 

• Filtering – linear operation on time signal 

• Liftering – linear operation on cepstrum 

• Frequency – independent variable of spectrum 

• Quefrency – independent variable of cepstrum 

• Harmonic – integer multiple of fundamental frequency 

• Rahmonic – integer multiple of fundamental frequency 

Rahmonic integer multiple of fundamental frequency 

18

z-Transform z Transform Representation 

Representation 

i The z − transform of the signal: 

xn [ ] = x x1[ [ n n] ]* x x2[ [ n n] 

] 

is of the form: 

X( z) = X1( z) ⋅X2( 

z) 

i With an appropriate i t ddefinition fi iti of f th the complex l llog, we get: t 

Xˆ ( z) = log{ X( z)} = log{ X1( z) ⋅X2( 

z)} 

= log{ X ( z)} + log{ X ( z)} 

= Xˆ ( z) + Xˆ ( z) 

1 

1 2 

2 

19

Characteristic System for Deconvolution 

∞ 

− 

( ) = ∑ 

n 

Xz xnz [ ] = Xz ( ) e 

n=−∞ 

{ } 

jarg X( z) 

Xˆ ( z) = log X( z) = log X( z) + jarg X( z) 

[ ] [ ] 

1 

ˆ[ ] ∫ 

ˆ n 

xn [ ] = ∫ X ( z ) zdz 

2π 

j 

20

Inverse Characteristic System for 

Deconvolution 

∞ 

Yz ˆ( ) = ∑ ynz ˆ[ 

] 

n=−∞ 

−n 

Yz ( ) = exp ⎡ ˆ( 

) ⎤ 

⎣ 

Yz 

⎦ 

= log Yz ( ) + jarg Yz ( ) 

1 

[ ] ∫ 

n 

[ ] = ∫ 

n 

yn Y ( zzdz ) d 

2π 

j 

[ ] 

21

z-Transform 

z Transform Cepstrum Alanysis 

• 

i 

consider digital systems with rational z-transforms of the general type 

Xz ( ) = 

M 

M M0 

−1 −1 −1 

i 

∏( 1− k ) ∏( 

1− 

) 

k 

A a z b z 

k= 1 k= 

1 

N 

i 

− 1 

∏ ( 1− 

cz k ) 

k = 1 

we can express the above equation as: 

Xz ( ) = 

−M0 M M0 ∏ − 

M Mi 

−1 k ∏ 1− k 

M M0 

−1 

∏ 1− 

k 

k= 1 k = 1 

k= 

1 

Ni 

− 1 

( ) ( ) ( ) 

z A b a z b z 

∏ ( 1 1− 

cz k ) 

k = 1 

• with all coefficients ak, bk, ck < 1 => all ck 

poles and 

a ak zeros are inside the unit circle; all b bk 

zeros 

are outside the unit circle; 

22

z-Transform 


i 

express X( z) 

as product of minimum-phase and 

maximum maximum-phase phase signals signals, ii.e., e 

i 

i 

i 

X z X z z X z 

where 

−M 

0 

( ) = ( ) ⋅ ( ) 

X ( z ) = 

min 

min max 

Mi 

( −1 

∏ 1− 

) k 

A a z 

k = 1 

N 

i 

( 1 

1 cz ) k 

− 

∏ − 

k = 1 

all ll poles l and d zeros iinside id unit it circle i l 

max 

M 

i 

∏ 

( 1 ) 

X ( z) b 1 

− 

= − − bz 

k 

Mi 

∏ 

k = 1 

k = 1 

all zeros outside unit circle 

( ) 

k 

23

z-Transform 


i 

i 

i 

can express xn [ ] as the convolution: 

xn [ ] = xmin[ n] ∗xmax[ n−M0] minimum-phase p component p is causal 

xmin [ n] = 0, n< 

0 

maximum-phase maximum phase component is anti-causal anti causal 

x [ n] = 0, n> 

0 

max 

M0 

factor z is the shift in time ori 

− 0 i factor z is the shift in time ori gin by M 0 

samples required so that the overall sequence, 

xn 

[ ] be causal 

24

z-Transform z Transform Cepstrum Alanysis 

• the complex logarithm of Xz ( ) is 

0 

ˆ 

−1 

− 0 

( ) = log ⎡⎣ ( ) ⎤⎦ 

= log | | + ∑log | | + log[ ] + 

M 

M 

Xz Xz A bk z 

k = 1 

M M 

N 

i 0 

i 

∑ ( −1) ( ) ( −1 

log 1− az + ∑log 1− −∑log 1− 

) 

k bz k cz k 

k = 1 k = 1 k = 1 

• evaluating Xz ˆ ( ) on the unit circle we can ignore the term 

⎡ jωM ⎤ 

e 

⎣ ⎦ 

part and is a linear phase shift) 

0 

related to log (as this contributes only to the imaginary 

25

z-Transform 


• 

we can then evaluate the remaining terms, use power series 

expansion for logarithmic terms (and take the inverse 

transform to give the complex cepstrum) giving: 

ππ 

1 

ˆ 

ˆ ω ω 

( ) = ∫ 

j j n 

xn Xe ( ) e dω 

2 

π 

−π 

= log | A | + 

M 

0 

∑ 

k = 1 

Ni n 

c k 

Mi 

n 

a k 

∑ ∑ 

k= 1 k= 

1 

M0 −n 

bk 

−1 

k 

log | b | n = 

= − n > 

n n 

∑ k 

= n < 0 

n 

k = 1 

0 

0 

∞ n 

Z 

log(1 − Z) =− ∑ , | Z | < 1 

n 

n= 

1 

26

Cepstrum Properties 

1. complex cepstrum is non-zero and of infinite extent for 

both positive and negative n, even though x[ n] 

may be 

causal, or even of finite duration ( X ( z ) has only zeros). 

2. complex cepstrum is a decaying sequence that is bounded by: 

| n| 

α 

| xn ˆ[ 

]| < β , for | n| 

→∞ 

| n | 

3. zero-quefrency value of complex cepstrum (and the cepstrum) 

depends on the gain constant and the zeros outside the unit circle. 

Setting x ˆ[0] [0] = 0 (and therefore c [0] = 0 0) 

) is equivalent to normalizing 

the log magnitude spectrum to a gain constant of: 

M 

0 

∏ 

k = 1 

∏ 

1 

( − b k ) = 1 

A b − 

4. If X( z) has no zeros outside the unit circle (all bk= 

0), then: 

xn ˆ[ ] = 0, n< 

0 (minimum-phase 

signals) 

5. If X( z) has no poles or zeros inside the unit circle (all ak, ck 

= 0), then: 

xn ˆ[ ] = 0, n> 

0 (maximum-phase signals) 

27

i 

i 


The main z-transform formula for cepstrum alanysis is based on 

the h power series i expansion: i 

( 1) 

∞ − 

∑ 

n+ 

1 

n 

log( g( 1+ x ) = x x < 1 

n 

n= 

1 

Example 1--Apply 

this formula to the exponential sequence 

x1( n) 

n 

1 

= aun ( ) ⇔ X( z) 

= 

1− 

az 

1 −1 

( −1 

) 

ˆ 

1 ( ) = l log[ [ 1 ( )] = −l log( ( 1 1− 

∞ 

− 1 

) = − ∑ 

n= 

1 

n+ 

1 

( − 

n 

n 

) 

− n 

n 

a 

ˆ 1 ( ) = 

n 

( − 1 ) ⇔ ˆ 

1 ( ) = − log( 1 1− 

∞ n 

−1⎛a ⎞ 

) = ∑ ⎜ ⎟ 

n= 

1 n 

−n 

X z X z az a z 

x n u n X z az z 

⎝ ⎠ 

28


• Example 2 2--consider 

consider the case of a digital system with a 

single zero outside the unit circle ( b < 1) 

x x2 ( n ) = δδ ( n ) + b bδδ ( n + 1 ) 

X2( z) = 1+ bz (zero at z = −1/ 

b) 

X Xˆ ( z ) = l log X ( z ) = llog 

1 1+ 

b z 

xˆ 

2 

2 2 

[ ] ( ) 

∞ + 1 

( −1) 

= ∑ ( ) 

n 

∑ 

n n 

( b) z 

n= 

1 n 

n+ 1 n 

( −1) 

b 

( n ) = u ( −n−1 ) 

n 

29

• 

z-Transform Transform Cepstrum Alanysis for 2 Pulses 

Example 3--an 

input sequence of two pulses of the form 

x 3 ( n ) = δδ ( n ) + αδ αδ ( n n− N Np 

) ( 0 < αα 

< 

1 ) 

X ( z) = 1+ 

αz 

3 

−N 

( −N 

1 αα 

) 

X Xˆ 3 ( z ) = log ⎡⎣⎡⎣ X X3 ( z ) ⎤⎦⎤⎦ 

= log + z 

∞ n+ 

1 

( −1) 

= ∑ n 

p 

α 

n 

z 

−nN 

n= 

1 

∞ 

k 

ˆ + 1 α 

3( 

) = ∑ 

k 

( −1) k 

k = 1 

δ ( − p ) 

x n n kN 

p 

• the cepstrum is an impulse train with impulses spaced at p samples 

N 

N p 

2N 2Np p 

3N 3Np 30

• 

• 

Cepstrum Cepstrum for Train of Impulses 

an important special case is a train of impulses 

of the form: 

∑ M 

xn ( ) = αδ( 

n−rN) r = 0 

M 

Xz ( ) = ∑αr 

z 

r = 0 

r p 

−rN 

p 

−N −1 

clearly Xz ( ) is a polynomial in z p rather than z ; 

thus Xz ( ) can be expressed 

as a product of factors 

−N 

N 

of the form ( 1−az p) and ( 1−bz 

p), 

giving a complex 

cepstrum, xn ˆ ( ), that is non non-zero zero only at integer multiples of N 

p 

31

z-Transform Transform Cepstrum Alanysis for 

C Convolution l ti of f 2 2 Sequences 

S 

i Example 4 4--consider 

consider the convolution of sequences 1and3 1 and 3, ie i.e., 

n 

x4( n) = x1( n) ∗ x3( n) = ⎡ 

⎣ 

aun ( ) ⎤ 

⎦ 

∗ ⎡ 

⎣δ( n) + αδ( 

n−N) ⎤ p ⎦ 

n 

n n− N Np 

= aun ( ) + α a un ( − N ) 

i The complex cepstrum is therefore the sum of the complex 

cepstra 

of the two sequences (since convolution inthetimedomainis 

in the time domain is 

converted to addition in the cepstral domain) 

xˆ ( n) = xˆ ( n) + xˆ ( n) 

4 1 3 

n ∞ k+ 1 k 

a ( −1) 

α 

un 1 ∑ δ n kNp 

n k = 1 k 

= ( − ) + ( − ) 

p 

32

i 

z-Transform Transform Cepstrum Alanysis for 

C Convolution l ti of f 3 3 Sequences 

S 

Example 5--consider 

the convolution of sequences 1, 2 and 3, i.e., 

x ( n) = x ( n) ∗x ( n) ∗x 

( n) 

5 1 2 3 

[ 1 ] 

n 

= ⎡ 

⎣ 

⎡ 

⎣ 

aun ( ) ⎤ 

⎦ 

⎤ 

⎦ 

∗ δδ ( n ) + b bδδ ( n + ) ∗ ⎡ 

⎣ 

⎡ 

⎣δδ ( n ) + αδ δ ( n n− N ) ⎤ 

⎦ 

⎤ p ⎦ 

n n−Np = aun ( ) + αaun ( − N 

n 

n− Np+ 

1 

) + baun ( + 1) + αbaun 

( − N + 1) 

p p 

i The complex cepstrum is therefore the sum of the complex cepstra 

of the three sequences 

xˆ ( n) = xˆ ( n) + xˆ ( n) + xˆ ( n) 

5 1 2 3 

n ∞ k+ 1 k n+ 1 n 

a ( −1) α ( −1) 

b 

= un ( − 1) + ∑ δ ( n− kNp) + u( −n−1) n k n 

k = 1 

33

Example: a=.9, b=.8, a=.7, Np=15 

( 1) 

+ 

− 

n 

b 

n 1 n 

a n 

a 

n 

( 1+ 

bz) 

p 

Xz ( ) = + z 

( 1 1− 

az ) 

( 1) 

k 

∞ − 

∑ 

k = 1 

k+ 1 k 

( 1 α ) 

1 

− 

− 

N 

( ) 

az 

α 

δ [ n−kN ] 

p 

34

Homomorphic Analysis of 

Speech Model 

35

Homomorphic Analysis of Speech Model 

• 

i 

• 

i 

the transfer function for voiced speech is of the form 

H ( z) = A ⋅G( 

z) V( z) R( z) 

V V 

with ith effective ff ti impulse i l response ffor voiced i d speech h 

h [ n] = A ⋅g[ n] ∗v[ n] ∗r[ 

n] 

V V 

similarly for unvoiced speech we have 

H ( z) = A ⋅V( 

z) R( z) 

U 

U 

with effective impulse response for unvoiced speech 

h [ n] = A ⋅v[ n] ∗r[ 

n] 

U U 

36

Complex Complex Cepstrum for Speech 

• 

the models for the speech components are as follows: 

1. vocal tract: Vz ( ) = 

Mi 

−M ∏ 1− M0 

− 1 

∏ 1− 

= 1 = 1 

−1 

∏( 1− 

) 

i 

M 

k k 

k k 

N 

∏ cz k 

k = 1 

Az ( a z ) ( b z) 

--for voiced speech, only poles => ak = bk = 0, 

all k 

--unvoiced speech and nasals, 

need pole-zero model but all poles are 

inside the unit circle => ck 

< 1 

--all speech has complex poles and zeros that occur in complex conjugate 

pairs 

2. radiation model: Rz ( ) ≈1−z(high frequency emphasis) 

3. glottal pulse model: finite duration pulse with transform 

L 

i 

0 

− 1 

∏ ∏ 

−1 

G ( z ) = B ( 1−αz ) ( 1−βz 

) 

k k 

k= 1 k= 

1 

with zeros both inside and outside the unit circle 

L 

37

Complex Cepstrum for Voiced 

S Speech h 

• combination of vocal tract, glottal pulse and 

radiation will be non-minimum phase p => 

complex cepstrum exists for all values of n 

• the complex cepstrum will decay rapidly for 

large n (due to polynomial terms in expansion 

of complex cepstrum) 

• effect of the voiced source is a periodic pulse 

train for multiples of the pitch period 

38

• 

• 

Simplified Speech Speech Model 

Model 

short-time speech model 

[ ∗ ∗ ∗ ] 

xn [ ] = wn [ ] ⋅ pn [ ] ∗ gn [ ] ∗ vn [ ] ∗ rn [ ] 

≈ p [ n] ∗h 

[ n] 

w v 

short-time complex cepstrum 

xn ˆ [ ] = p pˆ [ n ] + gn ˆ [ ] + vn ˆ [ ] + rn ˆ [ ] 

w 

39

Analysis y of Model for Voiced Speech p 

i Assume sustained /AE/ vowel with fundamental frequency of 125 Hz 

i Use glottal pulse model of the form: 

⎧0.5 

[1− cos( π ( n+ 1) / N1)] ⎪ 

gn [ ] = ⎨cos(0.5 

π ( n+ 1 −N1)/ N2) ⎪ 

⎩ 0 

0 ≤n≤ N1−1 

N1 ≤n≤ N1+ N2 

−2 

otherwise 

N = 25, N = 10 

⇒ 34 sample impulse response, with transform 

i 

1 2 

33 

−33 ∏ 

33 

−1 

k ∏ k 

k= 1 k= 

1 

Gz ( ) = z ( −b) (1 −bz) ⇒ all roots outside unit circle ⇒ maximum phase 

Vocal tract system specified by 5 formants (frequencies and bandwidths) 

V( z) 

= 

1 

5 

−2πσ kT −1 −4πσ 

kT 

−2 

∏ (1 (1− 2 e cos(2 ππ 

FT k ) z + e z ) 

k = 1 

{ Fk, 

σ k} 

= [(660,60),(1720,100),(2410,120),(3500,175),(4500,250)] 

i Radiation load is simple first difference 

−1 

R( z) = 1 − γz, γ = 

0.96 

40

Time Domain Analysis 

Analysis 

41

Pole Pole-Zero Zero Analysis of Model 

Components 

42

Spectral Analysis Analysis of of Model 

Model 

43

Speech Speech Model Output 

44

Complex Cepstrum of Model 

i 

The voiced speech signal is modeled as: 

x [ n ] = A AV ⋅gn [ ] ∗vn [ ] ∗rn [ ] ∗ pn [ ] 

i with complex cepstrum: 

sn ˆ [ ] = log| A | δ [ n ] + gn ˆ [ ] + vn ˆ [ ] + rn ˆ [ ] + pn ˆ 

V δ n + gn + vn + rn + pn [ ] 

i glottal pulse is maximum phase ⇒ gn ˆ[ ] = 0, n> 

0 

i vocal tract 

and radiation systems y are minimum phase p 

⇒ vn ˆ[ ] = 0, n< 0, rn ˆ[ 

] = 0, n< 

0 

∞ k 

ˆ β 

Pz ( ) = − log(1 − ββ 

z ) = ∑ z 

k 

−N −kN 

= =∑ 

k = 1 

∞ k 

β 

p pˆ[ 

[ n ] = ∑ 

δ [ n−kN p ] 

k 

k = 1 

p p 

45

Cepstral Analysis Analysis of of Model 

Model 

46

Resulting Complex and Real 

Cepstra 

47

Frequency Frequency Domain Representations 

48

Frequency 

Frequency-Domain Domain Representation of 

C Complex l Cepstrum C t 

49

The Complex Cepstrum-DFT Cepstrum DFT Implementation 

i 

• 

2π∞2π j k − j kn 

N N 

∑ 

Xk [ ] = Xe ( ) = xne [ ] k= 01 , ,..., N−1, 

n=−∞ 

jω 

X [ k ] is the N point p DFT corresponding p g to X( ( e ) 

p 

{ } { } 

Xk ˆ = Xe ˆ = Xk = Xk + j Xk 

j2πk/ N 

[ ] ( ) log [ ] log [ ] arg [ ] 

N−12π∞ 

1 

j kn 

ˆ ∑ ˆ N 

xn [ ] = ∑Xke [ ] = ∑ xn ˆ ˆ[ 

+ rN] n= 01 , ,..., N−1 

N k= 0 

r=−∞ 

x ˆ [ n ] is an aliased version of xˆ [ n ] 

⇒use as large a value of N 

as possible to minimize aliasing 

50

Inverse System- System y DFT Implementation 

p 

51

The Cepstrum Cepstrum-DFT DFT Implementation 

π 

1 

jω jωn cn [ ] = ∫ log| Xe ( )| e dωω −∞< ∞< n

Cepstral Computation Aliasing 

N=256, N Np=75, =75, 

α=0 α=0.8 =0 =0.8 8 

Circle dots are 

cepstrum 

values in 

correct 

locations; all 

other dots are 

results of 

aliasing due to 

finite range 

computations 

53

Summary 

1. Homomorphic System for Convolution: 

x[n] X(z) X(z) ^ X(z) Y(z) ^ Y(z) Y(z) 

y[n] 

z( ) log( ) L( ) exp( ) z-1 ( ) 

X(z) 

^ 

z 1 ( ) -1 ( ) 

2. Practical Case: 

z( ) → DFT 

−1 

z () → IDFT 

x[n] ^ [ ] Linear 

System 

y[n] ^ y[ y[n] ] 

jω jω 

Xe ( ) = Xe ( ) e 

jω 

{ } 

jarg X( e ) 

z( ( ) 

Y(z) 

^ 

{ } 

jω jω jω 

log ⎡ 

⎣ 

Xe ( ) ⎤ 

⎦ 

= log Xe ( ) + jarg Xe ( ) 

54

Summary 

3. Complex Cepstrum: 

π 

1 

ˆ[ 

] = ∫ 

ˆ j j n 

xn Xe ( ) e d 

2 2π 

4. Cepstrum: 

π −π 

π 

ω ω ω 

1 

[ ] = ∫ 

j j n 

cn log Xe ( ) e d 

2π 

−π 

ω ω ω 

xn ˆ[ ] + xˆ[ −n] 

cn [ ] = = even part of xn ˆ[ 

] 

2 

55

x(n) 

Summary 

X(k X(k) ^ 

X(k X(k) 

x(n x(n) ^ 

DFT Complex Log IDFT 

~ 

5. Practical Implementation of Complex Cepstrum: 

∞ 

j N k j N kn 

∑ 

( 2 2ππ / ) − ( 2 2ππ 

/ ) 

Xk [ ] = X Xe ( ) = x [ n ] e ) 

6. Examples: 

n=−∞ 

{ p } p { p } 

Xˆ [ k] = log X ( k) = log X ( k) + jarg X ( k) 

N N− 

1 

∞ 

1 j 2π 

/ N kn 

( 

xn ˆ[ 

] = ∑Xke ˆ [ ] 

N k= 0 

) 

= ∑ xn ˆ[ 

+ rN] 

⇒aliasing 

r=−∞ 

xn ˆ[ 

] = aliased version 

of xn ˆ[ 

] 

∞ 

1 

( ) = ⇔ ˆ[ 

] = ∑ δ [ − ] 

1− 

r 

a 

Xz xn n r 

−1 

az r = 1 r 

∞ r 

b 

Xz ( ) = 1−bz⇔ xn ˆ[ 

] = − ∑ δ [ n+ r] 

r 

r = 1 

56

Complex Cepstrum Without Phase 

Unwrapping 

i short-time analysis uses finite-length windowed segments, xn [ ] 

i 

M 

( ) = ∑ [ ] 

n= 

0 

−n 

, 

thh 

-order polynomial 

X z x n z M 

Find polynomial roots 

M M 

−1 −1 −1 

m m 

m= 1 m= 

1 

i 

0 

∏ ∏ 

X( z) = x[0] (1 −a z ) (1 −b 

z ) 

i am 

roots are inside unit circle (minimum-phase 

part) 

i b roots t are outside t id unit it circle i l ( (maximum-phase i h part) t) 

i 

m 

−1 −1 

Factor out terms of form − bmz giving: 

MiM0 −M 0 

−1 

X ( z ) = Az A ∏ (1 −a a mz ) ∏ (1 −b 

b mz 

) 

m= 1 m= 

1 

M0 

M0 

−1 

A= x[0]( −1) 

∏bm 

mm= 

= 1 

i Use polynomial root finder 

to find the zeros that lie inside 

and outside the unit circle and solve directly for xn 

ˆ[ ] . 

57

Cepstrum for Minimum Phase Signals 

• for minimum phase signals (no poles or zeros outside unit circle) the complex cepstrum 

can be completely represented by the real part of the Fourier transforms 

• this means we can represent the complex 

cepstrum of minimum phase signals by the log 

of the magnitude of the FT alone 

• since the real part of the FT is the FT of the even part of the sequence 

ˆ( ) ˆ( 

) 

Re ⎡ ˆ jω ( ) ⎤ 

⎡ xn + x−n⎤ Xe = FT 

⎣ ⎦ ⎢ 

2 

⎥ 

⎣ ⎦ 

jω 

FT ⎡ ⎣ ⎣c( n) ⎤ ⎦ = log Xe ( ) 

xn ˆ( ) + xˆ( −n) 

cn ( ) = 

2 

• giving 

xˆ ( n) = 0 n < 0 

= cn ( ) n= 

0 

= 2cn ( ) n> 

0 

• 

thus the complex cepstrum (for minimum phase signals) can be computed by computing 

the cepstrum and using the equation above 

58

Recursive Relation for Complex 

C Cepstrum t f for Mi Minimum i Phase Ph Signals Si l 

• th the complex l cepstrum t for f minimum i i phase h signals i l 

can be computed recursively from the input signal, 

xn ( ) using the relation 

xn ˆ( ) = 0 n< 

0 

= log ⎡⎣x( 0) ⎤⎦ 

n = 0 

n−1 

xn ( ) ⎛ k ⎞ xn ( − k ) 

= − ˆ( ) 

> 

( 0) ⎜ ⎟xk 

n 

x ⎝n⎠ x( 

0) 

∑ k 

= 0 

0 

59



xn ( ) ←⎯→ Xz ( ) 1. basic z-transform 

dX ( z ) 

nx( n) ←⎯→− z =−zX′ 

( z) dz 

2. scale by n rule 

xn ˆ( ) ←⎯→ Xz ˆ ( ) = log Xz ( ) 3. definition of complex cepstrum 

[ ] 

dXˆ ( z) d X′ ( z) 

= [ log[ Xz ( )] ] = 

dz dz X( z) 

4. differentiation of z-transform 

dX dXˆ ( z ) 

−z Xz ( ) =−zX′ 

( z) 

dz 

5. multiply both sides of equation 

60



dXˆ ( z) 

nxˆ( ( n ) ∗x ( n ) ←⎯→− z X ( z ) =−zX′ ( z ) ←⎯→ nx ( n ) 

ddz 

∞ 

∑ 

( ) 

nx( n) = xˆ ( k) x( n −k) 

k 

k =−∞ 

i for minimum phase systems we have xˆ ( n) = 0for n < 0, 

xn ( ) = 0for n< 

0, 

giving: 

n 

( ) ˆ 

⎛k⎞ xn ( ) = ∑ xkxn ( ) ( − k ) ⎜ ⎟ 

k = 0 

⎝n⎠ i separating out the term for k = n we get: 

n−1 

⎛k ˆ 

⎛k ⎞ 

xn ( ) = ∑ xkxn ( ) ( − k) ⎜ ⎟+ 

x( 0) 

xn ˆ( 

) 

k = 0 

⎝n⎠ n−1 

xn ( ) xn ( − k) ˆ( ) ˆ 

⎛k⎞ x( n) = − ∑ x ( k ) , > 0 

( 0) ( 0) 

⎜ ⎟, 

n 

x( 0) 0 ( 0) 

⎜ ⎟ 

k = x ⎝ ⎝n ⎠ 

xˆ( 0) = log x( 0) , xˆ( n) = 0, n 

< 0 

[ ] 

61



i why is xˆ( 0) = log[ x( 

0)]? 

i assume we have a finite sequence xn ( ) ), n n= 01 , ,..., N N− 

1 

i we can write xn ( ) as: 

xn ( ) = x( 0) δ( n) + x( 1) δ( n− 1) + ... + xN ( −1) δ( 

N−1) 

⎡ x () 1 x ( N − 1 ) ⎤ 

= x( 0) ⎢δ( n) + δ( n− 1) + ... + δ( 

N −1) 

⎣ x( 0) x( 

0) 

⎥ 

⎦ 

i taking z − transforms, we get: 

N−1 

∑ 

−n NZ1 ∏ 1 k 

NZ 2 

−1 

∏ 1 k 

n= 0 k= 1 k= 

1 

Xz ( ) = xnz ( ) = G ( −az) ( −bz) 

i where the first term is the gain, G = x( 

0), 

and the two product terms are the 

zeros inside and outside the 

unit circle. 

i for minimum phase systems we have all zeros inside the unit circle so the 

second product term is gone, and we have the result that 

xˆ ( 0 ) = log[ g[ G] ] = log[ g[ x ( 0)]; )] xˆ ( n ) = 0, n < 0 

NZ1 a 

xn ˆ( ) = 0 

k=1 

n ⎛ ⎞ k − ∑⎜ 

⎟ n > 

⎝ n ⎠ 

62

Cepstrum p for Maximum Phase Signals g 

• 

for maximum phase signals (no poles or 

zeros iinside id unit it circle) i l ) 

ˆ( ) + ˆ( 

− ) 

( ) = 

2 

xn x n 

cn 

2 

• giving 

xn ˆ( ( ) = 0 n > 0 

= cn ( ) n= 

0 

= 2cn ( ) n< 

0 

• thus the complex cepstrum (for maximum 

phase signals) can be computed 

by computing 

the cepstrum and using the equation above 

63

• 


C Cepstrum t f for M Maximum i Phase Ph Signals Si l 

the complex cepstrum for maximum phase signals 

can be computed recursively from the input signal, signal 

xn ( ) using the relation 

xn ˆ( ( ) = 0 n > 0 

= log ⎡⎣x( 0) ⎤⎦ 

n = 0 

0 

xn ( ) ⎛k⎞ xn ( − k) 

= − ∑ ˆ( ) 

< 

( 0) ⎜ ⎟xk 

n 

x ⎝n⎠ x( 

0) 

k= n+ 

1 

0 

64

Computing Short-Time 

Short Time 

Cepstrums from Speech 

Speech 

Using Polynomial Roots 

65

Cepstrum From Polynomial Roots 

66

Cepstrum From Polynomial Polynomial Roots 

Roots 

67

Computing Short-Time 

Short Time 

Cepstrums from Speech 

Speech 

Using Using g the DFT 

68

Practical Considerations 

• window to define short-time short time analysis 

• window duration (should be several pitch 

periods long) 

• size of FFT (to minimize aliasing) 

• elimination of linear phase components 

(positioning signals within frames) 

• cutoff quefrency of lifter 

• type of lifter (low/high quefrency) 

69

Computational Considerations 

70

Voiced Speech Example 

Hamming window 

40 msec duration 

(section beginning 

at sample 13000 

in file 

test_16k.wav) 

71

Voiced Speech Speech Example 

Example 

wrapped phase 

unwrapped 

phase 

72


Example 

73

Characteristic System for 

H Homomorphic hi Convolution 

C l i 

• still need to define (and design) the L 

operator p p part ( (the linear system y 

component) of the system to completely 

define the characteristic system y for 

homomorphic convolution for speech 

– to do this properly p p y and correctly, y, need to look 

at the properties of the complex cepstrum for 

speech signals 

74

Complex Cepstrum Cepstrum of of Speech 

Speech 

• model of speech: 

– voiced speech produced by a quasi-periodic 

pulse train exciting slowly time time-varying varying linear 

system => p[n] convolved with hv[n] – unvoiced speech produced by random noise 

exciting slowly time-varying linear system => 

u[n] convolved with hv[n] • time to examine full model and see what 

the complex p cepstrum p of speech p 

looks like 

75

Homomorphic Filtering of Voiced 

• goal is to separate out the 

excitation impulses from the 

remaining components of the 

complex cepstrum 

• use cepstral window, l(n), to 

separate excitation pulses from 

combined vocal tract 

– this window removes combined 

vocal tract 

• the filtered signal is processed by 

the inverse characteristic system 

to recover the combined vocal 

tract component 

S Speech h 

– l(n)=1 for |n|


Example 

Cepstrally 

smoothed log 

magnitude, 50 

quefrencies 

cutoff 

Cepstrally 

unwrapped 

phase, 50 

quefrencies 

cutoff 

77


Example 

Combined 

impulse 

response of 

glottal pulse, 

vocal tract 

system, and 

radiation system 

78


Example 

High quefrency 

liftering; cutoff 

quefrency=50; 

log magnitude 

and unwrapped 

phase 

79


Example 

Estimated 

excitation 

function for 

voiced speech 

(Hamming 

window 

weighted) i ht d) 

80

Unvoiced Speech Speech Example 

Example 

Hamming window 

40 msec duration 

(section beginning 

at sample 3200 in 

file test test_16k.wav) 16k.wav) 

81


Example 

wrapped phase 

unwrapped 

phase 

82


Example 

83


Example 

Cepstrally 

smoothed log 

magnitude, 50 

quefrencies 

cutoff 

Cepstrally 

unwrapped 

phase, 50 

quefrencies 

cutoff 

84


Example 

Estimated 

excitation source 

for unvoiced 

speech section 

(Hamming 

window 

weighted) 

85

Short Short-Time Time Homomorphic Analysis 

STFT 

86

Review Review of Cepstral Calculation 

• 3 potential methods for computing cepstral 

coefficients, xn ˆ[ ] , of sequence x[n] 

– analytical method; assuming X(z) is a rational 

function; find poles and zeros and expand using log 

power series 

– recursion method; assuming X(z) is either a minimum 

phase (all poles and zeros inside unit circle) or 

maximum phase (all poles and zeros outside unit 

circle) sequence 

– DFT implementation; using windows, with phase 

unwrapping (for complex cepstra) 

87

Example 11—single 

single pole sequence 

( (computed t d using i all ll 3 3 methods) th d ) 

[ ] = [ ] 

n 

xn aun 

n 

a 

xn ˆ[ [ ] = un [ − 1 ] 

n 

88

Cepstral Computation Aliasing 

i 

i 

i 

i 

Effect of quefrency aliasing via a simple example 

xn [ ] = δδ [ n ] + αδ αδ [ n n− N ] 

with discrete-time Fourier transform 

( j jω 

) 1 

j jω N 

α − 

X e e ω 

= + 

p 

We can express the complex logarithm as 

p 

ˆ j 

j Np 

X( e ) log{1 e } 

ω 

ω 

α − 

∞ m+ 1 m ⎛( −1) 

α ⎞ − jωmNp = + = ∑⎜ 

⎟e 

m= 

1⎝ 

m ⎠ 

giving a complex cepstrum in the form 

( −1) 

α 

xn ˆ[ [ ] = ∑ 

⎜ ⎟ ⎟δδ 

[ n n− mN mNp 

] 

m= 

1⎝ 

m ⎠ 

∞ m+ 1 m ⎛ ⎞ 

89

Example Example 22—voiced 

2 voiced voiced speech frame 

90

Example 33—low 

low quefrency liftering 

91

Example 33—high 

high quefrency liftering 

92

Example 4—effects 4 effects of low quefrency lifter 

93

Example 55—phase 

phase unwrapping 

94

Example 66—phase 

phase unwrapping 

95

Homomorphic Spectrum Spectrum Smoothing 

Smoothing 

96

Running Cepstrum 

97

Running Cepstrum 

98

Running Cepstrums 

99

Cepstrum Applications 

100

Cepstrum Distance Distance Measures 

Measures 

i The cepstrum forms a natural basis for comparing 

patterns in speech recognition or vector quantization 

because of its stable mathematical characterization 

for speech signals 

i A typical "cepstral distance 

measure" is of the form: 

nco 

∑ 

D= ([ c n] −c[ 

n]) 

n= 

1 

2 

where cn [ ] and cn [ ] are cepstral p sequences q corresponding p g 

to frames of signal, and D is the cepstral distance between 

the pair of sequences. 

i Using Parseval's Parseval s theorem, we can express the cepstral 

distance in the frequency domain as: 

1 

π 

jω jω 

2 

D (log | H( e ) | log | H( e ) |) d 

2π 

π − 

= ∫ 

− 

2π 

i Thus we see that the cepstral distance is actually a log 

magnitude spectral 

distance 

ω 

101

Mel Frequency Cepstral Coefficients 

i Basic idea is to compute a frequency analysis based on a filter 

bank with approximately critical band spacing of the filters and 

bandwidths. For 4 kHz bandwidth, , approximately pp y 20 filters are used. 

i First perform a short-time Fourier analysis, giving Xm[ k], k = 0,1,..., NF / 2 

where m is the frame number and k is the frequency index (1 to half 

thesizeoftheFFT) 

the size of the FFT) 

i Next the DFT values are grouped together in critical bands and weighted 

by triangular weighting functions. 

102

Mel Frequency Cepstral Coefficients 

th th 

i The mel-spectrum of the m frame for the r filter ( r = 1,2,..., R) 

is defined as: 

MF 

U r 1 

[] r = ∑ | V [] k X []| k 

A 

m r m 

r k= L 

r 

2 

th 

where Vr[ k] is the weighting function for the r filter, ranging from 

DFT index L to U , and 

U 

r r 

r 

Ar = ∑ | Vr[ k]| 

k k= L 

r 

2 

th 

is the normalizing factor for the r mel-filter. (Normalization guarantees 

that if the input spectrum is flat, the mel-spectrum is flat). 

i A discrete cosine transform 

of the log magnitude of the filter outputs is 

computed to form the function mfcc [ n] 

as: 

R 1 ⎡2π⎛ 1⎞ 

⎤ 

mfcc mfccm [ n ] = ∑ log( MF MFm 

[ r ])cos ⎢ ⎜ r r+ ⎟ n n⎥ , 

R 

⎢ ⎜ + ⎟ 

r 1 

R 2 

⎥ 

= 

⎣ ⎝ ⎠ ⎦ 

n n= 1,2, 1,2,..., , N Nmfcc 

f 

i Typically N = 13 and R= 

24 for 4 kHz bandwidth speech signals. 

mfcc 

103

Delta Cepstrum 

i The set of mel frequency cepstral coefficients provide perceptually 

meaningful and smooth estimates of speech spectra, over time 

i Since speech is inherently a dynamic signal, it is reasonable to seek 

a representation t ti that th t includes i l d some aspect t of f the th dynamic d i nature t off 

the time derivatives (both first and second order derivatives) of the shortterm 

cepstrum 

i The resulting parameter sets are called th the e delta cepstrum (first derivative) 

and the delta-delta cepstrum (second derivative). 

i The simplest method of computing delta cepstrum parameters is a first 

difference of cepstral vectors, of the form: 

Δ mfccm[ n] = mfccm[ n] −mfccm−1[ 

n] 

i The simple difference is a poor approximation to the first derivative and is 

not generally used. Instead a least-squares approximation to the local slope 

(over a region 

around the current sample) is used, and is of the form: 

Δ mfcc [ n] 

= 

M 

∑ 

k=−M m M 

k( mfcc [ n]) 

∑ 

k=−M k 

m+ k 

2 

where the region is M frames before and after the current frame 

104

Homomorphic Homomorphic Vocoder 

Vocoder 

• time-dependent p complex p cepstrum p retains all the 

information of the time-dependent Fourier 

transform => exact representation of speech 

• time i ddependent d real l cepstrum lloses phase h 

information -> not an exact representation of 

speech 

• quantization of cepstral parameters also loses 

information 

• cepstrum gives good estimates of pitch, voicing, 

formants => can build homomorphic vocoder 

105

Homomorphic Homomorphic Vocoder 

Vocoder 

1. compute p cepstrum p every y 10-20 msec 

2. estimate pitch period and 

voiced/unvoiced decision 

3. quantize and encode low-time cepstral 

values 

44. at t synthesizer-get th i t approximation i ti to t h hv(n) ( ) 

or hu(n) from low time quantized cepstral 

values 

5. convolve hv(n) or hu(n) with excitation 

created from pitch, p , voiced/unvoiced, , and 

amplitude information 

106

Homomorphic Vocoder 

• l( l(n) ) iis cepstrum t window i d th that t selects l t llow-time ti 

values and is of length 26 samples homomorphic 

vocoder 

107

Homomorphic Vocoder Impulse Responses 

108

Summary 

• Introduced the concept of the cepstrum of a signal, 

defined as the inverse Fourier transform of the log of the 

signal spectrum 

1 

j 

ˆ[ ] log ( ) 

xn F X e ω 

− = ⎡ 

⎣ 

⎤ 

⎦ 

• Showed cepstrum reflected properties of both the 

excitation (high quefrency) and the vocal tract (low 

quefrency) f ) 

– short quefrency window filters out excitation; long quefrency 

window filters out vocal tract 

• Mel-scale cepstral coefficients used as feature set for 

speech recognition 

• Delta and delta delta-delta delta cepstral coefficients used as 

indicators of spectral change over time 109

Lecture 12_winter_2012.pdf

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?