14.07.2013 Views

Lecture 12_winter_2012.pdf

Lecture 12_winter_2012.pdf

Lecture 12_winter_2012.pdf

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Digital Speech Processing<br />

Processing—<br />

<strong>Lecture</strong> <strong>12</strong><br />

Homomorphic<br />

p<br />

Speech Processing<br />

1


General Discrete Discrete-Time Time Model of<br />

S Speech h Production<br />

P d i<br />

pL[ n] = p[ n] ∗hV[ n]<br />

−− Voiced Speech<br />

hV[ n] = AV ⋅g[ n] ∗v[ n] ∗r[<br />

n]<br />

p L [ n ] = u [ n ] ∗ h U [ n ] −− Unvoiced Speech<br />

h [ n] = A ⋅v[ n] ∗r[<br />

n]<br />

U N<br />

2


Basic Speech Model<br />

• short segment of speech can be modeled as<br />

having been generated by exciting an LTI system<br />

either by a quasi-periodic impulse train, or a<br />

random noise signal<br />

• speech analysis => estimate parameters of the<br />

speech model, measure their variations (and<br />

perhaps even their statistical variabilites variabilites-for for<br />

quantization) with time<br />

• speech = excitation * system response<br />

=> want to deconvolve speech into excitation and<br />

system response => > do this using homomorphic<br />

filtering methods<br />

3


Superposition Superposition Principle<br />

Principle<br />

+ +<br />

L{<br />

}<br />

x[ n]<br />

y[ n]<br />

= L{<br />

x[<br />

n]}<br />

x n]<br />

+ x [ n]<br />

L { x n]<br />

} + L { x [ n]<br />

}<br />

1[<br />

2<br />

xn [ ] = ax[ n] + bx[ n]<br />

1 2<br />

1[<br />

2<br />

yn [ ] = L { xn [ ]} = a L { x [ n ]} + b L { x [ n ]}<br />

1 2<br />

4


•<br />

Generalized Superposition for Convolution<br />

*<br />

{ }<br />

xn [ ]<br />

H<br />

y [ n ] = H { x [ n ] }<br />

x n]<br />

∗ x [ n]<br />

H { x n]<br />

} ∗H<br />

{ x [ n]<br />

}<br />

1[<br />

2<br />

for LTI systems we have the result<br />

y [ n ] = x [ n ] ∗ h [ n ] = ∑ x [ k ] h [ n−k ]<br />

∞<br />

k =−∞<br />

*<br />

1[<br />

2<br />

• "generalized" superposition => addition replaced by convolution<br />

xn [ ] = x x1 [ n ] ∗ x x2 [ n ]<br />

yn [ ] = H { x[ n]} = H { x1[ n]} ∗H<br />

{ x2[<br />

n]}<br />

• homomorphic system ffor<br />

or convolution<br />

5


Homomorphic Homomorphic Filter<br />

Filter<br />

• homomorphic filter => homomorphic system ⎡<br />

⎣<br />

⎤<br />

⎦ that H<br />

homomorphic filter homomorphic system ⎡<br />

⎣<br />

⎤<br />

⎦ that<br />

passes the desired signal unaltered, while removing the<br />

undesired signal<br />

H<br />

g<br />

x( n) = x [ n] ∗x<br />

[ n] - with x [ n]<br />

the "undesired" signal<br />

•<br />

1 2 1<br />

xn = x 1 n ∗ 2<br />

H { [ ]} H { [ ]}<br />

H<br />

H<br />

H { x [ n]}<br />

{ x [ n]} → δ ( n) - removal of x [ n]<br />

1 1<br />

{ x [ n]} → x [ n]<br />

2 2<br />

H { xn [ ]} = δ [ n] ∗ x2[ n] = x2[ n]<br />

for linear systems this is analogous to additive noise removal<br />

6


Canonic Form for Homomorphic<br />

Deconvolution<br />

* + + + +<br />

D { } L { }<br />

−1<br />

D { }<br />

∗<br />

x[ n] xn ˆ[ ]<br />

yn ˆ[ ]<br />

yn [ ]<br />

x n x n<br />

ˆ ˆ<br />

ˆ ˆ<br />

1 [ ] ∗ 2 [ ] x 1 [ n ] + x 2 [ n ] y 1 [ n ] + y 2 [ n ]<br />

1 ∗ 2<br />

•<br />

any homomorphic system can be represented as a cascade<br />

of three systems, systems e.g., e g for convolution<br />

∗<br />

*<br />

y [ n ] y [ n ]<br />

1. system takes inputs combined by convolution and transforms<br />

them into additive outputs p<br />

2. system is a conventional linear system<br />

3. inverse of first system--takes additive inputs and transforms<br />

th them iinto t convolutional l ti l outputs<br />

t t<br />

7


Canonic Form for Homomorphic Convolution<br />

*<br />

+ + + +<br />

D { } { }<br />

−1{<br />

}<br />

x[ n]<br />

∗<br />

L<br />

D<br />

[ ] xn ˆ[ [ ] y ˆ[ [ n ] ∗ y [ n ]<br />

x n x n<br />

ˆ ˆ<br />

ˆ ˆ<br />

1[ ] ∗ 2[<br />

] x1[ n] + x2[ n]<br />

y1[ n] + y2[ n]<br />

1 ∗ 2<br />

xn [ ] = x1[ n] ∗x2[<br />

n]<br />

- convolutional relation<br />

xn ˆ[ ] = D { [ ]} ˆ ˆ<br />

∗ xn = x1[ n] + x2[ n]<br />

- additive relation<br />

yn ˆ[ ] = L{<br />

xˆ [ n] + xˆ [ n]} = yˆ [ n] + yˆ [ n]<br />

- conventional linear system<br />

1 2 1 2<br />

−1<br />

∗ ˆ1 ˆ2<br />

1 2<br />

*<br />

y[ n] y [ n]<br />

y[ n] = D { y [ n] + y [ n]} = y [ n] ∗y<br />

[ n] - inverse of convolutional relation<br />

=> design converted back to linear system, L<br />

D∗<br />

⎡⎣ ⎤⎦<br />

- fixed (called the characteristic system for homomorphic deconvolution)<br />

−1<br />

D∗ ⎡⎣ ⎤⎦<br />

- fixed (characteristic system for inverse homomorphic deconvolution)<br />

8


Properties of Characteristic<br />

Systems<br />

xn ˆ[ ] = D { xn [ ]} = D { x[ n] ∗x[<br />

n]}<br />

∗ ∗<br />

1 2<br />

= D ∗ { x 1 [ n ]} + D ∗ { x 2 [ n ]}<br />

= xˆ [ n] + xˆ [ n]<br />

1 2<br />

D { yn ˆ[ ]} = D { yˆ [ n] + yˆ [ n]}<br />

−1 −1<br />

∗ ∗<br />

1 2<br />

= D { yˆ [ n]} ∗D<br />

{ yˆ [ n]}<br />

−1 −1<br />

∗ 1 ∗ 2<br />

= y [ n ] ∗ y [ n ] = y<br />

[ n ]<br />

1 2<br />

9


Discrete Discrete-Time Time Fourier<br />

Transform Representations<br />

10


Canonic Form for Deconvolution Using DTFTs<br />

• need dtto find fida system t that th tconverts t convolution l ti tto addition dditi<br />

xn [ ] = x[ n] ∗x[<br />

n]<br />

1 2<br />

jω = 1<br />

jω ⋅ 2<br />

jω<br />

Xe (<br />

• since<br />

) X( e ) X( e )<br />

D ˆ ˆ ˆ<br />

∗{<br />

xn [ ]} = x1[ n] + x2[ n] = xn [ ]<br />

⎡ jω ( ) ⎤ ˆ jω ˆ jω ˆ jω<br />

D∗<br />

Xe = X1( e ) + X2( e ) = Xe ( )<br />

⎣ ⎦<br />

=> use log function wh ich converts products to sums<br />

ˆ jω ( ) log ⎡ jω ( ) ⎤ log ⎡ jω jω<br />

Xe = Xe = X1( e ) ⋅X2(<br />

e ) ⎤<br />

⎣ ⎦ ⎣ ⎦<br />

ω ω<br />

log ⎡ j ˆ ω ˆ ω<br />

= 1( ) ⎤+ log ⎡ j<br />

2( ) ⎤ j j<br />

X e X e = X1( e ) + X2( e )<br />

⎣ ⎦ ⎣ ⎦<br />

ˆ jω ( ) ⎡ ˆ jω ˆ jω ˆ ˆ<br />

1( ) 2( ) ⎤ jω jω<br />

Ye = L X e + X e = Y1( e ) + Y2( e )<br />

⎣ ⎦<br />

jω ( ) exp ˆ jω = ˆ j<br />

Ye ⎡ ω<br />

Y1( e ) + Y2( e ) ⎤ jω jω<br />

= Y1( e ) ⋅Y2(<br />

e<br />

)<br />

⎣ ⎦


Characteristic System for<br />

Deconvolution Using DTFTs<br />

∞<br />

jω − jωn ∑<br />

Xe ( ) = xne [ ]<br />

n=−∞<br />

ˆ jω jω jω jω<br />

X ( e ) = log g ⎡<br />

⎣ ⎣X ( e ) ⎦<br />

⎤ = log g X( ( e ) + j arg g ⎣<br />

⎡Xe ( )<br />

⎦<br />

⎤<br />

π<br />

1<br />

ˆ[ ] ˆ jω jωn xn = ( ) ω<br />

2 ∫ Xe e d<br />

π −ππ<br />

<strong>12</strong>


Inverse Characteristic System for Deconvolution Using<br />

DTFTs<br />

∞<br />

jω − jωn Ye ˆ( ) = yne ˆ[<br />

]<br />

∑<br />

n=−∞<br />

j jωω ( ) ˆ j jωω<br />

Y Y( e ) = exp ⎡ ⎡Y( ( )<br />

⎤<br />

⎣<br />

Y e<br />

⎦<br />

π<br />

1<br />

ω ω<br />

[ ] = ∫<br />

j j n<br />

yn [ ] Ye ( ) e d dωω<br />

2<br />

∫ π −π<br />

13


Issues with Logarithms<br />

Logarithms<br />

• it is essential that the logarithm obey the equation<br />

⎡ j j ⎤ ⎡ j ⎤ ⎡ j<br />

log ⎡ jωjω ⎤<br />

1( ) 2( ) ⎤ log ⎡ jω 1( ) ⎤ log ⎡ jω<br />

X e ⋅ X e = X e + X2( e ) ⎤<br />

⎣ ⎦ ⎣ ⎦ ⎣ ⎦<br />

jω jω<br />

• this is trivial if X ( e ) and X ( e ) are real -- however usually<br />

1 2<br />

1( jω ) and 2(<br />

jω<br />

)<br />

X e X e<br />

•<br />

are complex<br />

on the unit circle the complex log can be written in the form:<br />

jω jω<br />

⎡ jω<br />

jarg ⎡ jω<br />

j X ( e ) ⎤<br />

⎣ ⎦<br />

Xe ( ) = | Xe ( )| e<br />

log ⎡ jω ( ) ⎤ ˆ jω ω ω<br />

( ) log ⎡ j<br />

= = | ( ) | ⎤+ arg ⎡ j<br />

Xe Xe Xe j Xe ( ) ⎤<br />

⎣ ⎦ ⎣ ⎦ ⎣ ⎦<br />

• no problems with log magnitude term; uniqueness<br />

problems<br />

arise in defining the imaginary part of the log; can show that<br />

the imaginary part (the phase angle of the z-transform) z transform) needs<br />

to be a continuous odd function of ω<br />

14


Problems with arg Function<br />

{ (<br />

j jω ) } ≠ { 1(<br />

j jω<br />

) }<br />

jω<br />

+ ARG{ X2( e ) }<br />

ARG X e ARG X e<br />

jω jω<br />

{ Xe } = { X1e }<br />

jω<br />

+ arg { X X2 ( e<br />

) }<br />

arg ( ) arg ( )<br />

15


Complex Cepstrum Properties<br />

i Given a complex logarithm that satisfies the phase continuity<br />

condition, , we have:<br />

π<br />

1<br />

jω jω jωn xn ˆ[ ] = (log | X( e ) | + jarg{ X( e )}) e dω<br />

2π<br />

∫<br />

−π<br />

j jω j jω<br />

i If xn [ ] real, real then log|X ( e ) | is an even function of ωω<br />

and arg{ X ( e )}<br />

is an odd function of ω.<br />

This means that the real and imaginary parts of<br />

the complex log have the appropriate symmetry for xn ˆ[ ] to be a real<br />

sequence, and d xn ˆ[ ˆ[ ] can bbe represented t d as:<br />

xn ˆ[ ] = cn [ ] + dn [ ]<br />

jω<br />

where cn [ ] is the inverse DTFT of log |X ( e )| and the even part of xn ˆ[<br />

], and<br />

jω<br />

dn [ ] is the inverse DTFT of arg{ Xe ( )} and the odd part of xn ˆ[<br />

]:<br />

xn ˆ[ ] + xˆ[ −n] xn ˆ[ ] −xˆ[ −n]<br />

cn [ ] = ; dn [ ] =<br />

2 2<br />

16


Complex Complex and and Real Real Cepstrum<br />

define the inverse Fourier transform of ˆ jω<br />

•<br />

Xe ( ) as<br />

ππ<br />

1<br />

ˆ[ ˆ ω ω<br />

] = ∫<br />

j j n<br />

xn Xe ( ) e dω<br />

2<br />

π<br />

−π<br />

• where xn ˆ[<br />

[ ] called the "complex complex cepstrum" cepstrum since a complex<br />

logarithm is involved in the computation<br />

• can also define a "real<br />

cepstrum" using just the real part of<br />

the logarithm, giving<br />

π<br />

1<br />

[ ] = ∫ Re ⎡ ˆ j<br />

( ) ⎤ j n<br />

cn ∫ Xe e d<br />

2 2π<br />

⎣ ⎦<br />

=<br />

1<br />

2 2π<br />

−π<br />

π<br />

∫<br />

π<br />

−π<br />

ω ω ω<br />

jω jωn log | Xe ( ) | e dω<br />

• can show that cn [ ] is the even part of xn<br />

ˆ[<br />

]<br />

17


Terminology gy<br />

• Spectrum – Fourier transform of signal autocorrelation<br />

• C Cepstrum t – iinverse FFourier i ttransform f of f log l spectrum t<br />

• Analysis – determining the spectrum of a signal<br />

• Alanysis – determining the cepstr cepstrumm of a signal<br />

• Filtering – linear operation on time signal<br />

• Liftering – linear operation on cepstrum<br />

• Frequency – independent variable of spectrum<br />

• Quefrency – independent variable of cepstrum<br />

• Harmonic – integer multiple of fundamental frequency<br />

• Rahmonic – integer multiple of fundamental frequency<br />

Rahmonic integer multiple of fundamental frequency<br />

18


z-Transform z Transform Representation<br />

Representation<br />

i The z − transform of the signal:<br />

xn [ ] = x x1[ [ n n] ]* x x2[ [ n n]<br />

]<br />

is of the form:<br />

X( z) = X1( z) ⋅X2(<br />

z)<br />

i With an appropriate i t ddefinition fi iti of f th the complex l llog, we get: t<br />

Xˆ ( z) = log{ X( z)} = log{ X1( z) ⋅X2(<br />

z)}<br />

= log{ X ( z)} + log{ X ( z)}<br />

= Xˆ ( z) + Xˆ ( z)<br />

1<br />

1 2<br />

2<br />

19


Characteristic System for Deconvolution<br />

∞<br />

−<br />

( ) = ∑<br />

n<br />

Xz xnz [ ] = Xz ( ) e<br />

n=−∞<br />

{ }<br />

jarg X( z)<br />

Xˆ ( z) = log X( z) = log X( z) + jarg X( z)<br />

[ ] [ ]<br />

1<br />

ˆ[ ] ∫<br />

ˆ n<br />

xn [ ] = ∫ X ( z ) zdz<br />

2π<br />

j<br />

20


Inverse Characteristic System for<br />

Deconvolution<br />

∞<br />

Yz ˆ( ) = ∑ ynz ˆ[<br />

]<br />

n=−∞<br />

−n<br />

Yz ( ) = exp ⎡ ˆ(<br />

) ⎤<br />

⎣<br />

Yz<br />

⎦<br />

= log Yz ( ) + jarg Yz ( )<br />

1<br />

[ ] ∫<br />

n<br />

[ ] = ∫<br />

n<br />

yn Y ( zzdz ) d<br />

2π<br />

j<br />

[ ]<br />

21


z-Transform<br />

z Transform Cepstrum Alanysis<br />

•<br />

i<br />

consider digital systems with rational z-transforms of the general type<br />

Xz ( ) =<br />

M<br />

M M0<br />

−1 −1 −1<br />

i<br />

∏( 1− k ) ∏(<br />

1−<br />

)<br />

k<br />

A a z b z<br />

k= 1 k=<br />

1<br />

N<br />

i<br />

− 1<br />

∏ ( 1−<br />

cz k )<br />

k = 1<br />

we can express the above equation as:<br />

Xz ( ) =<br />

−M0 M M0 ∏ −<br />

M Mi<br />

−1 k ∏ 1− k<br />

M M0<br />

−1<br />

∏ 1−<br />

k<br />

k= 1 k = 1<br />

k=<br />

1<br />

Ni<br />

− 1<br />

( ) ( ) ( )<br />

z A b a z b z<br />

∏ ( 1 1−<br />

cz k )<br />

k = 1<br />

• with all coefficients ak, bk, ck < 1 => all ck<br />

poles and<br />

a ak zeros are inside the unit circle; all b bk<br />

zeros<br />

are outside the unit circle;<br />

22


z-Transform<br />

z Transform Cepstrum Alanysis<br />

i<br />

express X( z)<br />

as product of minimum-phase and<br />

maximum maximum-phase phase signals signals, ii.e., e<br />

i<br />

i<br />

i<br />

X z X z z X z<br />

where<br />

−M<br />

0<br />

( ) = ( ) ⋅ ( )<br />

X ( z ) =<br />

min<br />

min max<br />

Mi<br />

( −1<br />

∏ 1−<br />

) k<br />

A a z<br />

k = 1<br />

N<br />

i<br />

( 1<br />

1 cz ) k<br />

−<br />

∏ −<br />

k = 1<br />

all ll poles l and d zeros iinside id unit it circle i l<br />

max<br />

M<br />

i<br />

∏<br />

( 1 )<br />

X ( z) b 1<br />

−<br />

= − − bz<br />

k<br />

Mi<br />

∏<br />

k = 1<br />

k = 1<br />

all zeros outside unit circle<br />

( )<br />

k<br />

23


z-Transform<br />

z Transform Cepstrum Alanysis<br />

i<br />

i<br />

i<br />

can express xn [ ] as the convolution:<br />

xn [ ] = xmin[ n] ∗xmax[ n−M0] minimum-phase p component p is causal<br />

xmin [ n] = 0, n<<br />

0<br />

maximum-phase maximum phase component is anti-causal anti causal<br />

x [ n] = 0, n><br />

0<br />

max<br />

M0<br />

factor z is the shift in time ori<br />

− 0 i factor z is the shift in time ori gin by M 0<br />

samples required so that the overall sequence,<br />

xn<br />

[ ] be causal<br />

24


z-Transform z Transform Cepstrum Alanysis<br />

• the complex logarithm of Xz ( ) is<br />

0<br />

ˆ<br />

−1<br />

− 0<br />

( ) = log ⎡⎣ ( ) ⎤⎦<br />

= log | | + ∑log | | + log[ ] +<br />

M<br />

M<br />

Xz Xz A bk z<br />

k = 1<br />

M M<br />

N<br />

i 0<br />

i<br />

∑ ( −1) ( ) ( −1<br />

log 1− az + ∑log 1− −∑log 1−<br />

)<br />

k bz k cz k<br />

k = 1 k = 1 k = 1<br />

• evaluating Xz ˆ ( ) on the unit circle we can ignore the term<br />

⎡ jωM ⎤<br />

e<br />

⎣ ⎦<br />

part and is a linear phase shift)<br />

0<br />

related to log (as this contributes only to the imaginary<br />

25


z-Transform<br />

z Transform Cepstrum Alanysis<br />

•<br />

we can then evaluate the remaining terms, use power series<br />

expansion for logarithmic terms (and take the inverse<br />

transform to give the complex cepstrum) giving:<br />

ππ<br />

1<br />

ˆ<br />

ˆ ω ω<br />

( ) = ∫<br />

j j n<br />

xn Xe ( ) e dω<br />

2<br />

π<br />

−π<br />

= log | A | +<br />

M<br />

0<br />

∑<br />

k = 1<br />

Ni n<br />

c k<br />

Mi<br />

n<br />

a k<br />

∑ ∑<br />

k= 1 k=<br />

1<br />

M0 −n<br />

bk<br />

−1<br />

k<br />

log | b | n =<br />

= − n ><br />

n n<br />

∑ k<br />

= n < 0<br />

n<br />

k = 1<br />

0<br />

0<br />

∞ n<br />

Z<br />

log(1 − Z) =− ∑ , | Z | < 1<br />

n<br />

n=<br />

1<br />

26


Cepstrum Properties<br />

1. complex cepstrum is non-zero and of infinite extent for<br />

both positive and negative n, even though x[ n]<br />

may be<br />

causal, or even of finite duration ( X ( z ) has only zeros).<br />

2. complex cepstrum is a decaying sequence that is bounded by:<br />

| n|<br />

α<br />

| xn ˆ[<br />

]| < β , for | n|<br />

→∞<br />

| n |<br />

3. zero-quefrency value of complex cepstrum (and the cepstrum)<br />

depends on the gain constant and the zeros outside the unit circle.<br />

Setting x ˆ[0] [0] = 0 (and therefore c [0] = 0 0)<br />

) is equivalent to normalizing<br />

the log magnitude spectrum to a gain constant of:<br />

M<br />

0<br />

∏<br />

k = 1<br />

∏<br />

1<br />

( − b k ) = 1<br />

A b −<br />

4. If X( z) has no zeros outside the unit circle (all bk=<br />

0), then:<br />

xn ˆ[ ] = 0, n<<br />

0 (minimum-phase<br />

signals)<br />

5. If X( z) has no poles or zeros inside the unit circle (all ak, ck<br />

= 0), then:<br />

xn ˆ[ ] = 0, n><br />

0 (maximum-phase signals)<br />

27


i<br />

i<br />

z-Transform z Transform Cepstrum Alanysis<br />

The main z-transform formula for cepstrum alanysis is based on<br />

the h power series i expansion: i<br />

( 1)<br />

∞ −<br />

∑<br />

n+<br />

1<br />

n<br />

log( g( 1+ x ) = x x < 1<br />

n<br />

n=<br />

1<br />

Example 1--Apply<br />

this formula to the exponential sequence<br />

x1( n)<br />

n<br />

1<br />

= aun ( ) ⇔ X( z)<br />

=<br />

1−<br />

az<br />

1 −1<br />

( −1<br />

)<br />

ˆ<br />

1 ( ) = l log[ [ 1 ( )] = −l log( ( 1 1−<br />

∞<br />

− 1<br />

) = − ∑<br />

n=<br />

1<br />

n+<br />

1<br />

( −<br />

n<br />

n<br />

)<br />

− n<br />

n<br />

a<br />

ˆ 1 ( ) =<br />

n<br />

( − 1 ) ⇔ ˆ<br />

1 ( ) = − log( 1 1−<br />

∞ n<br />

−1⎛a ⎞<br />

) = ∑ ⎜ ⎟<br />

n=<br />

1 n<br />

−n<br />

X z X z az a z<br />

x n u n X z az z<br />

⎝ ⎠<br />

28


z-Transform z Transform Cepstrum Alanysis<br />

• Example 2 2--consider<br />

consider the case of a digital system with a<br />

single zero outside the unit circle ( b < 1)<br />

x x2 ( n ) = δδ ( n ) + b bδδ ( n + 1 )<br />

X2( z) = 1+ bz (zero at z = −1/<br />

b)<br />

X Xˆ ( z ) = l log X ( z ) = llog<br />

1 1+<br />

b z<br />

xˆ<br />

2<br />

2 2<br />

[ ] ( )<br />

∞ + 1<br />

( −1)<br />

= ∑ ( )<br />

n<br />

∑<br />

n n<br />

( b) z<br />

n=<br />

1 n<br />

n+ 1 n<br />

( −1)<br />

b<br />

( n ) = u ( −n−1 )<br />

n<br />

29


•<br />

z-Transform Transform Cepstrum Alanysis for 2 Pulses<br />

Example 3--an<br />

input sequence of two pulses of the form<br />

x 3 ( n ) = δδ ( n ) + αδ αδ ( n n− N Np<br />

) ( 0 < αα<br />

<<br />

1 )<br />

X ( z) = 1+<br />

αz<br />

3<br />

−N<br />

( −N<br />

1 αα<br />

)<br />

X Xˆ 3 ( z ) = log ⎡⎣⎡⎣ X X3 ( z ) ⎤⎦⎤⎦<br />

= log + z<br />

∞ n+<br />

1<br />

( −1)<br />

= ∑ n<br />

p<br />

α<br />

n<br />

z<br />

−nN<br />

n=<br />

1<br />

∞<br />

k<br />

ˆ + 1 α<br />

3(<br />

) = ∑<br />

k<br />

( −1) k<br />

k = 1<br />

δ ( − p )<br />

x n n kN<br />

p<br />

• the cepstrum is an impulse train with impulses spaced at p samples<br />

N<br />

N p<br />

2N 2Np p<br />

3N 3Np 30


•<br />

•<br />

Cepstrum Cepstrum for Train of Impulses<br />

an important special case is a train of impulses<br />

of the form:<br />

∑ M<br />

xn ( ) = αδ(<br />

n−rN) r = 0<br />

M<br />

Xz ( ) = ∑αr<br />

z<br />

r = 0<br />

r p<br />

−rN<br />

p<br />

−N −1<br />

clearly Xz ( ) is a polynomial in z p rather than z ;<br />

thus Xz ( ) can be expressed<br />

as a product of factors<br />

−N<br />

N<br />

of the form ( 1−az p) and ( 1−bz<br />

p),<br />

giving a complex<br />

cepstrum, xn ˆ ( ), that is non non-zero zero only at integer multiples of N<br />

p<br />

31


z-Transform Transform Cepstrum Alanysis for<br />

C Convolution l ti of f 2 2 Sequences<br />

S<br />

i Example 4 4--consider<br />

consider the convolution of sequences 1and3 1 and 3, ie i.e.,<br />

n<br />

x4( n) = x1( n) ∗ x3( n) = ⎡<br />

⎣<br />

aun ( ) ⎤<br />

⎦<br />

∗ ⎡<br />

⎣δ( n) + αδ(<br />

n−N) ⎤ p ⎦<br />

n<br />

n n− N Np<br />

= aun ( ) + α a un ( − N )<br />

i The complex cepstrum is therefore the sum of the complex<br />

cepstra<br />

of the two sequences (since convolution inthetimedomainis<br />

in the time domain is<br />

converted to addition in the cepstral domain)<br />

xˆ ( n) = xˆ ( n) + xˆ ( n)<br />

4 1 3<br />

n ∞ k+ 1 k<br />

a ( −1)<br />

α<br />

un 1 ∑ δ n kNp<br />

n k = 1 k<br />

= ( − ) + ( − )<br />

p<br />

32


i<br />

z-Transform Transform Cepstrum Alanysis for<br />

C Convolution l ti of f 3 3 Sequences<br />

S<br />

Example 5--consider<br />

the convolution of sequences 1, 2 and 3, i.e.,<br />

x ( n) = x ( n) ∗x ( n) ∗x<br />

( n)<br />

5 1 2 3<br />

[ 1 ]<br />

n<br />

= ⎡<br />

⎣<br />

⎡<br />

⎣<br />

aun ( ) ⎤<br />

⎦<br />

⎤<br />

⎦<br />

∗ δδ ( n ) + b bδδ ( n + ) ∗ ⎡<br />

⎣<br />

⎡<br />

⎣δδ ( n ) + αδ δ ( n n− N ) ⎤<br />

⎦<br />

⎤ p ⎦<br />

n n−Np = aun ( ) + αaun ( − N<br />

n<br />

n− Np+<br />

1<br />

) + baun ( + 1) + αbaun<br />

( − N + 1)<br />

p p<br />

i The complex cepstrum is therefore the sum of the complex cepstra<br />

of the three sequences<br />

xˆ ( n) = xˆ ( n) + xˆ ( n) + xˆ ( n)<br />

5 1 2 3<br />

n ∞ k+ 1 k n+ 1 n<br />

a ( −1) α ( −1)<br />

b<br />

= un ( − 1) + ∑ δ ( n− kNp) + u( −n−1) n k n<br />

k = 1<br />

33


Example: a=.9, b=.8, a=.7, Np=15<br />

( 1)<br />

+<br />

−<br />

n<br />

b<br />

n 1 n<br />

a n<br />

a<br />

n<br />

( 1+<br />

bz)<br />

p<br />

Xz ( ) = + z<br />

( 1 1−<br />

az )<br />

( 1)<br />

k<br />

∞ −<br />

∑<br />

k = 1<br />

k+ 1 k<br />

( 1 α )<br />

1<br />

−<br />

−<br />

N<br />

( )<br />

az<br />

α<br />

δ [ n−kN ]<br />

p<br />

34


Homomorphic Analysis of<br />

Speech Model<br />

35


Homomorphic Analysis of Speech Model<br />

•<br />

i<br />

•<br />

i<br />

the transfer function for voiced speech is of the form<br />

H ( z) = A ⋅G(<br />

z) V( z) R( z)<br />

V V<br />

with ith effective ff ti impulse i l response ffor voiced i d speech h<br />

h [ n] = A ⋅g[ n] ∗v[ n] ∗r[<br />

n]<br />

V V<br />

similarly for unvoiced speech we have<br />

H ( z) = A ⋅V(<br />

z) R( z)<br />

U<br />

U<br />

with effective impulse response for unvoiced speech<br />

h [ n] = A ⋅v[ n] ∗r[<br />

n]<br />

U U<br />

36


Complex Complex Cepstrum for Speech<br />

•<br />

the models for the speech components are as follows:<br />

1. vocal tract: Vz ( ) =<br />

Mi<br />

−M ∏ 1− M0<br />

− 1<br />

∏ 1−<br />

= 1 = 1<br />

−1<br />

∏( 1−<br />

)<br />

i<br />

M<br />

k k<br />

k k<br />

N<br />

∏ cz k<br />

k = 1<br />

Az ( a z ) ( b z)<br />

--for voiced speech, only poles => ak = bk = 0,<br />

all k<br />

--unvoiced speech and nasals,<br />

need pole-zero model but all poles are<br />

inside the unit circle => ck<br />

< 1<br />

--all speech has complex poles and zeros that occur in complex conjugate<br />

pairs<br />

2. radiation model: Rz ( ) ≈1−z(high frequency emphasis)<br />

3. glottal pulse model: finite duration pulse with transform<br />

L<br />

i<br />

0<br />

− 1<br />

∏ ∏<br />

−1<br />

G ( z ) = B ( 1−αz ) ( 1−βz<br />

)<br />

k k<br />

k= 1 k=<br />

1<br />

with zeros both inside and outside the unit circle<br />

L<br />

37


Complex Cepstrum for Voiced<br />

S Speech h<br />

• combination of vocal tract, glottal pulse and<br />

radiation will be non-minimum phase p =><br />

complex cepstrum exists for all values of n<br />

• the complex cepstrum will decay rapidly for<br />

large n (due to polynomial terms in expansion<br />

of complex cepstrum)<br />

• effect of the voiced source is a periodic pulse<br />

train for multiples of the pitch period<br />

38


•<br />

•<br />

Simplified Speech Speech Model<br />

Model<br />

short-time speech model<br />

[ ∗ ∗ ∗ ]<br />

xn [ ] = wn [ ] ⋅ pn [ ] ∗ gn [ ] ∗ vn [ ] ∗ rn [ ]<br />

≈ p [ n] ∗h<br />

[ n]<br />

w v<br />

short-time complex cepstrum<br />

xn ˆ [ ] = p pˆ [ n ] + gn ˆ [ ] + vn ˆ [ ] + rn ˆ [ ]<br />

w<br />

39


Analysis y of Model for Voiced Speech p<br />

i Assume sustained /AE/ vowel with fundamental frequency of <strong>12</strong>5 Hz<br />

i Use glottal pulse model of the form:<br />

⎧0.5<br />

[1− cos( π ( n+ 1) / N1)] ⎪<br />

gn [ ] = ⎨cos(0.5<br />

π ( n+ 1 −N1)/ N2) ⎪<br />

⎩ 0<br />

0 ≤n≤ N1−1<br />

N1 ≤n≤ N1+ N2<br />

−2<br />

otherwise<br />

N = 25, N = 10<br />

⇒ 34 sample impulse response, with transform<br />

i<br />

1 2<br />

33<br />

−33 ∏<br />

33<br />

−1<br />

k ∏ k<br />

k= 1 k=<br />

1<br />

Gz ( ) = z ( −b) (1 −bz) ⇒ all roots outside unit circle ⇒ maximum phase<br />

Vocal tract system specified by 5 formants (frequencies and bandwidths)<br />

V( z)<br />

=<br />

1<br />

5<br />

−2πσ kT −1 −4πσ<br />

kT<br />

−2<br />

∏ (1 (1− 2 e cos(2 ππ<br />

FT k ) z + e z )<br />

k = 1<br />

{ Fk,<br />

σ k}<br />

= [(660,60),(1720,100),(2410,<strong>12</strong>0),(3500,175),(4500,250)]<br />

i Radiation load is simple first difference<br />

−1<br />

R( z) = 1 − γz, γ =<br />

0.96<br />

40


Time Domain Analysis<br />

Analysis<br />

41


Pole Pole-Zero Zero Analysis of Model<br />

Components<br />

42


Spectral Analysis Analysis of of Model<br />

Model<br />

43


Speech Speech Model Output<br />

44


Complex Cepstrum of Model<br />

i<br />

The voiced speech signal is modeled as:<br />

x [ n ] = A AV ⋅gn [ ] ∗vn [ ] ∗rn [ ] ∗ pn [ ]<br />

i with complex cepstrum:<br />

sn ˆ [ ] = log| A | δ [ n ] + gn ˆ [ ] + vn ˆ [ ] + rn ˆ [ ] + pn ˆ<br />

V δ n + gn + vn + rn + pn [ ]<br />

i glottal pulse is maximum phase ⇒ gn ˆ[ ] = 0, n><br />

0<br />

i vocal tract<br />

and radiation systems y are minimum phase p<br />

⇒ vn ˆ[ ] = 0, n< 0, rn ˆ[<br />

] = 0, n<<br />

0<br />

∞ k<br />

ˆ β<br />

Pz ( ) = − log(1 − ββ<br />

z ) = ∑ z<br />

k<br />

−N −kN<br />

= =∑<br />

k = 1<br />

∞ k<br />

β<br />

p pˆ[<br />

[ n ] = ∑<br />

δ [ n−kN p ]<br />

k<br />

k = 1<br />

p p<br />

45


Cepstral Analysis Analysis of of Model<br />

Model<br />

46


Resulting Complex and Real<br />

Cepstra<br />

47


Frequency Frequency Domain Representations<br />

48


Frequency<br />

Frequency-Domain Domain Representation of<br />

C Complex l Cepstrum C t<br />

49


The Complex Cepstrum-DFT Cepstrum DFT Implementation<br />

i<br />

•<br />

2π∞2π j k − j kn<br />

N N<br />

∑<br />

Xk [ ] = Xe ( ) = xne [ ] k= 01 , ,..., N−1,<br />

n=−∞<br />

jω<br />

X [ k ] is the N point p DFT corresponding p g to X( ( e )<br />

p<br />

{ } { }<br />

Xk ˆ = Xe ˆ = Xk = Xk + j Xk<br />

j2πk/ N<br />

[ ] ( ) log [ ] log [ ] arg [ ]<br />

N−<strong>12</strong>π∞<br />

1<br />

j kn<br />

ˆ ∑ ˆ N<br />

xn [ ] = ∑Xke [ ] = ∑ xn ˆ ˆ[<br />

+ rN] n= 01 , ,..., N−1<br />

N k= 0<br />

r=−∞<br />

x ˆ [ n ] is an aliased version of xˆ [ n ]<br />

⇒use as large a value of N<br />

as possible to minimize aliasing<br />

50


Inverse System- System y DFT Implementation<br />

p<br />

51


The Cepstrum Cepstrum-DFT DFT Implementation<br />

π<br />

1<br />

jω jωn cn [ ] = ∫ log| Xe ( )| e dωω −∞< ∞< n


Cepstral Computation Aliasing<br />

N=256, N Np=75, =75,<br />

α=0 α=0.8 =0 =0.8 8<br />

Circle dots are<br />

cepstrum<br />

values in<br />

correct<br />

locations; all<br />

other dots are<br />

results of<br />

aliasing due to<br />

finite range<br />

computations<br />

53


Summary<br />

1. Homomorphic System for Convolution:<br />

x[n] X(z) X(z) ^ X(z) Y(z) ^ Y(z) Y(z)<br />

y[n]<br />

z( ) log( ) L( ) exp( ) z-1 ( )<br />

X(z)<br />

^<br />

z 1 ( ) -1 ( )<br />

2. Practical Case:<br />

z( ) → DFT<br />

−1<br />

z () → IDFT<br />

x[n] ^ [ ] Linear<br />

System<br />

y[n] ^ y[ y[n] ]<br />

jω jω<br />

Xe ( ) = Xe ( ) e<br />

jω<br />

{ }<br />

jarg X( e )<br />

z( ( )<br />

Y(z)<br />

^<br />

{ }<br />

jω jω jω<br />

log ⎡<br />

⎣<br />

Xe ( ) ⎤<br />

⎦<br />

= log Xe ( ) + jarg Xe ( )<br />

54


Summary<br />

3. Complex Cepstrum:<br />

π<br />

1<br />

ˆ[<br />

] = ∫<br />

ˆ j j n<br />

xn Xe ( ) e d<br />

2 2π<br />

4. Cepstrum:<br />

π −π<br />

π<br />

ω ω ω<br />

1<br />

[ ] = ∫<br />

j j n<br />

cn log Xe ( ) e d<br />

2π<br />

−π<br />

ω ω ω<br />

xn ˆ[ ] + xˆ[ −n]<br />

cn [ ] = = even part of xn ˆ[<br />

]<br />

2<br />

55


x(n)<br />

Summary<br />

X(k X(k) ^<br />

X(k X(k)<br />

x(n x(n) ^<br />

DFT Complex Log IDFT<br />

~<br />

5. Practical Implementation of Complex Cepstrum:<br />

∞<br />

j N k j N kn<br />

∑<br />

( 2 2ππ / ) − ( 2 2ππ<br />

/ )<br />

Xk [ ] = X Xe ( ) = x [ n ] e )<br />

6. Examples:<br />

n=−∞<br />

{ p } p { p }<br />

Xˆ [ k] = log X ( k) = log X ( k) + jarg X ( k)<br />

N N−<br />

1<br />

∞<br />

1 j 2π<br />

/ N kn<br />

(<br />

xn ˆ[<br />

] = ∑Xke ˆ [ ]<br />

N k= 0<br />

)<br />

= ∑ xn ˆ[<br />

+ rN]<br />

⇒aliasing<br />

r=−∞<br />

xn ˆ[<br />

] = aliased version<br />

of xn ˆ[<br />

]<br />

∞<br />

1<br />

( ) = ⇔ ˆ[<br />

] = ∑ δ [ − ]<br />

1−<br />

r<br />

a<br />

Xz xn n r<br />

−1<br />

az r = 1 r<br />

∞ r<br />

b<br />

Xz ( ) = 1−bz⇔ xn ˆ[<br />

] = − ∑ δ [ n+ r]<br />

r<br />

r = 1<br />

56


Complex Cepstrum Without Phase<br />

Unwrapping<br />

i short-time analysis uses finite-length windowed segments, xn [ ]<br />

i<br />

M<br />

( ) = ∑ [ ]<br />

n=<br />

0<br />

−n<br />

,<br />

thh<br />

-order polynomial<br />

X z x n z M<br />

Find polynomial roots<br />

M M<br />

−1 −1 −1<br />

m m<br />

m= 1 m=<br />

1<br />

i<br />

0<br />

∏ ∏<br />

X( z) = x[0] (1 −a z ) (1 −b<br />

z )<br />

i am<br />

roots are inside unit circle (minimum-phase<br />

part)<br />

i b roots t are outside t id unit it circle i l ( (maximum-phase i h part) t)<br />

i<br />

m<br />

−1 −1<br />

Factor out terms of form − bmz giving:<br />

MiM0 −M 0<br />

−1<br />

X ( z ) = Az A ∏ (1 −a a mz ) ∏ (1 −b<br />

b mz<br />

)<br />

m= 1 m=<br />

1<br />

M0<br />

M0<br />

−1<br />

A= x[0]( −1)<br />

∏bm<br />

mm=<br />

= 1<br />

i Use polynomial root finder<br />

to find the zeros that lie inside<br />

and outside the unit circle and solve directly for xn<br />

ˆ[ ] .<br />

57


Cepstrum for Minimum Phase Signals<br />

• for minimum phase signals (no poles or zeros outside unit circle) the complex cepstrum<br />

can be completely represented by the real part of the Fourier transforms<br />

• this means we can represent the complex<br />

cepstrum of minimum phase signals by the log<br />

of the magnitude of the FT alone<br />

• since the real part of the FT is the FT of the even part of the sequence<br />

ˆ( ) ˆ(<br />

)<br />

Re ⎡ ˆ jω ( ) ⎤<br />

⎡ xn + x−n⎤ Xe = FT<br />

⎣ ⎦ ⎢<br />

2<br />

⎥<br />

⎣ ⎦<br />

jω<br />

FT ⎡ ⎣ ⎣c( n) ⎤ ⎦ = log Xe ( )<br />

xn ˆ( ) + xˆ( −n)<br />

cn ( ) =<br />

2<br />

• giving<br />

xˆ ( n) = 0 n < 0<br />

= cn ( ) n=<br />

0<br />

= 2cn ( ) n><br />

0<br />

•<br />

thus the complex cepstrum (for minimum phase signals) can be computed by computing<br />

the cepstrum and using the equation above<br />

58


Recursive Relation for Complex<br />

C Cepstrum t f for Mi Minimum i Phase Ph Signals Si l<br />

• th the complex l cepstrum t for f minimum i i phase h signals i l<br />

can be computed recursively from the input signal,<br />

xn ( ) using the relation<br />

xn ˆ( ) = 0 n<<br />

0<br />

= log ⎡⎣x( 0) ⎤⎦<br />

n = 0<br />

n−1<br />

xn ( ) ⎛ k ⎞ xn ( − k )<br />

= − ˆ( )<br />

><br />

( 0) ⎜ ⎟xk<br />

n<br />

x ⎝n⎠ x(<br />

0)<br />

∑ k<br />

= 0<br />

0<br />

59


Recursive Relation for Complex<br />

C Cepstrum t f for Mi Minimum i Phase Ph Signals Si l<br />

xn ( ) ←⎯→ Xz ( ) 1. basic z-transform<br />

dX ( z )<br />

nx( n) ←⎯→− z =−zX′<br />

( z) dz<br />

2. scale by n rule<br />

xn ˆ( ) ←⎯→ Xz ˆ ( ) = log Xz ( ) 3. definition of complex cepstrum<br />

[ ]<br />

dXˆ ( z) d X′ ( z)<br />

= [ log[ Xz ( )] ] =<br />

dz dz X( z)<br />

4. differentiation of z-transform<br />

dX dXˆ ( z )<br />

−z Xz ( ) =−zX′<br />

( z)<br />

dz<br />

5. multiply both sides of equation<br />

60


Recursive Relation for Complex<br />

C Cepstrum t f for Mi Minimum i Phase Ph Signals Si l<br />

dXˆ ( z)<br />

nxˆ( ( n ) ∗x ( n ) ←⎯→− z X ( z ) =−zX′ ( z ) ←⎯→ nx ( n )<br />

ddz<br />

∞<br />

∑<br />

( )<br />

nx( n) = xˆ ( k) x( n −k)<br />

k<br />

k =−∞<br />

i for minimum phase systems we have xˆ ( n) = 0for n < 0,<br />

xn ( ) = 0for n<<br />

0,<br />

giving:<br />

n<br />

( ) ˆ<br />

⎛k⎞ xn ( ) = ∑ xkxn ( ) ( − k ) ⎜ ⎟<br />

k = 0<br />

⎝n⎠ i separating out the term for k = n we get:<br />

n−1<br />

⎛k ˆ<br />

⎛k ⎞<br />

xn ( ) = ∑ xkxn ( ) ( − k) ⎜ ⎟+<br />

x( 0)<br />

xn ˆ(<br />

)<br />

k = 0<br />

⎝n⎠ n−1<br />

xn ( ) xn ( − k) ˆ( ) ˆ<br />

⎛k⎞ x( n) = − ∑ x ( k ) , > 0<br />

( 0) ( 0)<br />

⎜ ⎟,<br />

n<br />

x( 0) 0 ( 0)<br />

⎜ ⎟<br />

k = x ⎝ ⎝n ⎠<br />

xˆ( 0) = log x( 0) , xˆ( n) = 0, n<br />

< 0<br />

[ ]<br />

61


Recursive Relation for Complex<br />

C Cepstrum t f for Mi Minimum i Phase Ph Signals Si l<br />

i why is xˆ( 0) = log[ x(<br />

0)]?<br />

i assume we have a finite sequence xn ( ) ), n n= 01 , ,..., N N−<br />

1<br />

i we can write xn ( ) as:<br />

xn ( ) = x( 0) δ( n) + x( 1) δ( n− 1) + ... + xN ( −1) δ(<br />

N−1)<br />

⎡ x () 1 x ( N − 1 ) ⎤<br />

= x( 0) ⎢δ( n) + δ( n− 1) + ... + δ(<br />

N −1)<br />

⎣ x( 0) x(<br />

0)<br />

⎥<br />

⎦<br />

i taking z − transforms, we get:<br />

N−1<br />

∑<br />

−n NZ1 ∏ 1 k<br />

NZ 2<br />

−1<br />

∏ 1 k<br />

n= 0 k= 1 k=<br />

1<br />

Xz ( ) = xnz ( ) = G ( −az) ( −bz)<br />

i where the first term is the gain, G = x(<br />

0),<br />

and the two product terms are the<br />

zeros inside and outside the<br />

unit circle.<br />

i for minimum phase systems we have all zeros inside the unit circle so the<br />

second product term is gone, and we have the result that<br />

xˆ ( 0 ) = log[ g[ G] ] = log[ g[ x ( 0)]; )] xˆ ( n ) = 0, n < 0<br />

NZ1 a<br />

xn ˆ( ) = 0<br />

k=1<br />

n ⎛ ⎞ k − ∑⎜<br />

⎟ n ><br />

⎝ n ⎠<br />

62


Cepstrum p for Maximum Phase Signals g<br />

•<br />

for maximum phase signals (no poles or<br />

zeros iinside id unit it circle) i l )<br />

ˆ( ) + ˆ(<br />

− )<br />

( ) =<br />

2<br />

xn x n<br />

cn<br />

2<br />

• giving<br />

xn ˆ( ( ) = 0 n > 0<br />

= cn ( ) n=<br />

0<br />

= 2cn ( ) n<<br />

0<br />

• thus the complex cepstrum (for maximum<br />

phase signals) can be computed<br />

by computing<br />

the cepstrum and using the equation above<br />

63


•<br />

Recursive Relation for Complex<br />

C Cepstrum t f for M Maximum i Phase Ph Signals Si l<br />

the complex cepstrum for maximum phase signals<br />

can be computed recursively from the input signal, signal<br />

xn ( ) using the relation<br />

xn ˆ( ( ) = 0 n > 0<br />

= log ⎡⎣x( 0) ⎤⎦<br />

n = 0<br />

0<br />

xn ( ) ⎛k⎞ xn ( − k)<br />

= − ∑ ˆ( )<br />

<<br />

( 0) ⎜ ⎟xk<br />

n<br />

x ⎝n⎠ x(<br />

0)<br />

k= n+<br />

1<br />

0<br />

64


Computing Short-Time<br />

Short Time<br />

Cepstrums from Speech<br />

Speech<br />

Using Polynomial Roots<br />

65


Cepstrum From Polynomial Roots<br />

66


Cepstrum From Polynomial Polynomial Roots<br />

Roots<br />

67


Computing Short-Time<br />

Short Time<br />

Cepstrums from Speech<br />

Speech<br />

Using Using g the DFT<br />

68


Practical Considerations<br />

• window to define short-time short time analysis<br />

• window duration (should be several pitch<br />

periods long)<br />

• size of FFT (to minimize aliasing)<br />

• elimination of linear phase components<br />

(positioning signals within frames)<br />

• cutoff quefrency of lifter<br />

• type of lifter (low/high quefrency)<br />

69


Computational Considerations<br />

70


Voiced Speech Example<br />

Hamming window<br />

40 msec duration<br />

(section beginning<br />

at sample 13000<br />

in file<br />

test_16k.wav)<br />

71


Voiced Speech Speech Example<br />

Example<br />

wrapped phase<br />

unwrapped<br />

phase<br />

72


Voiced Speech Speech Example<br />

Example<br />

73


Characteristic System for<br />

H Homomorphic hi Convolution<br />

C l i<br />

• still need to define (and design) the L<br />

operator p p part ( (the linear system y<br />

component) of the system to completely<br />

define the characteristic system y for<br />

homomorphic convolution for speech<br />

– to do this properly p p y and correctly, y, need to look<br />

at the properties of the complex cepstrum for<br />

speech signals<br />

74


Complex Cepstrum Cepstrum of of Speech<br />

Speech<br />

• model of speech:<br />

– voiced speech produced by a quasi-periodic<br />

pulse train exciting slowly time time-varying varying linear<br />

system => p[n] convolved with hv[n] – unvoiced speech produced by random noise<br />

exciting slowly time-varying linear system =><br />

u[n] convolved with hv[n] • time to examine full model and see what<br />

the complex p cepstrum p of speech p<br />

looks like<br />

75


Homomorphic Filtering of Voiced<br />

• goal is to separate out the<br />

excitation impulses from the<br />

remaining components of the<br />

complex cepstrum<br />

• use cepstral window, l(n), to<br />

separate excitation pulses from<br />

combined vocal tract<br />

– this window removes combined<br />

vocal tract<br />

• the filtered signal is processed by<br />

the inverse characteristic system<br />

to recover the combined vocal<br />

tract component<br />

S Speech h<br />

– l(n)=1 for |n|


Voiced Speech Speech Example<br />

Example<br />

Cepstrally<br />

smoothed log<br />

magnitude, 50<br />

quefrencies<br />

cutoff<br />

Cepstrally<br />

unwrapped<br />

phase, 50<br />

quefrencies<br />

cutoff<br />

77


Voiced Speech Speech Example<br />

Example<br />

Combined<br />

impulse<br />

response of<br />

glottal pulse,<br />

vocal tract<br />

system, and<br />

radiation system<br />

78


Voiced Speech Speech Example<br />

Example<br />

High quefrency<br />

liftering; cutoff<br />

quefrency=50;<br />

log magnitude<br />

and unwrapped<br />

phase<br />

79


Voiced Speech Speech Example<br />

Example<br />

Estimated<br />

excitation<br />

function for<br />

voiced speech<br />

(Hamming<br />

window<br />

weighted) i ht d)<br />

80


Unvoiced Speech Speech Example<br />

Example<br />

Hamming window<br />

40 msec duration<br />

(section beginning<br />

at sample 3200 in<br />

file test test_16k.wav) 16k.wav)<br />

81


Unvoiced Speech Speech Example<br />

Example<br />

wrapped phase<br />

unwrapped<br />

phase<br />

82


Unvoiced Speech Speech Example<br />

Example<br />

83


Unvoiced Speech Speech Example<br />

Example<br />

Cepstrally<br />

smoothed log<br />

magnitude, 50<br />

quefrencies<br />

cutoff<br />

Cepstrally<br />

unwrapped<br />

phase, 50<br />

quefrencies<br />

cutoff<br />

84


Unvoiced Speech Speech Example<br />

Example<br />

Estimated<br />

excitation source<br />

for unvoiced<br />

speech section<br />

(Hamming<br />

window<br />

weighted)<br />

85


Short Short-Time Time Homomorphic Analysis<br />

STFT<br />

86


Review Review of Cepstral Calculation<br />

• 3 potential methods for computing cepstral<br />

coefficients, xn ˆ[ ] , of sequence x[n]<br />

– analytical method; assuming X(z) is a rational<br />

function; find poles and zeros and expand using log<br />

power series<br />

– recursion method; assuming X(z) is either a minimum<br />

phase (all poles and zeros inside unit circle) or<br />

maximum phase (all poles and zeros outside unit<br />

circle) sequence<br />

– DFT implementation; using windows, with phase<br />

unwrapping (for complex cepstra)<br />

87


Example 11—single<br />

single pole sequence<br />

( (computed t d using i all ll 3 3 methods) th d )<br />

[ ] = [ ]<br />

n<br />

xn aun<br />

n<br />

a<br />

xn ˆ[ [ ] = un [ − 1 ]<br />

n<br />

88


Cepstral Computation Aliasing<br />

i<br />

i<br />

i<br />

i<br />

Effect of quefrency aliasing via a simple example<br />

xn [ ] = δδ [ n ] + αδ αδ [ n n− N ]<br />

with discrete-time Fourier transform<br />

( j jω<br />

) 1<br />

j jω N<br />

α −<br />

X e e ω<br />

= +<br />

p<br />

We can express the complex logarithm as<br />

p<br />

ˆ j<br />

j Np<br />

X( e ) log{1 e }<br />

ω<br />

ω<br />

α −<br />

∞ m+ 1 m ⎛( −1)<br />

α ⎞ − jωmNp = + = ∑⎜<br />

⎟e<br />

m=<br />

1⎝<br />

m ⎠<br />

giving a complex cepstrum in the form<br />

( −1)<br />

α<br />

xn ˆ[ [ ] = ∑<br />

⎜ ⎟ ⎟δδ<br />

[ n n− mN mNp<br />

]<br />

m=<br />

1⎝<br />

m ⎠<br />

∞ m+ 1 m ⎛ ⎞<br />

89


Example Example 22—voiced<br />

2 voiced voiced speech frame<br />

90


Example 33—low<br />

low quefrency liftering<br />

91


Example 33—high<br />

high quefrency liftering<br />

92


Example 4—effects 4 effects of low quefrency lifter<br />

93


Example 55—phase<br />

phase unwrapping<br />

94


Example 66—phase<br />

phase unwrapping<br />

95


Homomorphic Spectrum Spectrum Smoothing<br />

Smoothing<br />

96


Running Cepstrum<br />

97


Running Cepstrum<br />

98


Running Cepstrums<br />

99


Cepstrum Applications<br />

100


Cepstrum Distance Distance Measures<br />

Measures<br />

i The cepstrum forms a natural basis for comparing<br />

patterns in speech recognition or vector quantization<br />

because of its stable mathematical characterization<br />

for speech signals<br />

i A typical "cepstral distance<br />

measure" is of the form:<br />

nco<br />

∑<br />

D= ([ c n] −c[<br />

n])<br />

n=<br />

1<br />

2<br />

where cn [ ] and cn [ ] are cepstral p sequences q corresponding p g<br />

to frames of signal, and D is the cepstral distance between<br />

the pair of sequences.<br />

i Using Parseval's Parseval s theorem, we can express the cepstral<br />

distance in the frequency domain as:<br />

1<br />

π<br />

jω jω<br />

2<br />

D (log | H( e ) | log | H( e ) |) d<br />

2π<br />

π −<br />

= ∫<br />

−<br />

2π<br />

i Thus we see that the cepstral distance is actually a log<br />

magnitude spectral<br />

distance<br />

ω<br />

101


Mel Frequency Cepstral Coefficients<br />

i Basic idea is to compute a frequency analysis based on a filter<br />

bank with approximately critical band spacing of the filters and<br />

bandwidths. For 4 kHz bandwidth, , approximately pp y 20 filters are used.<br />

i First perform a short-time Fourier analysis, giving Xm[ k], k = 0,1,..., NF / 2<br />

where m is the frame number and k is the frequency index (1 to half<br />

thesizeoftheFFT)<br />

the size of the FFT)<br />

i Next the DFT values are grouped together in critical bands and weighted<br />

by triangular weighting functions.<br />

102


Mel Frequency Cepstral Coefficients<br />

th th<br />

i The mel-spectrum of the m frame for the r filter ( r = 1,2,..., R)<br />

is defined as:<br />

MF<br />

U r 1<br />

[] r = ∑ | V [] k X []| k<br />

A<br />

m r m<br />

r k= L<br />

r<br />

2<br />

th<br />

where Vr[ k] is the weighting function for the r filter, ranging from<br />

DFT index L to U , and<br />

U<br />

r r<br />

r<br />

Ar = ∑ | Vr[ k]|<br />

k k= L<br />

r<br />

2<br />

th<br />

is the normalizing factor for the r mel-filter. (Normalization guarantees<br />

that if the input spectrum is flat, the mel-spectrum is flat).<br />

i A discrete cosine transform<br />

of the log magnitude of the filter outputs is<br />

computed to form the function mfcc [ n]<br />

as:<br />

R 1 ⎡2π⎛ 1⎞<br />

⎤<br />

mfcc mfccm [ n ] = ∑ log( MF MFm<br />

[ r ])cos ⎢ ⎜ r r+ ⎟ n n⎥ ,<br />

R<br />

⎢ ⎜ + ⎟<br />

r 1<br />

R 2<br />

⎥<br />

=<br />

⎣ ⎝ ⎠ ⎦<br />

n n= 1,2, 1,2,..., , N Nmfcc<br />

f<br />

i Typically N = 13 and R=<br />

24 for 4 kHz bandwidth speech signals.<br />

mfcc<br />

103


Delta Cepstrum<br />

i The set of mel frequency cepstral coefficients provide perceptually<br />

meaningful and smooth estimates of speech spectra, over time<br />

i Since speech is inherently a dynamic signal, it is reasonable to seek<br />

a representation t ti that th t includes i l d some aspect t of f the th dynamic d i nature t off<br />

the time derivatives (both first and second order derivatives) of the shortterm<br />

cepstrum<br />

i The resulting parameter sets are called th the e delta cepstrum (first derivative)<br />

and the delta-delta cepstrum (second derivative).<br />

i The simplest method of computing delta cepstrum parameters is a first<br />

difference of cepstral vectors, of the form:<br />

Δ mfccm[ n] = mfccm[ n] −mfccm−1[<br />

n]<br />

i The simple difference is a poor approximation to the first derivative and is<br />

not generally used. Instead a least-squares approximation to the local slope<br />

(over a region<br />

around the current sample) is used, and is of the form:<br />

Δ mfcc [ n]<br />

=<br />

M<br />

∑<br />

k=−M m M<br />

k( mfcc [ n])<br />

∑<br />

k=−M k<br />

m+ k<br />

2<br />

where the region is M frames before and after the current frame<br />

104


Homomorphic Homomorphic Vocoder<br />

Vocoder<br />

• time-dependent p complex p cepstrum p retains all the<br />

information of the time-dependent Fourier<br />

transform => exact representation of speech<br />

• time i ddependent d real l cepstrum lloses phase h<br />

information -> not an exact representation of<br />

speech<br />

• quantization of cepstral parameters also loses<br />

information<br />

• cepstrum gives good estimates of pitch, voicing,<br />

formants => can build homomorphic vocoder<br />

105


Homomorphic Homomorphic Vocoder<br />

Vocoder<br />

1. compute p cepstrum p every y 10-20 msec<br />

2. estimate pitch period and<br />

voiced/unvoiced decision<br />

3. quantize and encode low-time cepstral<br />

values<br />

44. at t synthesizer-get th i t approximation i ti to t h hv(n) ( )<br />

or hu(n) from low time quantized cepstral<br />

values<br />

5. convolve hv(n) or hu(n) with excitation<br />

created from pitch, p , voiced/unvoiced, , and<br />

amplitude information<br />

106


Homomorphic Vocoder<br />

• l( l(n) ) iis cepstrum t window i d th that t selects l t llow-time ti<br />

values and is of length 26 samples homomorphic<br />

vocoder<br />

107


Homomorphic Vocoder Impulse Responses<br />

108


Summary<br />

• Introduced the concept of the cepstrum of a signal,<br />

defined as the inverse Fourier transform of the log of the<br />

signal spectrum<br />

1<br />

j<br />

ˆ[ ] log ( )<br />

xn F X e ω<br />

− = ⎡<br />

⎣<br />

⎤<br />

⎦<br />

• Showed cepstrum reflected properties of both the<br />

excitation (high quefrency) and the vocal tract (low<br />

quefrency) f )<br />

– short quefrency window filters out excitation; long quefrency<br />

window filters out vocal tract<br />

• Mel-scale cepstral coefficients used as feature set for<br />

speech recognition<br />

• Delta and delta delta-delta delta cepstral coefficients used as<br />

indicators of spectral change over time 109

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!