Lecture 12_winter_2012.pdf
Lecture 12_winter_2012.pdf
Lecture 12_winter_2012.pdf
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Digital Speech Processing<br />
Processing—<br />
<strong>Lecture</strong> <strong>12</strong><br />
Homomorphic<br />
p<br />
Speech Processing<br />
1
General Discrete Discrete-Time Time Model of<br />
S Speech h Production<br />
P d i<br />
pL[ n] = p[ n] ∗hV[ n]<br />
−− Voiced Speech<br />
hV[ n] = AV ⋅g[ n] ∗v[ n] ∗r[<br />
n]<br />
p L [ n ] = u [ n ] ∗ h U [ n ] −− Unvoiced Speech<br />
h [ n] = A ⋅v[ n] ∗r[<br />
n]<br />
U N<br />
2
Basic Speech Model<br />
• short segment of speech can be modeled as<br />
having been generated by exciting an LTI system<br />
either by a quasi-periodic impulse train, or a<br />
random noise signal<br />
• speech analysis => estimate parameters of the<br />
speech model, measure their variations (and<br />
perhaps even their statistical variabilites variabilites-for for<br />
quantization) with time<br />
• speech = excitation * system response<br />
=> want to deconvolve speech into excitation and<br />
system response => > do this using homomorphic<br />
filtering methods<br />
3
Superposition Superposition Principle<br />
Principle<br />
+ +<br />
L{<br />
}<br />
x[ n]<br />
y[ n]<br />
= L{<br />
x[<br />
n]}<br />
x n]<br />
+ x [ n]<br />
L { x n]<br />
} + L { x [ n]<br />
}<br />
1[<br />
2<br />
xn [ ] = ax[ n] + bx[ n]<br />
1 2<br />
1[<br />
2<br />
yn [ ] = L { xn [ ]} = a L { x [ n ]} + b L { x [ n ]}<br />
1 2<br />
4
•<br />
Generalized Superposition for Convolution<br />
*<br />
{ }<br />
xn [ ]<br />
H<br />
y [ n ] = H { x [ n ] }<br />
x n]<br />
∗ x [ n]<br />
H { x n]<br />
} ∗H<br />
{ x [ n]<br />
}<br />
1[<br />
2<br />
for LTI systems we have the result<br />
y [ n ] = x [ n ] ∗ h [ n ] = ∑ x [ k ] h [ n−k ]<br />
∞<br />
k =−∞<br />
*<br />
1[<br />
2<br />
• "generalized" superposition => addition replaced by convolution<br />
xn [ ] = x x1 [ n ] ∗ x x2 [ n ]<br />
yn [ ] = H { x[ n]} = H { x1[ n]} ∗H<br />
{ x2[<br />
n]}<br />
• homomorphic system ffor<br />
or convolution<br />
5
Homomorphic Homomorphic Filter<br />
Filter<br />
• homomorphic filter => homomorphic system ⎡<br />
⎣<br />
⎤<br />
⎦ that H<br />
homomorphic filter homomorphic system ⎡<br />
⎣<br />
⎤<br />
⎦ that<br />
passes the desired signal unaltered, while removing the<br />
undesired signal<br />
H<br />
g<br />
x( n) = x [ n] ∗x<br />
[ n] - with x [ n]<br />
the "undesired" signal<br />
•<br />
1 2 1<br />
xn = x 1 n ∗ 2<br />
H { [ ]} H { [ ]}<br />
H<br />
H<br />
H { x [ n]}<br />
{ x [ n]} → δ ( n) - removal of x [ n]<br />
1 1<br />
{ x [ n]} → x [ n]<br />
2 2<br />
H { xn [ ]} = δ [ n] ∗ x2[ n] = x2[ n]<br />
for linear systems this is analogous to additive noise removal<br />
6
Canonic Form for Homomorphic<br />
Deconvolution<br />
* + + + +<br />
D { } L { }<br />
−1<br />
D { }<br />
∗<br />
x[ n] xn ˆ[ ]<br />
yn ˆ[ ]<br />
yn [ ]<br />
x n x n<br />
ˆ ˆ<br />
ˆ ˆ<br />
1 [ ] ∗ 2 [ ] x 1 [ n ] + x 2 [ n ] y 1 [ n ] + y 2 [ n ]<br />
1 ∗ 2<br />
•<br />
any homomorphic system can be represented as a cascade<br />
of three systems, systems e.g., e g for convolution<br />
∗<br />
*<br />
y [ n ] y [ n ]<br />
1. system takes inputs combined by convolution and transforms<br />
them into additive outputs p<br />
2. system is a conventional linear system<br />
3. inverse of first system--takes additive inputs and transforms<br />
th them iinto t convolutional l ti l outputs<br />
t t<br />
7
Canonic Form for Homomorphic Convolution<br />
*<br />
+ + + +<br />
D { } { }<br />
−1{<br />
}<br />
x[ n]<br />
∗<br />
L<br />
D<br />
[ ] xn ˆ[ [ ] y ˆ[ [ n ] ∗ y [ n ]<br />
x n x n<br />
ˆ ˆ<br />
ˆ ˆ<br />
1[ ] ∗ 2[<br />
] x1[ n] + x2[ n]<br />
y1[ n] + y2[ n]<br />
1 ∗ 2<br />
xn [ ] = x1[ n] ∗x2[<br />
n]<br />
- convolutional relation<br />
xn ˆ[ ] = D { [ ]} ˆ ˆ<br />
∗ xn = x1[ n] + x2[ n]<br />
- additive relation<br />
yn ˆ[ ] = L{<br />
xˆ [ n] + xˆ [ n]} = yˆ [ n] + yˆ [ n]<br />
- conventional linear system<br />
1 2 1 2<br />
−1<br />
∗ ˆ1 ˆ2<br />
1 2<br />
*<br />
y[ n] y [ n]<br />
y[ n] = D { y [ n] + y [ n]} = y [ n] ∗y<br />
[ n] - inverse of convolutional relation<br />
=> design converted back to linear system, L<br />
D∗<br />
⎡⎣ ⎤⎦<br />
- fixed (called the characteristic system for homomorphic deconvolution)<br />
−1<br />
D∗ ⎡⎣ ⎤⎦<br />
- fixed (characteristic system for inverse homomorphic deconvolution)<br />
8
Properties of Characteristic<br />
Systems<br />
xn ˆ[ ] = D { xn [ ]} = D { x[ n] ∗x[<br />
n]}<br />
∗ ∗<br />
1 2<br />
= D ∗ { x 1 [ n ]} + D ∗ { x 2 [ n ]}<br />
= xˆ [ n] + xˆ [ n]<br />
1 2<br />
D { yn ˆ[ ]} = D { yˆ [ n] + yˆ [ n]}<br />
−1 −1<br />
∗ ∗<br />
1 2<br />
= D { yˆ [ n]} ∗D<br />
{ yˆ [ n]}<br />
−1 −1<br />
∗ 1 ∗ 2<br />
= y [ n ] ∗ y [ n ] = y<br />
[ n ]<br />
1 2<br />
9
Discrete Discrete-Time Time Fourier<br />
Transform Representations<br />
10
Canonic Form for Deconvolution Using DTFTs<br />
• need dtto find fida system t that th tconverts t convolution l ti tto addition dditi<br />
xn [ ] = x[ n] ∗x[<br />
n]<br />
1 2<br />
jω = 1<br />
jω ⋅ 2<br />
jω<br />
Xe (<br />
• since<br />
) X( e ) X( e )<br />
D ˆ ˆ ˆ<br />
∗{<br />
xn [ ]} = x1[ n] + x2[ n] = xn [ ]<br />
⎡ jω ( ) ⎤ ˆ jω ˆ jω ˆ jω<br />
D∗<br />
Xe = X1( e ) + X2( e ) = Xe ( )<br />
⎣ ⎦<br />
=> use log function wh ich converts products to sums<br />
ˆ jω ( ) log ⎡ jω ( ) ⎤ log ⎡ jω jω<br />
Xe = Xe = X1( e ) ⋅X2(<br />
e ) ⎤<br />
⎣ ⎦ ⎣ ⎦<br />
ω ω<br />
log ⎡ j ˆ ω ˆ ω<br />
= 1( ) ⎤+ log ⎡ j<br />
2( ) ⎤ j j<br />
X e X e = X1( e ) + X2( e )<br />
⎣ ⎦ ⎣ ⎦<br />
ˆ jω ( ) ⎡ ˆ jω ˆ jω ˆ ˆ<br />
1( ) 2( ) ⎤ jω jω<br />
Ye = L X e + X e = Y1( e ) + Y2( e )<br />
⎣ ⎦<br />
jω ( ) exp ˆ jω = ˆ j<br />
Ye ⎡ ω<br />
Y1( e ) + Y2( e ) ⎤ jω jω<br />
= Y1( e ) ⋅Y2(<br />
e<br />
)<br />
⎣ ⎦
Characteristic System for<br />
Deconvolution Using DTFTs<br />
∞<br />
jω − jωn ∑<br />
Xe ( ) = xne [ ]<br />
n=−∞<br />
ˆ jω jω jω jω<br />
X ( e ) = log g ⎡<br />
⎣ ⎣X ( e ) ⎦<br />
⎤ = log g X( ( e ) + j arg g ⎣<br />
⎡Xe ( )<br />
⎦<br />
⎤<br />
π<br />
1<br />
ˆ[ ] ˆ jω jωn xn = ( ) ω<br />
2 ∫ Xe e d<br />
π −ππ<br />
<strong>12</strong>
Inverse Characteristic System for Deconvolution Using<br />
DTFTs<br />
∞<br />
jω − jωn Ye ˆ( ) = yne ˆ[<br />
]<br />
∑<br />
n=−∞<br />
j jωω ( ) ˆ j jωω<br />
Y Y( e ) = exp ⎡ ⎡Y( ( )<br />
⎤<br />
⎣<br />
Y e<br />
⎦<br />
π<br />
1<br />
ω ω<br />
[ ] = ∫<br />
j j n<br />
yn [ ] Ye ( ) e d dωω<br />
2<br />
∫ π −π<br />
13
Issues with Logarithms<br />
Logarithms<br />
• it is essential that the logarithm obey the equation<br />
⎡ j j ⎤ ⎡ j ⎤ ⎡ j<br />
log ⎡ jωjω ⎤<br />
1( ) 2( ) ⎤ log ⎡ jω 1( ) ⎤ log ⎡ jω<br />
X e ⋅ X e = X e + X2( e ) ⎤<br />
⎣ ⎦ ⎣ ⎦ ⎣ ⎦<br />
jω jω<br />
• this is trivial if X ( e ) and X ( e ) are real -- however usually<br />
1 2<br />
1( jω ) and 2(<br />
jω<br />
)<br />
X e X e<br />
•<br />
are complex<br />
on the unit circle the complex log can be written in the form:<br />
jω jω<br />
⎡ jω<br />
jarg ⎡ jω<br />
j X ( e ) ⎤<br />
⎣ ⎦<br />
Xe ( ) = | Xe ( )| e<br />
log ⎡ jω ( ) ⎤ ˆ jω ω ω<br />
( ) log ⎡ j<br />
= = | ( ) | ⎤+ arg ⎡ j<br />
Xe Xe Xe j Xe ( ) ⎤<br />
⎣ ⎦ ⎣ ⎦ ⎣ ⎦<br />
• no problems with log magnitude term; uniqueness<br />
problems<br />
arise in defining the imaginary part of the log; can show that<br />
the imaginary part (the phase angle of the z-transform) z transform) needs<br />
to be a continuous odd function of ω<br />
14
Problems with arg Function<br />
{ (<br />
j jω ) } ≠ { 1(<br />
j jω<br />
) }<br />
jω<br />
+ ARG{ X2( e ) }<br />
ARG X e ARG X e<br />
jω jω<br />
{ Xe } = { X1e }<br />
jω<br />
+ arg { X X2 ( e<br />
) }<br />
arg ( ) arg ( )<br />
15
Complex Cepstrum Properties<br />
i Given a complex logarithm that satisfies the phase continuity<br />
condition, , we have:<br />
π<br />
1<br />
jω jω jωn xn ˆ[ ] = (log | X( e ) | + jarg{ X( e )}) e dω<br />
2π<br />
∫<br />
−π<br />
j jω j jω<br />
i If xn [ ] real, real then log|X ( e ) | is an even function of ωω<br />
and arg{ X ( e )}<br />
is an odd function of ω.<br />
This means that the real and imaginary parts of<br />
the complex log have the appropriate symmetry for xn ˆ[ ] to be a real<br />
sequence, and d xn ˆ[ ˆ[ ] can bbe represented t d as:<br />
xn ˆ[ ] = cn [ ] + dn [ ]<br />
jω<br />
where cn [ ] is the inverse DTFT of log |X ( e )| and the even part of xn ˆ[<br />
], and<br />
jω<br />
dn [ ] is the inverse DTFT of arg{ Xe ( )} and the odd part of xn ˆ[<br />
]:<br />
xn ˆ[ ] + xˆ[ −n] xn ˆ[ ] −xˆ[ −n]<br />
cn [ ] = ; dn [ ] =<br />
2 2<br />
16
Complex Complex and and Real Real Cepstrum<br />
define the inverse Fourier transform of ˆ jω<br />
•<br />
Xe ( ) as<br />
ππ<br />
1<br />
ˆ[ ˆ ω ω<br />
] = ∫<br />
j j n<br />
xn Xe ( ) e dω<br />
2<br />
π<br />
−π<br />
• where xn ˆ[<br />
[ ] called the "complex complex cepstrum" cepstrum since a complex<br />
logarithm is involved in the computation<br />
• can also define a "real<br />
cepstrum" using just the real part of<br />
the logarithm, giving<br />
π<br />
1<br />
[ ] = ∫ Re ⎡ ˆ j<br />
( ) ⎤ j n<br />
cn ∫ Xe e d<br />
2 2π<br />
⎣ ⎦<br />
=<br />
1<br />
2 2π<br />
−π<br />
π<br />
∫<br />
π<br />
−π<br />
ω ω ω<br />
jω jωn log | Xe ( ) | e dω<br />
• can show that cn [ ] is the even part of xn<br />
ˆ[<br />
]<br />
17
Terminology gy<br />
• Spectrum – Fourier transform of signal autocorrelation<br />
• C Cepstrum t – iinverse FFourier i ttransform f of f log l spectrum t<br />
• Analysis – determining the spectrum of a signal<br />
• Alanysis – determining the cepstr cepstrumm of a signal<br />
• Filtering – linear operation on time signal<br />
• Liftering – linear operation on cepstrum<br />
• Frequency – independent variable of spectrum<br />
• Quefrency – independent variable of cepstrum<br />
• Harmonic – integer multiple of fundamental frequency<br />
• Rahmonic – integer multiple of fundamental frequency<br />
Rahmonic integer multiple of fundamental frequency<br />
18
z-Transform z Transform Representation<br />
Representation<br />
i The z − transform of the signal:<br />
xn [ ] = x x1[ [ n n] ]* x x2[ [ n n]<br />
]<br />
is of the form:<br />
X( z) = X1( z) ⋅X2(<br />
z)<br />
i With an appropriate i t ddefinition fi iti of f th the complex l llog, we get: t<br />
Xˆ ( z) = log{ X( z)} = log{ X1( z) ⋅X2(<br />
z)}<br />
= log{ X ( z)} + log{ X ( z)}<br />
= Xˆ ( z) + Xˆ ( z)<br />
1<br />
1 2<br />
2<br />
19
Characteristic System for Deconvolution<br />
∞<br />
−<br />
( ) = ∑<br />
n<br />
Xz xnz [ ] = Xz ( ) e<br />
n=−∞<br />
{ }<br />
jarg X( z)<br />
Xˆ ( z) = log X( z) = log X( z) + jarg X( z)<br />
[ ] [ ]<br />
1<br />
ˆ[ ] ∫<br />
ˆ n<br />
xn [ ] = ∫ X ( z ) zdz<br />
2π<br />
j<br />
20
Inverse Characteristic System for<br />
Deconvolution<br />
∞<br />
Yz ˆ( ) = ∑ ynz ˆ[<br />
]<br />
n=−∞<br />
−n<br />
Yz ( ) = exp ⎡ ˆ(<br />
) ⎤<br />
⎣<br />
Yz<br />
⎦<br />
= log Yz ( ) + jarg Yz ( )<br />
1<br />
[ ] ∫<br />
n<br />
[ ] = ∫<br />
n<br />
yn Y ( zzdz ) d<br />
2π<br />
j<br />
[ ]<br />
21
z-Transform<br />
z Transform Cepstrum Alanysis<br />
•<br />
i<br />
consider digital systems with rational z-transforms of the general type<br />
Xz ( ) =<br />
M<br />
M M0<br />
−1 −1 −1<br />
i<br />
∏( 1− k ) ∏(<br />
1−<br />
)<br />
k<br />
A a z b z<br />
k= 1 k=<br />
1<br />
N<br />
i<br />
− 1<br />
∏ ( 1−<br />
cz k )<br />
k = 1<br />
we can express the above equation as:<br />
Xz ( ) =<br />
−M0 M M0 ∏ −<br />
M Mi<br />
−1 k ∏ 1− k<br />
M M0<br />
−1<br />
∏ 1−<br />
k<br />
k= 1 k = 1<br />
k=<br />
1<br />
Ni<br />
− 1<br />
( ) ( ) ( )<br />
z A b a z b z<br />
∏ ( 1 1−<br />
cz k )<br />
k = 1<br />
• with all coefficients ak, bk, ck < 1 => all ck<br />
poles and<br />
a ak zeros are inside the unit circle; all b bk<br />
zeros<br />
are outside the unit circle;<br />
22
z-Transform<br />
z Transform Cepstrum Alanysis<br />
i<br />
express X( z)<br />
as product of minimum-phase and<br />
maximum maximum-phase phase signals signals, ii.e., e<br />
i<br />
i<br />
i<br />
X z X z z X z<br />
where<br />
−M<br />
0<br />
( ) = ( ) ⋅ ( )<br />
X ( z ) =<br />
min<br />
min max<br />
Mi<br />
( −1<br />
∏ 1−<br />
) k<br />
A a z<br />
k = 1<br />
N<br />
i<br />
( 1<br />
1 cz ) k<br />
−<br />
∏ −<br />
k = 1<br />
all ll poles l and d zeros iinside id unit it circle i l<br />
max<br />
M<br />
i<br />
∏<br />
( 1 )<br />
X ( z) b 1<br />
−<br />
= − − bz<br />
k<br />
Mi<br />
∏<br />
k = 1<br />
k = 1<br />
all zeros outside unit circle<br />
( )<br />
k<br />
23
z-Transform<br />
z Transform Cepstrum Alanysis<br />
i<br />
i<br />
i<br />
can express xn [ ] as the convolution:<br />
xn [ ] = xmin[ n] ∗xmax[ n−M0] minimum-phase p component p is causal<br />
xmin [ n] = 0, n<<br />
0<br />
maximum-phase maximum phase component is anti-causal anti causal<br />
x [ n] = 0, n><br />
0<br />
max<br />
M0<br />
factor z is the shift in time ori<br />
− 0 i factor z is the shift in time ori gin by M 0<br />
samples required so that the overall sequence,<br />
xn<br />
[ ] be causal<br />
24
z-Transform z Transform Cepstrum Alanysis<br />
• the complex logarithm of Xz ( ) is<br />
0<br />
ˆ<br />
−1<br />
− 0<br />
( ) = log ⎡⎣ ( ) ⎤⎦<br />
= log | | + ∑log | | + log[ ] +<br />
M<br />
M<br />
Xz Xz A bk z<br />
k = 1<br />
M M<br />
N<br />
i 0<br />
i<br />
∑ ( −1) ( ) ( −1<br />
log 1− az + ∑log 1− −∑log 1−<br />
)<br />
k bz k cz k<br />
k = 1 k = 1 k = 1<br />
• evaluating Xz ˆ ( ) on the unit circle we can ignore the term<br />
⎡ jωM ⎤<br />
e<br />
⎣ ⎦<br />
part and is a linear phase shift)<br />
0<br />
related to log (as this contributes only to the imaginary<br />
25
z-Transform<br />
z Transform Cepstrum Alanysis<br />
•<br />
we can then evaluate the remaining terms, use power series<br />
expansion for logarithmic terms (and take the inverse<br />
transform to give the complex cepstrum) giving:<br />
ππ<br />
1<br />
ˆ<br />
ˆ ω ω<br />
( ) = ∫<br />
j j n<br />
xn Xe ( ) e dω<br />
2<br />
π<br />
−π<br />
= log | A | +<br />
M<br />
0<br />
∑<br />
k = 1<br />
Ni n<br />
c k<br />
Mi<br />
n<br />
a k<br />
∑ ∑<br />
k= 1 k=<br />
1<br />
M0 −n<br />
bk<br />
−1<br />
k<br />
log | b | n =<br />
= − n ><br />
n n<br />
∑ k<br />
= n < 0<br />
n<br />
k = 1<br />
0<br />
0<br />
∞ n<br />
Z<br />
log(1 − Z) =− ∑ , | Z | < 1<br />
n<br />
n=<br />
1<br />
26
Cepstrum Properties<br />
1. complex cepstrum is non-zero and of infinite extent for<br />
both positive and negative n, even though x[ n]<br />
may be<br />
causal, or even of finite duration ( X ( z ) has only zeros).<br />
2. complex cepstrum is a decaying sequence that is bounded by:<br />
| n|<br />
α<br />
| xn ˆ[<br />
]| < β , for | n|<br />
→∞<br />
| n |<br />
3. zero-quefrency value of complex cepstrum (and the cepstrum)<br />
depends on the gain constant and the zeros outside the unit circle.<br />
Setting x ˆ[0] [0] = 0 (and therefore c [0] = 0 0)<br />
) is equivalent to normalizing<br />
the log magnitude spectrum to a gain constant of:<br />
M<br />
0<br />
∏<br />
k = 1<br />
∏<br />
1<br />
( − b k ) = 1<br />
A b −<br />
4. If X( z) has no zeros outside the unit circle (all bk=<br />
0), then:<br />
xn ˆ[ ] = 0, n<<br />
0 (minimum-phase<br />
signals)<br />
5. If X( z) has no poles or zeros inside the unit circle (all ak, ck<br />
= 0), then:<br />
xn ˆ[ ] = 0, n><br />
0 (maximum-phase signals)<br />
27
i<br />
i<br />
z-Transform z Transform Cepstrum Alanysis<br />
The main z-transform formula for cepstrum alanysis is based on<br />
the h power series i expansion: i<br />
( 1)<br />
∞ −<br />
∑<br />
n+<br />
1<br />
n<br />
log( g( 1+ x ) = x x < 1<br />
n<br />
n=<br />
1<br />
Example 1--Apply<br />
this formula to the exponential sequence<br />
x1( n)<br />
n<br />
1<br />
= aun ( ) ⇔ X( z)<br />
=<br />
1−<br />
az<br />
1 −1<br />
( −1<br />
)<br />
ˆ<br />
1 ( ) = l log[ [ 1 ( )] = −l log( ( 1 1−<br />
∞<br />
− 1<br />
) = − ∑<br />
n=<br />
1<br />
n+<br />
1<br />
( −<br />
n<br />
n<br />
)<br />
− n<br />
n<br />
a<br />
ˆ 1 ( ) =<br />
n<br />
( − 1 ) ⇔ ˆ<br />
1 ( ) = − log( 1 1−<br />
∞ n<br />
−1⎛a ⎞<br />
) = ∑ ⎜ ⎟<br />
n=<br />
1 n<br />
−n<br />
X z X z az a z<br />
x n u n X z az z<br />
⎝ ⎠<br />
28
z-Transform z Transform Cepstrum Alanysis<br />
• Example 2 2--consider<br />
consider the case of a digital system with a<br />
single zero outside the unit circle ( b < 1)<br />
x x2 ( n ) = δδ ( n ) + b bδδ ( n + 1 )<br />
X2( z) = 1+ bz (zero at z = −1/<br />
b)<br />
X Xˆ ( z ) = l log X ( z ) = llog<br />
1 1+<br />
b z<br />
xˆ<br />
2<br />
2 2<br />
[ ] ( )<br />
∞ + 1<br />
( −1)<br />
= ∑ ( )<br />
n<br />
∑<br />
n n<br />
( b) z<br />
n=<br />
1 n<br />
n+ 1 n<br />
( −1)<br />
b<br />
( n ) = u ( −n−1 )<br />
n<br />
29
•<br />
z-Transform Transform Cepstrum Alanysis for 2 Pulses<br />
Example 3--an<br />
input sequence of two pulses of the form<br />
x 3 ( n ) = δδ ( n ) + αδ αδ ( n n− N Np<br />
) ( 0 < αα<br />
<<br />
1 )<br />
X ( z) = 1+<br />
αz<br />
3<br />
−N<br />
( −N<br />
1 αα<br />
)<br />
X Xˆ 3 ( z ) = log ⎡⎣⎡⎣ X X3 ( z ) ⎤⎦⎤⎦<br />
= log + z<br />
∞ n+<br />
1<br />
( −1)<br />
= ∑ n<br />
p<br />
α<br />
n<br />
z<br />
−nN<br />
n=<br />
1<br />
∞<br />
k<br />
ˆ + 1 α<br />
3(<br />
) = ∑<br />
k<br />
( −1) k<br />
k = 1<br />
δ ( − p )<br />
x n n kN<br />
p<br />
• the cepstrum is an impulse train with impulses spaced at p samples<br />
N<br />
N p<br />
2N 2Np p<br />
3N 3Np 30
•<br />
•<br />
Cepstrum Cepstrum for Train of Impulses<br />
an important special case is a train of impulses<br />
of the form:<br />
∑ M<br />
xn ( ) = αδ(<br />
n−rN) r = 0<br />
M<br />
Xz ( ) = ∑αr<br />
z<br />
r = 0<br />
r p<br />
−rN<br />
p<br />
−N −1<br />
clearly Xz ( ) is a polynomial in z p rather than z ;<br />
thus Xz ( ) can be expressed<br />
as a product of factors<br />
−N<br />
N<br />
of the form ( 1−az p) and ( 1−bz<br />
p),<br />
giving a complex<br />
cepstrum, xn ˆ ( ), that is non non-zero zero only at integer multiples of N<br />
p<br />
31
z-Transform Transform Cepstrum Alanysis for<br />
C Convolution l ti of f 2 2 Sequences<br />
S<br />
i Example 4 4--consider<br />
consider the convolution of sequences 1and3 1 and 3, ie i.e.,<br />
n<br />
x4( n) = x1( n) ∗ x3( n) = ⎡<br />
⎣<br />
aun ( ) ⎤<br />
⎦<br />
∗ ⎡<br />
⎣δ( n) + αδ(<br />
n−N) ⎤ p ⎦<br />
n<br />
n n− N Np<br />
= aun ( ) + α a un ( − N )<br />
i The complex cepstrum is therefore the sum of the complex<br />
cepstra<br />
of the two sequences (since convolution inthetimedomainis<br />
in the time domain is<br />
converted to addition in the cepstral domain)<br />
xˆ ( n) = xˆ ( n) + xˆ ( n)<br />
4 1 3<br />
n ∞ k+ 1 k<br />
a ( −1)<br />
α<br />
un 1 ∑ δ n kNp<br />
n k = 1 k<br />
= ( − ) + ( − )<br />
p<br />
32
i<br />
z-Transform Transform Cepstrum Alanysis for<br />
C Convolution l ti of f 3 3 Sequences<br />
S<br />
Example 5--consider<br />
the convolution of sequences 1, 2 and 3, i.e.,<br />
x ( n) = x ( n) ∗x ( n) ∗x<br />
( n)<br />
5 1 2 3<br />
[ 1 ]<br />
n<br />
= ⎡<br />
⎣<br />
⎡<br />
⎣<br />
aun ( ) ⎤<br />
⎦<br />
⎤<br />
⎦<br />
∗ δδ ( n ) + b bδδ ( n + ) ∗ ⎡<br />
⎣<br />
⎡<br />
⎣δδ ( n ) + αδ δ ( n n− N ) ⎤<br />
⎦<br />
⎤ p ⎦<br />
n n−Np = aun ( ) + αaun ( − N<br />
n<br />
n− Np+<br />
1<br />
) + baun ( + 1) + αbaun<br />
( − N + 1)<br />
p p<br />
i The complex cepstrum is therefore the sum of the complex cepstra<br />
of the three sequences<br />
xˆ ( n) = xˆ ( n) + xˆ ( n) + xˆ ( n)<br />
5 1 2 3<br />
n ∞ k+ 1 k n+ 1 n<br />
a ( −1) α ( −1)<br />
b<br />
= un ( − 1) + ∑ δ ( n− kNp) + u( −n−1) n k n<br />
k = 1<br />
33
Example: a=.9, b=.8, a=.7, Np=15<br />
( 1)<br />
+<br />
−<br />
n<br />
b<br />
n 1 n<br />
a n<br />
a<br />
n<br />
( 1+<br />
bz)<br />
p<br />
Xz ( ) = + z<br />
( 1 1−<br />
az )<br />
( 1)<br />
k<br />
∞ −<br />
∑<br />
k = 1<br />
k+ 1 k<br />
( 1 α )<br />
1<br />
−<br />
−<br />
N<br />
( )<br />
az<br />
α<br />
δ [ n−kN ]<br />
p<br />
34
Homomorphic Analysis of<br />
Speech Model<br />
35
Homomorphic Analysis of Speech Model<br />
•<br />
i<br />
•<br />
i<br />
the transfer function for voiced speech is of the form<br />
H ( z) = A ⋅G(<br />
z) V( z) R( z)<br />
V V<br />
with ith effective ff ti impulse i l response ffor voiced i d speech h<br />
h [ n] = A ⋅g[ n] ∗v[ n] ∗r[<br />
n]<br />
V V<br />
similarly for unvoiced speech we have<br />
H ( z) = A ⋅V(<br />
z) R( z)<br />
U<br />
U<br />
with effective impulse response for unvoiced speech<br />
h [ n] = A ⋅v[ n] ∗r[<br />
n]<br />
U U<br />
36
Complex Complex Cepstrum for Speech<br />
•<br />
the models for the speech components are as follows:<br />
1. vocal tract: Vz ( ) =<br />
Mi<br />
−M ∏ 1− M0<br />
− 1<br />
∏ 1−<br />
= 1 = 1<br />
−1<br />
∏( 1−<br />
)<br />
i<br />
M<br />
k k<br />
k k<br />
N<br />
∏ cz k<br />
k = 1<br />
Az ( a z ) ( b z)<br />
--for voiced speech, only poles => ak = bk = 0,<br />
all k<br />
--unvoiced speech and nasals,<br />
need pole-zero model but all poles are<br />
inside the unit circle => ck<br />
< 1<br />
--all speech has complex poles and zeros that occur in complex conjugate<br />
pairs<br />
2. radiation model: Rz ( ) ≈1−z(high frequency emphasis)<br />
3. glottal pulse model: finite duration pulse with transform<br />
L<br />
i<br />
0<br />
− 1<br />
∏ ∏<br />
−1<br />
G ( z ) = B ( 1−αz ) ( 1−βz<br />
)<br />
k k<br />
k= 1 k=<br />
1<br />
with zeros both inside and outside the unit circle<br />
L<br />
37
Complex Cepstrum for Voiced<br />
S Speech h<br />
• combination of vocal tract, glottal pulse and<br />
radiation will be non-minimum phase p =><br />
complex cepstrum exists for all values of n<br />
• the complex cepstrum will decay rapidly for<br />
large n (due to polynomial terms in expansion<br />
of complex cepstrum)<br />
• effect of the voiced source is a periodic pulse<br />
train for multiples of the pitch period<br />
38
•<br />
•<br />
Simplified Speech Speech Model<br />
Model<br />
short-time speech model<br />
[ ∗ ∗ ∗ ]<br />
xn [ ] = wn [ ] ⋅ pn [ ] ∗ gn [ ] ∗ vn [ ] ∗ rn [ ]<br />
≈ p [ n] ∗h<br />
[ n]<br />
w v<br />
short-time complex cepstrum<br />
xn ˆ [ ] = p pˆ [ n ] + gn ˆ [ ] + vn ˆ [ ] + rn ˆ [ ]<br />
w<br />
39
Analysis y of Model for Voiced Speech p<br />
i Assume sustained /AE/ vowel with fundamental frequency of <strong>12</strong>5 Hz<br />
i Use glottal pulse model of the form:<br />
⎧0.5<br />
[1− cos( π ( n+ 1) / N1)] ⎪<br />
gn [ ] = ⎨cos(0.5<br />
π ( n+ 1 −N1)/ N2) ⎪<br />
⎩ 0<br />
0 ≤n≤ N1−1<br />
N1 ≤n≤ N1+ N2<br />
−2<br />
otherwise<br />
N = 25, N = 10<br />
⇒ 34 sample impulse response, with transform<br />
i<br />
1 2<br />
33<br />
−33 ∏<br />
33<br />
−1<br />
k ∏ k<br />
k= 1 k=<br />
1<br />
Gz ( ) = z ( −b) (1 −bz) ⇒ all roots outside unit circle ⇒ maximum phase<br />
Vocal tract system specified by 5 formants (frequencies and bandwidths)<br />
V( z)<br />
=<br />
1<br />
5<br />
−2πσ kT −1 −4πσ<br />
kT<br />
−2<br />
∏ (1 (1− 2 e cos(2 ππ<br />
FT k ) z + e z )<br />
k = 1<br />
{ Fk,<br />
σ k}<br />
= [(660,60),(1720,100),(2410,<strong>12</strong>0),(3500,175),(4500,250)]<br />
i Radiation load is simple first difference<br />
−1<br />
R( z) = 1 − γz, γ =<br />
0.96<br />
40
Time Domain Analysis<br />
Analysis<br />
41
Pole Pole-Zero Zero Analysis of Model<br />
Components<br />
42
Spectral Analysis Analysis of of Model<br />
Model<br />
43
Speech Speech Model Output<br />
44
Complex Cepstrum of Model<br />
i<br />
The voiced speech signal is modeled as:<br />
x [ n ] = A AV ⋅gn [ ] ∗vn [ ] ∗rn [ ] ∗ pn [ ]<br />
i with complex cepstrum:<br />
sn ˆ [ ] = log| A | δ [ n ] + gn ˆ [ ] + vn ˆ [ ] + rn ˆ [ ] + pn ˆ<br />
V δ n + gn + vn + rn + pn [ ]<br />
i glottal pulse is maximum phase ⇒ gn ˆ[ ] = 0, n><br />
0<br />
i vocal tract<br />
and radiation systems y are minimum phase p<br />
⇒ vn ˆ[ ] = 0, n< 0, rn ˆ[<br />
] = 0, n<<br />
0<br />
∞ k<br />
ˆ β<br />
Pz ( ) = − log(1 − ββ<br />
z ) = ∑ z<br />
k<br />
−N −kN<br />
= =∑<br />
k = 1<br />
∞ k<br />
β<br />
p pˆ[<br />
[ n ] = ∑<br />
δ [ n−kN p ]<br />
k<br />
k = 1<br />
p p<br />
45
Cepstral Analysis Analysis of of Model<br />
Model<br />
46
Resulting Complex and Real<br />
Cepstra<br />
47
Frequency Frequency Domain Representations<br />
48
Frequency<br />
Frequency-Domain Domain Representation of<br />
C Complex l Cepstrum C t<br />
49
The Complex Cepstrum-DFT Cepstrum DFT Implementation<br />
i<br />
•<br />
2π∞2π j k − j kn<br />
N N<br />
∑<br />
Xk [ ] = Xe ( ) = xne [ ] k= 01 , ,..., N−1,<br />
n=−∞<br />
jω<br />
X [ k ] is the N point p DFT corresponding p g to X( ( e )<br />
p<br />
{ } { }<br />
Xk ˆ = Xe ˆ = Xk = Xk + j Xk<br />
j2πk/ N<br />
[ ] ( ) log [ ] log [ ] arg [ ]<br />
N−<strong>12</strong>π∞<br />
1<br />
j kn<br />
ˆ ∑ ˆ N<br />
xn [ ] = ∑Xke [ ] = ∑ xn ˆ ˆ[<br />
+ rN] n= 01 , ,..., N−1<br />
N k= 0<br />
r=−∞<br />
x ˆ [ n ] is an aliased version of xˆ [ n ]<br />
⇒use as large a value of N<br />
as possible to minimize aliasing<br />
50
Inverse System- System y DFT Implementation<br />
p<br />
51
The Cepstrum Cepstrum-DFT DFT Implementation<br />
π<br />
1<br />
jω jωn cn [ ] = ∫ log| Xe ( )| e dωω −∞< ∞< n
Cepstral Computation Aliasing<br />
N=256, N Np=75, =75,<br />
α=0 α=0.8 =0 =0.8 8<br />
Circle dots are<br />
cepstrum<br />
values in<br />
correct<br />
locations; all<br />
other dots are<br />
results of<br />
aliasing due to<br />
finite range<br />
computations<br />
53
Summary<br />
1. Homomorphic System for Convolution:<br />
x[n] X(z) X(z) ^ X(z) Y(z) ^ Y(z) Y(z)<br />
y[n]<br />
z( ) log( ) L( ) exp( ) z-1 ( )<br />
X(z)<br />
^<br />
z 1 ( ) -1 ( )<br />
2. Practical Case:<br />
z( ) → DFT<br />
−1<br />
z () → IDFT<br />
x[n] ^ [ ] Linear<br />
System<br />
y[n] ^ y[ y[n] ]<br />
jω jω<br />
Xe ( ) = Xe ( ) e<br />
jω<br />
{ }<br />
jarg X( e )<br />
z( ( )<br />
Y(z)<br />
^<br />
{ }<br />
jω jω jω<br />
log ⎡<br />
⎣<br />
Xe ( ) ⎤<br />
⎦<br />
= log Xe ( ) + jarg Xe ( )<br />
54
Summary<br />
3. Complex Cepstrum:<br />
π<br />
1<br />
ˆ[<br />
] = ∫<br />
ˆ j j n<br />
xn Xe ( ) e d<br />
2 2π<br />
4. Cepstrum:<br />
π −π<br />
π<br />
ω ω ω<br />
1<br />
[ ] = ∫<br />
j j n<br />
cn log Xe ( ) e d<br />
2π<br />
−π<br />
ω ω ω<br />
xn ˆ[ ] + xˆ[ −n]<br />
cn [ ] = = even part of xn ˆ[<br />
]<br />
2<br />
55
x(n)<br />
Summary<br />
X(k X(k) ^<br />
X(k X(k)<br />
x(n x(n) ^<br />
DFT Complex Log IDFT<br />
~<br />
5. Practical Implementation of Complex Cepstrum:<br />
∞<br />
j N k j N kn<br />
∑<br />
( 2 2ππ / ) − ( 2 2ππ<br />
/ )<br />
Xk [ ] = X Xe ( ) = x [ n ] e )<br />
6. Examples:<br />
n=−∞<br />
{ p } p { p }<br />
Xˆ [ k] = log X ( k) = log X ( k) + jarg X ( k)<br />
N N−<br />
1<br />
∞<br />
1 j 2π<br />
/ N kn<br />
(<br />
xn ˆ[<br />
] = ∑Xke ˆ [ ]<br />
N k= 0<br />
)<br />
= ∑ xn ˆ[<br />
+ rN]<br />
⇒aliasing<br />
r=−∞<br />
xn ˆ[<br />
] = aliased version<br />
of xn ˆ[<br />
]<br />
∞<br />
1<br />
( ) = ⇔ ˆ[<br />
] = ∑ δ [ − ]<br />
1−<br />
r<br />
a<br />
Xz xn n r<br />
−1<br />
az r = 1 r<br />
∞ r<br />
b<br />
Xz ( ) = 1−bz⇔ xn ˆ[<br />
] = − ∑ δ [ n+ r]<br />
r<br />
r = 1<br />
56
Complex Cepstrum Without Phase<br />
Unwrapping<br />
i short-time analysis uses finite-length windowed segments, xn [ ]<br />
i<br />
M<br />
( ) = ∑ [ ]<br />
n=<br />
0<br />
−n<br />
,<br />
thh<br />
-order polynomial<br />
X z x n z M<br />
Find polynomial roots<br />
M M<br />
−1 −1 −1<br />
m m<br />
m= 1 m=<br />
1<br />
i<br />
0<br />
∏ ∏<br />
X( z) = x[0] (1 −a z ) (1 −b<br />
z )<br />
i am<br />
roots are inside unit circle (minimum-phase<br />
part)<br />
i b roots t are outside t id unit it circle i l ( (maximum-phase i h part) t)<br />
i<br />
m<br />
−1 −1<br />
Factor out terms of form − bmz giving:<br />
MiM0 −M 0<br />
−1<br />
X ( z ) = Az A ∏ (1 −a a mz ) ∏ (1 −b<br />
b mz<br />
)<br />
m= 1 m=<br />
1<br />
M0<br />
M0<br />
−1<br />
A= x[0]( −1)<br />
∏bm<br />
mm=<br />
= 1<br />
i Use polynomial root finder<br />
to find the zeros that lie inside<br />
and outside the unit circle and solve directly for xn<br />
ˆ[ ] .<br />
57
Cepstrum for Minimum Phase Signals<br />
• for minimum phase signals (no poles or zeros outside unit circle) the complex cepstrum<br />
can be completely represented by the real part of the Fourier transforms<br />
• this means we can represent the complex<br />
cepstrum of minimum phase signals by the log<br />
of the magnitude of the FT alone<br />
• since the real part of the FT is the FT of the even part of the sequence<br />
ˆ( ) ˆ(<br />
)<br />
Re ⎡ ˆ jω ( ) ⎤<br />
⎡ xn + x−n⎤ Xe = FT<br />
⎣ ⎦ ⎢<br />
2<br />
⎥<br />
⎣ ⎦<br />
jω<br />
FT ⎡ ⎣ ⎣c( n) ⎤ ⎦ = log Xe ( )<br />
xn ˆ( ) + xˆ( −n)<br />
cn ( ) =<br />
2<br />
• giving<br />
xˆ ( n) = 0 n < 0<br />
= cn ( ) n=<br />
0<br />
= 2cn ( ) n><br />
0<br />
•<br />
thus the complex cepstrum (for minimum phase signals) can be computed by computing<br />
the cepstrum and using the equation above<br />
58
Recursive Relation for Complex<br />
C Cepstrum t f for Mi Minimum i Phase Ph Signals Si l<br />
• th the complex l cepstrum t for f minimum i i phase h signals i l<br />
can be computed recursively from the input signal,<br />
xn ( ) using the relation<br />
xn ˆ( ) = 0 n<<br />
0<br />
= log ⎡⎣x( 0) ⎤⎦<br />
n = 0<br />
n−1<br />
xn ( ) ⎛ k ⎞ xn ( − k )<br />
= − ˆ( )<br />
><br />
( 0) ⎜ ⎟xk<br />
n<br />
x ⎝n⎠ x(<br />
0)<br />
∑ k<br />
= 0<br />
0<br />
59
Recursive Relation for Complex<br />
C Cepstrum t f for Mi Minimum i Phase Ph Signals Si l<br />
xn ( ) ←⎯→ Xz ( ) 1. basic z-transform<br />
dX ( z )<br />
nx( n) ←⎯→− z =−zX′<br />
( z) dz<br />
2. scale by n rule<br />
xn ˆ( ) ←⎯→ Xz ˆ ( ) = log Xz ( ) 3. definition of complex cepstrum<br />
[ ]<br />
dXˆ ( z) d X′ ( z)<br />
= [ log[ Xz ( )] ] =<br />
dz dz X( z)<br />
4. differentiation of z-transform<br />
dX dXˆ ( z )<br />
−z Xz ( ) =−zX′<br />
( z)<br />
dz<br />
5. multiply both sides of equation<br />
60
Recursive Relation for Complex<br />
C Cepstrum t f for Mi Minimum i Phase Ph Signals Si l<br />
dXˆ ( z)<br />
nxˆ( ( n ) ∗x ( n ) ←⎯→− z X ( z ) =−zX′ ( z ) ←⎯→ nx ( n )<br />
ddz<br />
∞<br />
∑<br />
( )<br />
nx( n) = xˆ ( k) x( n −k)<br />
k<br />
k =−∞<br />
i for minimum phase systems we have xˆ ( n) = 0for n < 0,<br />
xn ( ) = 0for n<<br />
0,<br />
giving:<br />
n<br />
( ) ˆ<br />
⎛k⎞ xn ( ) = ∑ xkxn ( ) ( − k ) ⎜ ⎟<br />
k = 0<br />
⎝n⎠ i separating out the term for k = n we get:<br />
n−1<br />
⎛k ˆ<br />
⎛k ⎞<br />
xn ( ) = ∑ xkxn ( ) ( − k) ⎜ ⎟+<br />
x( 0)<br />
xn ˆ(<br />
)<br />
k = 0<br />
⎝n⎠ n−1<br />
xn ( ) xn ( − k) ˆ( ) ˆ<br />
⎛k⎞ x( n) = − ∑ x ( k ) , > 0<br />
( 0) ( 0)<br />
⎜ ⎟,<br />
n<br />
x( 0) 0 ( 0)<br />
⎜ ⎟<br />
k = x ⎝ ⎝n ⎠<br />
xˆ( 0) = log x( 0) , xˆ( n) = 0, n<br />
< 0<br />
[ ]<br />
61
Recursive Relation for Complex<br />
C Cepstrum t f for Mi Minimum i Phase Ph Signals Si l<br />
i why is xˆ( 0) = log[ x(<br />
0)]?<br />
i assume we have a finite sequence xn ( ) ), n n= 01 , ,..., N N−<br />
1<br />
i we can write xn ( ) as:<br />
xn ( ) = x( 0) δ( n) + x( 1) δ( n− 1) + ... + xN ( −1) δ(<br />
N−1)<br />
⎡ x () 1 x ( N − 1 ) ⎤<br />
= x( 0) ⎢δ( n) + δ( n− 1) + ... + δ(<br />
N −1)<br />
⎣ x( 0) x(<br />
0)<br />
⎥<br />
⎦<br />
i taking z − transforms, we get:<br />
N−1<br />
∑<br />
−n NZ1 ∏ 1 k<br />
NZ 2<br />
−1<br />
∏ 1 k<br />
n= 0 k= 1 k=<br />
1<br />
Xz ( ) = xnz ( ) = G ( −az) ( −bz)<br />
i where the first term is the gain, G = x(<br />
0),<br />
and the two product terms are the<br />
zeros inside and outside the<br />
unit circle.<br />
i for minimum phase systems we have all zeros inside the unit circle so the<br />
second product term is gone, and we have the result that<br />
xˆ ( 0 ) = log[ g[ G] ] = log[ g[ x ( 0)]; )] xˆ ( n ) = 0, n < 0<br />
NZ1 a<br />
xn ˆ( ) = 0<br />
k=1<br />
n ⎛ ⎞ k − ∑⎜<br />
⎟ n ><br />
⎝ n ⎠<br />
62
Cepstrum p for Maximum Phase Signals g<br />
•<br />
for maximum phase signals (no poles or<br />
zeros iinside id unit it circle) i l )<br />
ˆ( ) + ˆ(<br />
− )<br />
( ) =<br />
2<br />
xn x n<br />
cn<br />
2<br />
• giving<br />
xn ˆ( ( ) = 0 n > 0<br />
= cn ( ) n=<br />
0<br />
= 2cn ( ) n<<br />
0<br />
• thus the complex cepstrum (for maximum<br />
phase signals) can be computed<br />
by computing<br />
the cepstrum and using the equation above<br />
63
•<br />
Recursive Relation for Complex<br />
C Cepstrum t f for M Maximum i Phase Ph Signals Si l<br />
the complex cepstrum for maximum phase signals<br />
can be computed recursively from the input signal, signal<br />
xn ( ) using the relation<br />
xn ˆ( ( ) = 0 n > 0<br />
= log ⎡⎣x( 0) ⎤⎦<br />
n = 0<br />
0<br />
xn ( ) ⎛k⎞ xn ( − k)<br />
= − ∑ ˆ( )<br />
<<br />
( 0) ⎜ ⎟xk<br />
n<br />
x ⎝n⎠ x(<br />
0)<br />
k= n+<br />
1<br />
0<br />
64
Computing Short-Time<br />
Short Time<br />
Cepstrums from Speech<br />
Speech<br />
Using Polynomial Roots<br />
65
Cepstrum From Polynomial Roots<br />
66
Cepstrum From Polynomial Polynomial Roots<br />
Roots<br />
67
Computing Short-Time<br />
Short Time<br />
Cepstrums from Speech<br />
Speech<br />
Using Using g the DFT<br />
68
Practical Considerations<br />
• window to define short-time short time analysis<br />
• window duration (should be several pitch<br />
periods long)<br />
• size of FFT (to minimize aliasing)<br />
• elimination of linear phase components<br />
(positioning signals within frames)<br />
• cutoff quefrency of lifter<br />
• type of lifter (low/high quefrency)<br />
69
Computational Considerations<br />
70
Voiced Speech Example<br />
Hamming window<br />
40 msec duration<br />
(section beginning<br />
at sample 13000<br />
in file<br />
test_16k.wav)<br />
71
Voiced Speech Speech Example<br />
Example<br />
wrapped phase<br />
unwrapped<br />
phase<br />
72
Voiced Speech Speech Example<br />
Example<br />
73
Characteristic System for<br />
H Homomorphic hi Convolution<br />
C l i<br />
• still need to define (and design) the L<br />
operator p p part ( (the linear system y<br />
component) of the system to completely<br />
define the characteristic system y for<br />
homomorphic convolution for speech<br />
– to do this properly p p y and correctly, y, need to look<br />
at the properties of the complex cepstrum for<br />
speech signals<br />
74
Complex Cepstrum Cepstrum of of Speech<br />
Speech<br />
• model of speech:<br />
– voiced speech produced by a quasi-periodic<br />
pulse train exciting slowly time time-varying varying linear<br />
system => p[n] convolved with hv[n] – unvoiced speech produced by random noise<br />
exciting slowly time-varying linear system =><br />
u[n] convolved with hv[n] • time to examine full model and see what<br />
the complex p cepstrum p of speech p<br />
looks like<br />
75
Homomorphic Filtering of Voiced<br />
• goal is to separate out the<br />
excitation impulses from the<br />
remaining components of the<br />
complex cepstrum<br />
• use cepstral window, l(n), to<br />
separate excitation pulses from<br />
combined vocal tract<br />
– this window removes combined<br />
vocal tract<br />
• the filtered signal is processed by<br />
the inverse characteristic system<br />
to recover the combined vocal<br />
tract component<br />
S Speech h<br />
– l(n)=1 for |n|
Voiced Speech Speech Example<br />
Example<br />
Cepstrally<br />
smoothed log<br />
magnitude, 50<br />
quefrencies<br />
cutoff<br />
Cepstrally<br />
unwrapped<br />
phase, 50<br />
quefrencies<br />
cutoff<br />
77
Voiced Speech Speech Example<br />
Example<br />
Combined<br />
impulse<br />
response of<br />
glottal pulse,<br />
vocal tract<br />
system, and<br />
radiation system<br />
78
Voiced Speech Speech Example<br />
Example<br />
High quefrency<br />
liftering; cutoff<br />
quefrency=50;<br />
log magnitude<br />
and unwrapped<br />
phase<br />
79
Voiced Speech Speech Example<br />
Example<br />
Estimated<br />
excitation<br />
function for<br />
voiced speech<br />
(Hamming<br />
window<br />
weighted) i ht d)<br />
80
Unvoiced Speech Speech Example<br />
Example<br />
Hamming window<br />
40 msec duration<br />
(section beginning<br />
at sample 3200 in<br />
file test test_16k.wav) 16k.wav)<br />
81
Unvoiced Speech Speech Example<br />
Example<br />
wrapped phase<br />
unwrapped<br />
phase<br />
82
Unvoiced Speech Speech Example<br />
Example<br />
83
Unvoiced Speech Speech Example<br />
Example<br />
Cepstrally<br />
smoothed log<br />
magnitude, 50<br />
quefrencies<br />
cutoff<br />
Cepstrally<br />
unwrapped<br />
phase, 50<br />
quefrencies<br />
cutoff<br />
84
Unvoiced Speech Speech Example<br />
Example<br />
Estimated<br />
excitation source<br />
for unvoiced<br />
speech section<br />
(Hamming<br />
window<br />
weighted)<br />
85
Short Short-Time Time Homomorphic Analysis<br />
STFT<br />
86
Review Review of Cepstral Calculation<br />
• 3 potential methods for computing cepstral<br />
coefficients, xn ˆ[ ] , of sequence x[n]<br />
– analytical method; assuming X(z) is a rational<br />
function; find poles and zeros and expand using log<br />
power series<br />
– recursion method; assuming X(z) is either a minimum<br />
phase (all poles and zeros inside unit circle) or<br />
maximum phase (all poles and zeros outside unit<br />
circle) sequence<br />
– DFT implementation; using windows, with phase<br />
unwrapping (for complex cepstra)<br />
87
Example 11—single<br />
single pole sequence<br />
( (computed t d using i all ll 3 3 methods) th d )<br />
[ ] = [ ]<br />
n<br />
xn aun<br />
n<br />
a<br />
xn ˆ[ [ ] = un [ − 1 ]<br />
n<br />
88
Cepstral Computation Aliasing<br />
i<br />
i<br />
i<br />
i<br />
Effect of quefrency aliasing via a simple example<br />
xn [ ] = δδ [ n ] + αδ αδ [ n n− N ]<br />
with discrete-time Fourier transform<br />
( j jω<br />
) 1<br />
j jω N<br />
α −<br />
X e e ω<br />
= +<br />
p<br />
We can express the complex logarithm as<br />
p<br />
ˆ j<br />
j Np<br />
X( e ) log{1 e }<br />
ω<br />
ω<br />
α −<br />
∞ m+ 1 m ⎛( −1)<br />
α ⎞ − jωmNp = + = ∑⎜<br />
⎟e<br />
m=<br />
1⎝<br />
m ⎠<br />
giving a complex cepstrum in the form<br />
( −1)<br />
α<br />
xn ˆ[ [ ] = ∑<br />
⎜ ⎟ ⎟δδ<br />
[ n n− mN mNp<br />
]<br />
m=<br />
1⎝<br />
m ⎠<br />
∞ m+ 1 m ⎛ ⎞<br />
89
Example Example 22—voiced<br />
2 voiced voiced speech frame<br />
90
Example 33—low<br />
low quefrency liftering<br />
91
Example 33—high<br />
high quefrency liftering<br />
92
Example 4—effects 4 effects of low quefrency lifter<br />
93
Example 55—phase<br />
phase unwrapping<br />
94
Example 66—phase<br />
phase unwrapping<br />
95
Homomorphic Spectrum Spectrum Smoothing<br />
Smoothing<br />
96
Running Cepstrum<br />
97
Running Cepstrum<br />
98
Running Cepstrums<br />
99
Cepstrum Applications<br />
100
Cepstrum Distance Distance Measures<br />
Measures<br />
i The cepstrum forms a natural basis for comparing<br />
patterns in speech recognition or vector quantization<br />
because of its stable mathematical characterization<br />
for speech signals<br />
i A typical "cepstral distance<br />
measure" is of the form:<br />
nco<br />
∑<br />
D= ([ c n] −c[<br />
n])<br />
n=<br />
1<br />
2<br />
where cn [ ] and cn [ ] are cepstral p sequences q corresponding p g<br />
to frames of signal, and D is the cepstral distance between<br />
the pair of sequences.<br />
i Using Parseval's Parseval s theorem, we can express the cepstral<br />
distance in the frequency domain as:<br />
1<br />
π<br />
jω jω<br />
2<br />
D (log | H( e ) | log | H( e ) |) d<br />
2π<br />
π −<br />
= ∫<br />
−<br />
2π<br />
i Thus we see that the cepstral distance is actually a log<br />
magnitude spectral<br />
distance<br />
ω<br />
101
Mel Frequency Cepstral Coefficients<br />
i Basic idea is to compute a frequency analysis based on a filter<br />
bank with approximately critical band spacing of the filters and<br />
bandwidths. For 4 kHz bandwidth, , approximately pp y 20 filters are used.<br />
i First perform a short-time Fourier analysis, giving Xm[ k], k = 0,1,..., NF / 2<br />
where m is the frame number and k is the frequency index (1 to half<br />
thesizeoftheFFT)<br />
the size of the FFT)<br />
i Next the DFT values are grouped together in critical bands and weighted<br />
by triangular weighting functions.<br />
102
Mel Frequency Cepstral Coefficients<br />
th th<br />
i The mel-spectrum of the m frame for the r filter ( r = 1,2,..., R)<br />
is defined as:<br />
MF<br />
U r 1<br />
[] r = ∑ | V [] k X []| k<br />
A<br />
m r m<br />
r k= L<br />
r<br />
2<br />
th<br />
where Vr[ k] is the weighting function for the r filter, ranging from<br />
DFT index L to U , and<br />
U<br />
r r<br />
r<br />
Ar = ∑ | Vr[ k]|<br />
k k= L<br />
r<br />
2<br />
th<br />
is the normalizing factor for the r mel-filter. (Normalization guarantees<br />
that if the input spectrum is flat, the mel-spectrum is flat).<br />
i A discrete cosine transform<br />
of the log magnitude of the filter outputs is<br />
computed to form the function mfcc [ n]<br />
as:<br />
R 1 ⎡2π⎛ 1⎞<br />
⎤<br />
mfcc mfccm [ n ] = ∑ log( MF MFm<br />
[ r ])cos ⎢ ⎜ r r+ ⎟ n n⎥ ,<br />
R<br />
⎢ ⎜ + ⎟<br />
r 1<br />
R 2<br />
⎥<br />
=<br />
⎣ ⎝ ⎠ ⎦<br />
n n= 1,2, 1,2,..., , N Nmfcc<br />
f<br />
i Typically N = 13 and R=<br />
24 for 4 kHz bandwidth speech signals.<br />
mfcc<br />
103
Delta Cepstrum<br />
i The set of mel frequency cepstral coefficients provide perceptually<br />
meaningful and smooth estimates of speech spectra, over time<br />
i Since speech is inherently a dynamic signal, it is reasonable to seek<br />
a representation t ti that th t includes i l d some aspect t of f the th dynamic d i nature t off<br />
the time derivatives (both first and second order derivatives) of the shortterm<br />
cepstrum<br />
i The resulting parameter sets are called th the e delta cepstrum (first derivative)<br />
and the delta-delta cepstrum (second derivative).<br />
i The simplest method of computing delta cepstrum parameters is a first<br />
difference of cepstral vectors, of the form:<br />
Δ mfccm[ n] = mfccm[ n] −mfccm−1[<br />
n]<br />
i The simple difference is a poor approximation to the first derivative and is<br />
not generally used. Instead a least-squares approximation to the local slope<br />
(over a region<br />
around the current sample) is used, and is of the form:<br />
Δ mfcc [ n]<br />
=<br />
M<br />
∑<br />
k=−M m M<br />
k( mfcc [ n])<br />
∑<br />
k=−M k<br />
m+ k<br />
2<br />
where the region is M frames before and after the current frame<br />
104
Homomorphic Homomorphic Vocoder<br />
Vocoder<br />
• time-dependent p complex p cepstrum p retains all the<br />
information of the time-dependent Fourier<br />
transform => exact representation of speech<br />
• time i ddependent d real l cepstrum lloses phase h<br />
information -> not an exact representation of<br />
speech<br />
• quantization of cepstral parameters also loses<br />
information<br />
• cepstrum gives good estimates of pitch, voicing,<br />
formants => can build homomorphic vocoder<br />
105
Homomorphic Homomorphic Vocoder<br />
Vocoder<br />
1. compute p cepstrum p every y 10-20 msec<br />
2. estimate pitch period and<br />
voiced/unvoiced decision<br />
3. quantize and encode low-time cepstral<br />
values<br />
44. at t synthesizer-get th i t approximation i ti to t h hv(n) ( )<br />
or hu(n) from low time quantized cepstral<br />
values<br />
5. convolve hv(n) or hu(n) with excitation<br />
created from pitch, p , voiced/unvoiced, , and<br />
amplitude information<br />
106
Homomorphic Vocoder<br />
• l( l(n) ) iis cepstrum t window i d th that t selects l t llow-time ti<br />
values and is of length 26 samples homomorphic<br />
vocoder<br />
107
Homomorphic Vocoder Impulse Responses<br />
108
Summary<br />
• Introduced the concept of the cepstrum of a signal,<br />
defined as the inverse Fourier transform of the log of the<br />
signal spectrum<br />
1<br />
j<br />
ˆ[ ] log ( )<br />
xn F X e ω<br />
− = ⎡<br />
⎣<br />
⎤<br />
⎦<br />
• Showed cepstrum reflected properties of both the<br />
excitation (high quefrency) and the vocal tract (low<br />
quefrency) f )<br />
– short quefrency window filters out excitation; long quefrency<br />
window filters out vocal tract<br />
• Mel-scale cepstral coefficients used as feature set for<br />
speech recognition<br />
• Delta and delta delta-delta delta cepstral coefficients used as<br />
indicators of spectral change over time 109