18.07.2013

Time Domain Methods in Speech Processing General Synthesis ...


Frame-by-Frame Processing in Successive Windows

• Speech is processed frame-by-frame in overlapping intervals until the entire region of speech is covered by at least one such frame

• Results of the analysis of individual frames are used to derive model parameters in some manner

• Representation goes from time samples $x[n]$, $n = \ldots, 0, 1, 2, \ldots$ to parameter vectors $f[\hat{n}]$, $\hat{n} = 0, 1, 2, \ldots$, where $n$ is the time index and $\hat{n}$ is the frame index.
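The framing step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the slides' own method: the helper names `frame_signal` and `analyze_frames`, the hop size `hop`, and zero-padding of the final frame are all hypothetical choices made so that every sample lands in at least one frame.

```python
def frame_signal(x, frame_len, hop):
    """Split x into frames of frame_len samples whose starts are hop samples apart.

    Frames overlap when hop < frame_len. The last frame is zero-padded, so every
    sample of x falls inside at least one frame.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    if hop > frame_len:
        raise ValueError("hop > frame_len would skip samples between frames")
    if not x:
        return []
    frames = []
    start = 0
    while True:
        frame = list(x[start:start + frame_len])
        frame += [0.0] * (frame_len - len(frame))  # zero-pad a short final frame
        frames.append(frame)
        if start + frame_len >= len(x):
            break  # the last sample is now covered
        start += hop
    return frames


def analyze_frames(x, frame_len, hop, analyze):
    """Return the parameter sequence f[n_hat] = analyze(frame n_hat)."""
    return [analyze(frame) for frame in frame_signal(x, frame_len, hop)]
```

For example, `analyze_frames(list(range(10)), 4, 2, max)` takes four overlapping frames of length 4 and returns one value per frame index: `[3, 5, 7, 9]`.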

Generic Short-Time Processing

Block diagram: $x[n]$ → $T(\,)$ (linear or non-linear transformation) → $T(x[n])$ → $w[n]$ (window sequence, usually finite length) → $Q_{\hat{n}}$

$$Q_{\hat{n}} = \left. \left( \sum_{m=-\infty}^{\infty} T(x[m])\, w[n-m] \right) \right|_{n=\hat{n}}$$

• $Q_{\hat{n}}$ is a sequence of local weighted average values of the sequence $T(x[n])$ at time $n = \hat{n}$
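A direct evaluation of $Q_{\hat{n}}$ at one frame index is sketched below. It assumes a causal window stored as a list `w` (nonzero only for $0 \le k < L$) and a signal that is zero outside the samples given; `short_time_q` is a hypothetical helper name, not something from the slides.

```python
def short_time_q(x, n_hat, T, w):
    """Evaluate Q_{n_hat} = sum_m T(x[m]) * w[n_hat - m].

    Assumptions: x[m] = 0 outside 0..len(x)-1, and w is a finite causal window
    given as a list (w[k] nonzero only for 0 <= k < len(w)). Skipping samples
    outside x is exact only when T(0) == 0, as for T(v) = v*v or abs.
    """
    total = 0.0
    # w[n_hat - m] is in range only for n_hat - len(w) + 1 <= m <= n_hat
    first = max(0, n_hat - len(w) + 1)
    last = min(n_hat, len(x) - 1)
    for m in range(first, last + 1):
        total += T(x[m]) * w[n_hat - m]
    return total
```

For example, `short_time_q([1, -2, 3, -4], 3, lambda v: v * v, [1, 1])` squares the two most recent samples and sums them, giving 25.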

Computation of Short-Time Energy

• The window jumps/slides across the sequence of squared values, selecting an interval for processing
• What happens to $E_{\hat{n}}$ as the window jumps by 2, 4, 8, ..., L samples? ($E_{\hat{n}}$ is a lowpass function, so it can be decimated without loss of information; why is $E_{\hat{n}}$ lowpass?)
• The effects of decimation depend on L: if L is small, then $E_{\hat{n}}$ is much more variable than if L is large (the window bandwidth changes with L!)
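$E_{\hat{n}}$ is lowpass because it is $x^2[n]$ passed through the rectangular window, which acts as a lowpass filter. The minimal sketch below evaluates $E_{\hat{n}}$ only every `hop` samples; the helper name `short_time_energy` and the `hop` parameter (the decimation factor) are assumptions for illustration. With `hop=R` the result is exactly every R-th value of the full-rate sequence, i.e. $E_{\hat{n}}$ decimated by R.

```python
def short_time_energy(x, L, hop=1):
    """E_{n_hat} = sum_{m=n_hat-L+1}^{n_hat} x[m]**2 for n_hat = 0, hop, 2*hop, ...

    hop=1 gives E_{n_hat} at every sample; hop=R keeps every R-th value, i.e.
    E_{n_hat} decimated by R. x[m] is taken as 0 outside 0..len(x)-1, and
    n_hat runs up to the last position where the window still overlaps x.
    """
    energies = []
    for n_hat in range(0, len(x) + L - 1, hop):
        lo = max(0, n_hat - L + 1)   # oldest sample inside the window
        hi = min(n_hat, len(x) - 1)  # newest sample inside the window
        energies.append(sum(v * v for v in x[lo:hi + 1]))
    return energies
```

For example, `short_time_energy([1, 2, 3, 4], 2)` gives `[1, 5, 13, 25, 16]`, and `hop=2` keeps `[1, 13, 16]`.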


Short-Time Signal Processing

Block diagram: $s[n]$ (speech waveform) → Short-Time Analysis → $Q_{\hat{n}}$ (alternate representation) → Parameter Estimation → $f[\hat{n}]$ (model parameter(s))

Model Parameter(s):

• speech/non-speech
• voiced/unvoiced/background
• pitch period (when voiced)
• formants
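As a toy instance of the analysis → parameter-estimation chain, the sketch below uses short-time energy as $Q_{\hat{n}}$ and a fixed energy threshold to produce one speech/non-speech label per frame. The thresholding rule, the `threshold` value, and the name `speech_nonspeech` are illustrative assumptions, not a method given in the slides.

```python
def speech_nonspeech(x, frame_len, hop, threshold):
    """Label each frame True ("speech") or False ("non-speech").

    Hypothetical rule for illustration: a frame counts as speech when its
    short-time energy (sum of squared samples) exceeds `threshold`.
    Only full-length frames are analysed; a partial tail is dropped.
    """
    labels = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len]
        energy = sum(v * v for v in frame)  # short-time analysis: Q_{n_hat}
        labels.append(energy > threshold)   # parameter estimation: f[n_hat]
    return labels
```

For a signal of eight zeros followed by eight ones, `speech_nonspeech(x, 4, 4, 1.0)` labels the two silent frames `False` and the two non-zero frames `True`.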

Short-Time Energy

$$E = \sum_{m=-\infty}^{\infty} x^2[m]$$

-- this is the long-term definition of signal energy
-- there is little or no utility in this definition for time-varying signals

$$E_{\hat{n}} = \sum_{m=\hat{n}-L+1}^{\hat{n}} x^2[m] = x^2[\hat{n}-L+1] + \cdots + x^2[\hat{n}]$$

-- short-time energy in the vicinity of time $\hat{n}$

$$T(x) = x^2, \qquad w[n] = \begin{cases} 1, & 0 \le n \le L-1 \\ 0, & \text{otherwise} \end{cases}$$
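With the rectangular window, consecutive values of $E_{\hat{n}}$ share $L-1$ terms, so the sum can be updated recursively: $E_{\hat{n}} = E_{\hat{n}-1} + x^2[\hat{n}] - x^2[\hat{n}-L]$. A minimal sketch, assuming $x[m] = 0$ for $m < 0$; `running_short_time_energy` is a hypothetical name.

```python
def running_short_time_energy(x, L):
    """E_{n_hat} for n_hat = 0..len(x)-1 using the rectangular-window recursion
    E_{n_hat} = E_{n_hat-1} + x[n_hat]**2 - x[n_hat-L]**2, with x[m] = 0 for m < 0.
    """
    energies = []
    e = 0.0
    for n_hat, v in enumerate(x):
        e += v * v                  # sample x[n_hat] enters the window
        if n_hat >= L:
            e -= x[n_hat - L] ** 2  # sample x[n_hat - L] leaves the window
        energies.append(e)
    return energies
```

The update costs one addition and one subtraction per sample regardless of L. Over very long floating-point signals the running sum can drift, so recomputing the direct sum occasionally is a reasonable safeguard.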

Effects of Window

$$Q_{\hat{n}} = \left. T(x[n]) * w[n] \right|_{n=\hat{n}} = \left. x'[n] * w[n] \right|_{n=\hat{n}}, \qquad \text{where } x'[n] = T(x[n])$$

• $w[n]$ serves as a lowpass filter on $T(x[n])$, which often has a lot of high-frequency content (most non-linearities introduce significant high-frequency energy; think of what $x[n] \cdot x[n]$ does in frequency)
• Often we extend the definition of $Q_{\hat{n}}$ to include a pre-filtering term, so that $x[n]$ itself is filtered to a region of interest
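To see why squaring spreads energy to higher frequencies, take a single sinusoid (a worked example, not from the slides):

$$x[n] = \cos(\omega_0 n) \quad\Longrightarrow\quad x^2[n] = \cos^2(\omega_0 n) = \tfrac{1}{2} + \tfrac{1}{2}\cos(2\omega_0 n)$$

Squaring yields a DC term plus a component at twice the original frequency. More generally, multiplication in time is periodic convolution in frequency,

$$x[n]\,x[n] \;\longleftrightarrow\; \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\theta})\, X(e^{j(\omega-\theta)})\, d\theta,$$

so if $X(e^{j\omega})$ is confined to $|\omega| \le \omega_B$, the spectrum of $x^2[n]$ extends to $|\omega| \le 2\omega_B$ (modulo $2\pi$). The lowpass window $w[n]$ keeps the slowly varying DC part and attenuates the $2\omega_0$ term.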

Block diagram: $x[n]$ → Linear Filter → $\hat{x}[n]$ → $T(\,)$ → $T(\hat{x}[n])$ → Lowpass Filter, $w[n]$ → $Q_{\hat{n}}$
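The pre-filtered chain can be sketched as FIR stages, since the lowpass window $w[n]$ is itself applied by convolution. Assumptions for this sketch: the linear pre-filter is an FIR filter with coefficients `h`, all signals are zero for $n < 0$, and output is kept only for $\hat{n} = 0, \ldots, \text{len}(x)-1$; `fir_filter` and `prefiltered_q` are hypothetical names.

```python
def fir_filter(x, h):
    """Causal FIR filter: y[n] = sum_k h[k] * x[n-k], with x[n] = 0 for n < 0.

    Output has the same length as x (the convolution tail is dropped).
    """
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def prefiltered_q(x, h, T, w):
    """Q_{n_hat} for n_hat = 0..len(x)-1 along the chain
    x[n] -> linear filter h -> x_hat[n] -> T( ) -> lowpass window w -> Q_{n_hat}.
    """
    x_hat = fir_filter(x, h)   # pre-filter to the region of interest
    t = [T(v) for v in x_hat]  # linear or non-linear transformation
    # Q_{n_hat} = sum_m t[m] * w[n_hat - m] is itself a causal FIR convolution
    return fir_filter(t, w)
```

For instance, a first-difference pre-filter `h = [1.0, -1.0]` (an illustrative choice) removes a constant input, so `prefiltered_q([1, 1, 1, 1], [1.0, -1.0], lambda v: v * v, [1, 1, 1])` returns energy only around the onset transient: `[1.0, 1.0, 1.0, 0.0]`. With `h = [1]` the chain reduces to the plain short-time energy.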
