Time Domain Methods in Speech Processing General Synthesis ...


Frame-by-Frame Processing in Successive Windows
• Speech is processed frame-by-frame in overlapping intervals until the entire region of speech is covered by at least one such frame
• Results of the analysis of individual frames are used to derive model parameters in some manner
• Representation goes from time samples $x[n]$, $n = \ldots, 0, 1, 2, \ldots$ to parameter vectors $f[\hat{n}]$, $\hat{n} = 0, 1, 2, \ldots$, where $n$ is the time index and $\hat{n}$ is the frame index

Short-Time Signal Processing
Block diagram: $s[n]$ (speech waveform) → Short-Time Analysis → $Q_{\hat{n}}$ (alternate representation) → Parameter Estimation → $f[\hat{n}]$ (model parameter(s))
Model Parameter(s):
• speech/non-speech
• voiced/unvoiced/background
• pitch period (when voiced)
• formants

Generic Short-Time Processing
Block diagram: $x[n]$ → $T(\cdot)$ → $T(x[n])$ → $w[n]$ → $Q_{\hat{n}}$

$$Q_{\hat{n}} = \left( \sum_{m=-\infty}^{\infty} T(x[m])\, w[n-m] \right) \Bigg|_{n=\hat{n}}$$

-- $T(\cdot)$: linear or non-linear transformation
-- $w[n]$: window sequence (usually finite length)
• $Q_{\hat{n}}$ is a sequence of local weighted average values of the sequence $T(x[n])$ at time $n = \hat{n}$

Short-Time Energy

$$E = \sum_{m=-\infty}^{\infty} x^2[m]$$

-- this is the long term definition of signal energy
-- there is little or no utility of this definition for time-varying signals

$$E_{\hat{n}} = \sum_{m=\hat{n}-L+1}^{\hat{n}} x^2[m] = x^2[\hat{n}-L+1] + \ldots + x^2[\hat{n}]$$

-- short-time energy in vicinity of time $\hat{n}$
-- $T(x) = x^2$, with $w[n] = 1$ for $0 \le n \le L-1$ and $w[n] = 0$ otherwise

Computation of Short-Time Energy
• window jumps/slides across sequence of squared values, selecting interval for processing
• what happens to $E_{\hat{n}}$ as sequence jumps by 2, 4, 8, ..., L samples ($E_{\hat{n}}$ is a lowpass function, so it can be decimated without loss of information; why is $E_{\hat{n}}$ lowpass?)
• effects of decimation depend on L; if L is small, then $E_{\hat{n}}$ is a lot more variable than if L is large (window bandwidth changes with L!)

Effects of Window

$$Q_{\hat{n}} = T(x[n]) * w[n] \,\Big|_{n=\hat{n}} = x'[n] * w[n] \,\Big|_{n=\hat{n}}$$

• $w[n]$ serves as a lowpass filter on $T(x[n])$, which often has a lot of high frequencies (most non-linearities introduce significant high frequency energy; think of what $(x[n]\, x[n])$ does in frequency)
• often we extend the definition of $Q_{\hat{n}}$ to include a pre-filtering term so that $x[n]$ itself is filtered to a region of interest
Block diagram: $x[n]$ → Linear Filter → $\hat{x}[n]$ → $T(\cdot)$ → $T(x[n])$ → Lowpass Filter, $w[n]$ → $Q_{\hat{n}}$
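The generic short-time computation and its short-time energy special case can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `short_time`, the `hop` parameter (the number of samples the window jumps between frames), and the choice to start at the first position where the window lies fully inside the signal are all assumptions made for this example.

```python
def short_time(x, transform, window, hop=1):
    """Evaluate Q[n_hat] = sum_m T(x[m]) * w[n_hat - m] at selected frame positions.

    x         : list of samples
    transform : the transformation T(.), e.g. squaring for short-time energy
    window    : list of L window values w[0..L-1] (zero outside that range)
    hop       : hypothetical parameter, number of samples between successive n_hat
    """
    L = len(window)
    values = []
    # Assumption: start at n_hat = L-1 so every window lies fully inside x.
    for n_hat in range(L - 1, len(x), hop):
        # w[n_hat - m] is non-zero only for m = n_hat-L+1 .. n_hat, i.e. k = 0 .. L-1.
        q = sum(transform(x[n_hat - k]) * window[k] for k in range(L))
        values.append(q)
    return values


def square(v):
    return v * v


# Short-time energy: T(x) = x^2 with a rectangular window of length L = 3.
x = [1, 2, 3, 4, 5, 6]
rect = [1, 1, 1]

energy_every_sample = short_time(x, square, rect, hop=1)  # [14, 29, 50, 77]
energy_decimated = short_time(x, square, rect, hop=3)     # [14, 77]
```

With `hop=1` the window slides one sample at a time; a larger `hop` evaluates the same sum at fewer positions, which corresponds to the decimated energy sequence discussed in the slides.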
Short-Time Energy
• serves to differentiate voiced and unvoiced sounds in speech from silence (background signal)
• natural definition of energy of weighted signal is:

$$E_{\hat{n}} = \sum_{m=-\infty}^{\infty} \left[ x[m]\, w[\hat{n}-m] \right]^2 \quad \text{(sum of squares of portion of signal)}$$

-- concentrates measurement at sample $\hat{n}$, using weighting $w[\hat{n}-m]$

$$E_{\hat{n}} = \sum_{m=-\infty}^{\infty} x^2[m]\, w^2[\hat{n}-m] = \sum_{m=-\infty}^{\infty} x^2[m]\, h[\hat{n}-m], \qquad h[n] = w^2[n]$$

Block diagram: $x[n]$ → $(\cdot)^2$ → $x^2[n]$ → $h[n]$ → $E_{\hat{n}}$ (short-time energy)

Short-Time Energy Properties
• depends on choice of $h[n]$, or equivalently, window $w[n]$
– if $w[n]$ duration is very long and of constant amplitude ($w[n]=1$, $n=0,1,\ldots,L-1$), $E_{\hat{n}}$ would not change much over time, and would not reflect the short-time amplitudes of the sounds of the speech
– very long duration windows correspond to narrowband lowpass filters
– want $E_{\hat{n}}$ to change at a rate comparable to the changing sounds of the speech
=> this is the essential conflict in all speech processing, namely we need a short duration window to be responsive to rapid sound changes, but short windows will not provide sufficient averaging to give a smooth and reliable energy function

Windows
• consider two windows, $w[n]$
– rectangular window:
• $h[n] = 1$, $0 \le n \le L-1$ and 0 otherwise
– Hamming window (raised cosine window):
• $h[n] = 0.54 - 0.46 \cos(2\pi n/(L-1))$, $0 \le n \le L-1$ and 0 otherwise
– rectangular window gives equal weight to all L samples in the window $(n, \ldots, n-L+1)$
– Hamming window gives most weight to middle samples and tapers off strongly at the beginning and the end of the window

Rectangular and Hamming Windows
[Figure: Time Responses of L=21 point Rectangular and Hamming Windows; Frequency Responses of L=51 point Rectangular and Hamming Windows]

Window Frequency Responses
• rectangular window

$$H(e^{j\Omega T}) = \frac{\sin(\Omega L T/2)}{\sin(\Omega T/2)}\, e^{-j\Omega T (L-1)/2}$$

• first zero occurs at $f = F_s/L = 1/(LT)$ (or $\Omega = 2\pi/(LT)$) => nominal cutoff frequency of the equivalent "lowpass" filter
• Hamming window

$$w_H[n] = 0.54\, w_R[n] - 0.46 \cos(2\pi n/(L-1))\, w_R[n]$$

• can decompose Hamming Window FR into combination of three terms

Window Frequency Responses
[Figure: Rectangular Windows, L=21,41,61,81,101; Hamming Windows, L=21,41,61,81,101]
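The window definitions and the location of the rectangular window's first spectral zero can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the slides: the helper names `rectangular`, `hamming`, and `dtft_magnitude` are hypothetical, the frequency response is evaluated by a direct sum rather than an FFT, and frequency is expressed as the normalized radian frequency $\omega = \Omega T$, so the first zero $\Omega = 2\pi/(LT)$ becomes $\omega = 2\pi/L$.

```python
import cmath
import math


def rectangular(L):
    """Rectangular window: 1 for 0 <= n <= L-1."""
    return [1.0] * L


def hamming(L):
    """Hamming window: 0.54 - 0.46*cos(2*pi*n/(L-1)) for 0 <= n <= L-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]


def dtft_magnitude(h, omega):
    """|H(e^{j omega})| computed by direct summation over the window samples."""
    return abs(sum(h[n] * cmath.exp(-1j * omega * n) for n in range(len(h))))


L = 21
first_zero = 2 * math.pi / L  # normalized frequency of the rectangular window's first zero

rect_dc = dtft_magnitude(rectangular(L), 0.0)               # equals L (sum of the samples)
rect_at_zero = dtft_magnitude(rectangular(L), first_zero)   # numerically ~0
hamm_at_zero = dtft_magnitude(hamming(L), first_zero)       # clearly non-zero
```

The rectangular response vanishes at $\omega = 2\pi/L$, while the Hamming response at that same frequency is still well above zero, which is consistent with the Hamming window having a wider main lobe for the same length L.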
