Time Domain Methods in Speech Processing

Digital Signal Processing Design, Lecture 5

General Synthesis Model

[Figure: speech synthesis model. Labels recovered from the diagram: pitch periods T1, T2; voiced sound amplitude; unvoiced sound amplitude; radiation model $R(z) = 1 - \alpha z^{-1}$]

• vocal tract parameters: log areas, reflection coefficients, formants, vocal tract polynomial, articulatory parameters, …

• analysis tasks: pitch detection, voiced/unvoiced/silence detection, gain estimation, vocal tract parameter estimation, glottal pulse shape, radiation model

Fundamental Assumptions

• properties of the speech signal change relatively slowly with time (5-10 sounds per second)

– over very short (5-20 msec) intervals => uncertainty due to small amount of data, varying pitch, varying amplitude

– over medium length (20-100 msec) intervals => uncertainty due to changes in sound quality, transitions between sounds, rapid transients in speech

– over long (100-500 msec) intervals => uncertainty due to large amount of sound changes

• there is always uncertainty in short-time measurements and estimates from speech signals

Compromise Solution

• “short-time” processing methods => short segments of the speech signal are “isolated” and “processed” as if they were short segments from a “sustained” sound with fixed (non-time-varying) properties

– this short-time processing is periodically repeated for the duration of the waveform

– these short analysis segments, or “analysis frames”, often overlap one another

– the results of short-time processing can be a single number (e.g., an estimate of the pitch period within the frame), or a set of numbers (e.g., an estimate of the formant frequencies for the analysis frame)

– the end result of the processing is a new, time-varying sequence that serves as a new representation of the speech signal

Frame-by-Frame Processing in Successive Windows

[Figure: Frames 1 to 5 laid over the speech waveform]

75% frame overlap => frame length = L, frame shift = R = L/4

Frame 1 = {x[0], x[1], …, x[L-1]}
Frame 2 = {x[R], x[R+1], …, x[R+L-1]}
Frame 3 = {x[2R], x[2R+1], …, x[2R+L-1]}
Frame 4 = {x[3R], x[3R+1], …, x[3R+L-1]}
…
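The frame indexing above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `frame_starts` and `split_into_frames` are hypothetical, and only full frames are kept (a trailing partial frame is dropped, which is an assumption).

```python
def frame_starts(num_samples, frame_len, frame_shift):
    """Start indices 0, R, 2R, ... of every full frame of length frame_len."""
    if num_samples < frame_len:
        return []
    return list(range(0, num_samples - frame_len + 1, frame_shift))


def split_into_frames(x, frame_len, frame_shift):
    """Frame i holds samples x[i*R], ..., x[i*R + L - 1]."""
    return [x[s:s + frame_len]
            for s in frame_starts(len(x), frame_len, frame_shift)]


# 75% overlap: frame shift R = L / 4
L, R = 8, 2
x = list(range(20))            # toy signal whose sample values equal their indices
frames = split_into_frames(x, L, R)
```

With these toy values, `frames[1]` starts at sample R and `frames[2]` at sample 2R, matching the listing above.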

Frame-by-Frame Processing in Successive Windows

• Speech is processed frame-by-frame in overlapping intervals until the entire region of speech is covered by at least one such frame

• Results of the analysis of individual frames are used to derive model parameters in some manner

• Representation goes from time samples $x[n],\ n = \ldots, 0, 1, 2, \ldots$ to parameter vectors $f[\hat{n}],\ \hat{n} = 0, 1, 2, \ldots$, where $n$ is the time index and $\hat{n}$ is the frame index.

Short-Time Signal Processing

[Figure: speech waveform $s[n]$ → short-time analysis → $Q_{\hat{n}}$ (alternate representation) → parameter estimation → $f[\hat{n}]$ (model parameter(s))]

Model parameter(s):

• speech/non-speech
• voiced/unvoiced/background
• pitch period (when voiced)
• formants

Generic Short-Time Processing

$$Q_{\hat{n}} = \left( \sum_{m=-\infty}^{\infty} T(x[m])\, w[n-m] \right) \Bigg|_{n=\hat{n}}$$

[Figure: $x[n]$ → $T(\,\cdot\,)$ (linear or non-linear transformation) → $T(x[n])$ → $w[n]$ (window sequence, usually finite length) → $Q_{\hat{n}}$]

• $Q_{\hat{n}}$ is a sequence of local weighted average values of the sequence $T(x[n])$ at time $n = \hat{n}$
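As a rough sketch of the generic short-time measure, the function below evaluates the weighted sum for one frame index with a finite window w[0..L-1]. The name `short_time` and the choice to skip terms that fall outside the recorded signal are assumptions made for illustration.

```python
def short_time(x, transform, window, n_hat):
    """Q at n_hat: sum over m of transform(x[m]) * window[n_hat - m]."""
    total = 0.0
    for k, w_k in enumerate(window):   # k = n_hat - m, so window[k] weights x[n_hat - k]
        m = n_hat - k
        if 0 <= m < len(x):            # assumption: terms outside the signal are skipped
            total += transform(x[m]) * w_k
    return total


x = [1.0, -2.0, 3.0, -4.0]
rect = [1.0, 1.0, 1.0]                 # rectangular window, L = 3

energy_at_3 = short_time(x, lambda v: v * v, rect, n_hat=3)   # T(x) = x^2
magnitude_at_3 = short_time(x, abs, rect, n_hat=3)            # T(x) = |x|
```

Swapping the `transform` argument is all that distinguishes one short-time measure from another in this formulation.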

Short-Time Energy

$$E = \sum_{m=-\infty}^{\infty} x^2[m]$$

-- this is the long-term definition of signal energy
-- this definition has little or no utility for time-varying signals

$$E_{\hat{n}} = \sum_{m=\hat{n}-L+1}^{\hat{n}} x^2[m] = x^2[\hat{n}-L+1] + \ldots + x^2[\hat{n}]$$

-- short-time energy in the vicinity of time $\hat{n}$, corresponding to

$$T(x) = x^2, \qquad w[n] = \begin{cases} 1, & 0 \le n \le L-1 \\ 0, & \text{otherwise} \end{cases}$$
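For the rectangular window, the short-time energy can be computed with a running sum instead of re-adding L terms at every sample. The sketch below is one possible implementation, not taken from the slides; the function name is hypothetical and samples before index 0 are assumed to be zero.

```python
def short_time_energy(x, L):
    """E[n] = x[n-L+1]^2 + ... + x[n]^2, treating samples before index 0 as zero."""
    energies = []
    running = 0.0
    for n, sample in enumerate(x):
        running += sample * sample            # newest sample enters the window
        if n >= L:
            running -= x[n - L] * x[n - L]    # sample that just left the window
        energies.append(running)
    return energies


energies = short_time_energy([1.0, 2.0, 3.0, 4.0, 5.0], L=3)
```

The running-sum form can accumulate rounding error over very long signals; recomputing each window directly avoids that at the cost of L operations per output sample.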

Computation of Short-Time Energy

• window jumps/slides across the sequence of squared values, selecting an interval for processing

• what happens to $E_{\hat{n}}$ as the window jumps by 2, 4, 8, ..., L samples? ($E_{\hat{n}}$ is a lowpass function, so it can be decimated without loss of information; why is $E_{\hat{n}}$ lowpass?)

• effects of decimation depend on L; if L is small, then $E_{\hat{n}}$ is a lot more variable than if L is large (window bandwidth changes with L!)

Effects of Window

$$Q_{\hat{n}} = \big( T(x[n]) * w[n] \big) \Big|_{n=\hat{n}} = \big( x'[n] * w[n] \big) \Big|_{n=\hat{n}}$$

• $w[n]$ serves as a lowpass filter on $T(x[n])$, which often has a lot of high frequencies (most non-linearities introduce significant high frequency energy; think of what $x[n] \cdot x[n]$ does in frequency)

• often we extend the definition of $Q_{\hat{n}}$ to include a pre-filtering term, so that $x[n]$ itself is filtered to a region of interest

[Figure: $\hat{x}[n]$ → linear filter → $x[n]$ → $T(\,\cdot\,)$ → $T(x[n])$ → lowpass filter $w[n]$ → $Q_{\hat{n}}$]

Short-Time Energy

• serves to differentiate voiced and unvoiced sounds in speech from silence (background signal)

• natural definition of energy of the weighted signal is:

$$E_{\hat{n}} = \sum_{m=-\infty}^{\infty} \big( x[m]\, w[\hat{n}-m] \big)^2$$

(sum of squares of a portion of the signal)

-- concentrates the measurement at sample $\hat{n}$, using weighting $w[\hat{n}-m]$

$$E_{\hat{n}} = \sum_{m=-\infty}^{\infty} x^2[m]\, w^2[\hat{n}-m] = \sum_{m=-\infty}^{\infty} x^2[m]\, h[\hat{n}-m], \qquad h[n] = w^2[n]$$

[Figure: $x[n]$ → $(\,\cdot\,)^2$ → $x^2[n]$ → $h[n]$ → $E_{\hat{n}}$ (short-time energy)]

Short-Time Energy Properties

• depends on the choice of h[n] or, equivalently, the window w[n]

– if w[n] has a very long duration and constant amplitude (w[n]=1, n=0,1,...,L-1), $E_{\hat{n}}$ would not change much over time, and would not reflect the short-time amplitudes of the sounds of the speech

– very long duration windows correspond to narrowband lowpass filters

– want $E_{\hat{n}}$ to change at a rate comparable to the changing sounds of the speech => this is the essential conflict in all speech processing: we need a short duration window to be responsive to rapid sound changes, but short windows will not provide sufficient averaging to give a smooth and reliable energy function

Windows

• consider two windows, w[n]

– rectangular window:
  • w[n]=1, 0≤n≤L-1 and 0 otherwise

– Hamming window (raised cosine window):
  • w[n]=0.54-0.46 cos(2πn/(L-1)), 0≤n≤L-1 and 0 otherwise

– rectangular window gives equal weight to all L samples in the window (n,...,n-L+1)

– Hamming window gives most weight to the middle samples and tapers off strongly at the beginning and the end of the window
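The two window definitions translate directly into code. The sketch below uses only the standard library; the function names are illustrative choices, not part of the slides.

```python
import math


def rectangular_window(L):
    """w[n] = 1 for 0 <= n <= L-1."""
    return [1.0] * L


def hamming_window(L):
    """w[n] = 0.54 - 0.46 cos(2 pi n / (L - 1)) for 0 <= n <= L-1."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (L - 1)) for n in range(L)]


w_rect = rectangular_window(21)
w_ham = hamming_window(21)
```

For L = 21 the Hamming window reaches 1.0 at the middle sample (n = 10) and tapers to 0.08 at both ends, which is the "most weight to the middle samples" behaviour described above.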

Rectangular and Hamming Windows

[Figure: time responses of L=21 point rectangular and Hamming windows; frequency responses of L=51 point rectangular and Hamming windows]

Window Frequency Responses

• rectangular window

$$H(e^{j\Omega T}) = \frac{\sin(\Omega L T / 2)}{\sin(\Omega T / 2)}\, e^{-j\Omega T (L-1)/2}$$

• first zero occurs at f = Fs/L = 1/(LT) (or Ω = 2π/(LT)) => nominal cutoff frequency of the equivalent “lowpass” filter

• Hamming window

$$w_H[n] = 0.54\, w_R[n] - 0.46 \cos\!\big(2\pi n/(L-1)\big)\, w_R[n]$$

• can decompose the Hamming window frequency response into a combination of three terms
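The location of the first zero of the rectangular window response can be checked numerically by evaluating the window's Fourier transform at f = Fs/L. This is an illustrative sketch: the helper `dtft_magnitude` and the values Fs = 10 kHz, L = 51 are assumptions chosen for the example.

```python
import cmath
import math


def dtft_magnitude(w, f, fs):
    """|sum_n w[n] exp(-j 2 pi f n / fs)| for a finite-length window w."""
    return abs(sum(w_n * cmath.exp(-2j * math.pi * f * n / fs)
                   for n, w_n in enumerate(w)))


fs = 10000.0
L = 51
rect = [1.0] * L
hamming = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (L - 1)) for n in range(L)]

rect_dc = dtft_magnitude(rect, 0.0, fs)             # equals L at zero frequency
rect_at_fs_over_L = dtft_magnitude(rect, fs / L, fs)        # first zero of the rectangular window
hamming_at_fs_over_L = dtft_magnitude(hamming, fs / L, fs)  # still inside the Hamming main lobe
```

The rectangular response vanishes at Fs/L, while the Hamming response is still well above zero there, consistent with its wider main lobe.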

Window Frequency Responses

[Figure: frequency responses of rectangular windows and of Hamming windows, each for L=21, 41, 61, 81, 101]

RW and HW Frequency Responses

• bandwidth of HW is approximately twice the bandwidth of RW

• attenuation of more than 40 dB for HW outside the passband, versus 14 dB for RW

• stopband attenuation is essentially independent of L, the window duration => increasing L simply decreases window bandwidth

• L needs to be larger than a pitch period (or severe fluctuations will occur in $E_{\hat{n}}$), but smaller than a sound duration (or $E_{\hat{n}}$ will not adequately reflect the changes in the speech signal)

There is no perfect value of L, since a pitch period can be as short as 20 samples (500 Hz at a 10 kHz sampling rate) for a high-pitched child or female speaker, and up to 250 samples (40 Hz pitch at a 10 kHz sampling rate) for a low-pitched male speaker; a compromise value of L on the order of 100-200 samples for a 10 kHz sampling rate is often used in practice

Short-Time Energy using RW/HW

[Figure: short-time energy $E_{\hat{n}}$ computed with rectangular and Hamming windows for L=51, 101, 201, 401]

• as L increases, the plots tend to converge (however, you are smoothing sound energies)

• short-time energy provides the basis for distinguishing voiced from unvoiced speech regions and, for medium-to-high SNR recordings, can even be used to find regions of silence/background signal

Short-Time Energy for AGC

• can use an IIR filter to define short-time energy, e.g., the time-dependent energy definition

$$\sigma^2[n] = \sum_{m=-\infty}^{\infty} x^2[m]\, h[n-m] \Big/ \sum_{m=0}^{\infty} h[m]$$

• consider an impulse response of the form

$$h[n] = \alpha^{n-1} u[n-1] = \begin{cases} \alpha^{n-1}, & n \ge 1 \\ 0, & n < 1 \end{cases}$$

for which $\sum_{m=0}^{\infty} h[m] = 1/(1-\alpha)$ when $|\alpha| < 1$, so that

$$\sigma^2[n] = \sum_{m=-\infty}^{\infty} (1-\alpha)\, x^2[m]\, \alpha^{n-m-1}\, u[n-m-1]$$

Recursive Short-Time Energy

• $u[n-m-1]$ implies the condition $n-m-1 \ge 0$, or $m \le n-1$, giving

$$\sigma^2[n] = \sum_{m=-\infty}^{n-1} (1-\alpha)\, x^2[m]\, \alpha^{n-m-1} = (1-\alpha)\big( x^2[n-1] + \alpha x^2[n-2] + \ldots \big)$$

• for the index $n-1$ we have

$$\sigma^2[n-1] = \sum_{m=-\infty}^{n-2} (1-\alpha)\, x^2[m]\, \alpha^{n-m-2} = (1-\alpha)\big( x^2[n-2] + \alpha x^2[n-3] + \ldots \big)$$

• thus giving the relationship

$$\sigma^2[n] = \alpha \cdot \sigma^2[n-1] + x^2[n-1](1-\alpha)$$

• and defines an Automatic Gain Control (AGC) of the form

$$G[n] = \frac{G_0}{\sigma[n]}$$
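The recursion and the resulting gain can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the recursion is started from sigma2[0] = `initial`, and the small `floor` that guards against division by zero is an addition not found in the slides.

```python
import math


def recursive_energy(x, alpha, initial=0.0):
    """sigma2[n] = alpha * sigma2[n-1] + (1 - alpha) * x[n-1]^2, sigma2[0] = initial."""
    sigma2 = [initial]
    for n in range(1, len(x)):
        sigma2.append(alpha * sigma2[n - 1] + (1.0 - alpha) * x[n - 1] ** 2)
    return sigma2


def agc_gain(sigma2, g0, floor=1e-12):
    """G[n] = G0 / sigma[n]; floor (an assumption) avoids dividing by zero."""
    return [g0 / math.sqrt(max(s, floor)) for s in sigma2]


x = [2.0] * 200                       # constant-amplitude test input
sigma2 = recursive_energy(x, alpha=0.9)
gains = agc_gain(sigma2, g0=1.0)
```

For a constant input of amplitude 2, the energy estimate settles at 4 and the gain at G0/2. A value of alpha closer to 1 gives a longer effective memory, so the estimate and the gain adapt more slowly.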


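[Figure: block diagram of the recursive energy computation. Labels recovered from the diagram: $x[n]$, $(\,\cdot\,)^2$, $x^2[n]$, delay $z^{-1}$, gain $(1-\alpha)$, feedback gain $\alpha$, output $\sigma^2[n]$]

$$\sigma^2[n] = \alpha \cdot \sigma^2[n-1] + x^2[n-1](1-\alpha)$$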

Use of Short-Time Energy for AGC

[Figures: AGC examples using the recursive short-time energy, with α = 0.9 and α = 0.99]

Short-Time Magnitude

• short-time energy is very sensitive to large signal levels due to the $x^2[n]$ terms

– consider a new definition of ‘pseudo-energy’ based on average signal magnitude (rather than energy)

$$M_{\hat{n}} = \sum_{m=-\infty}^{\infty} |x[m]|\, w[\hat{n}-m]$$

– weighted sum of magnitudes, rather than weighted sum of squares

[Figure: $x[n]$ → $|\,\cdot\,|$ → $|x[n]|$ → $w[n]$ → $M_{\hat{n}} = M_n \big|_{n=\hat{n}}$]

• computation avoids multiplications of the signal with itself (the squared term)

Short-Time Magnitudes

• differences between $E_{\hat{n}}$ and $M_{\hat{n}}$ are noticeable in unvoiced regions

[Figure: short-time magnitude $M_{\hat{n}}$ for L=51, 101, 201, 401 (two panels)]

• dynamic range of $M_{\hat{n}}$ ~ square root of the dynamic range of $E_{\hat{n}}$ => level differences between voiced and unvoiced segments are smaller

• $E_{\hat{n}}$ and $M_{\hat{n}}$ can be sampled at a rate of 100/sec for window durations of 20 msec or so => efficient representation of signal energy/magnitude
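The square-root relationship between the two dynamic ranges can be illustrated with two synthetic segments that differ by a factor of 10 in amplitude. The signals below are stand-ins chosen for the example, not speech data, and the function names are hypothetical.

```python
def short_time_magnitude(x, L):
    """M[n] = |x[n-L+1]| + ... + |x[n]| (rectangular window, zeros before index 0)."""
    return [sum(abs(v) for v in x[max(0, n - L + 1):n + 1]) for n in range(len(x))]


def short_time_energy_direct(x, L):
    """E[n] = x[n-L+1]^2 + ... + x[n]^2 (rectangular window, zeros before index 0)."""
    return [sum(v * v for v in x[max(0, n - L + 1):n + 1]) for n in range(len(x))]


loud = [10.0, -10.0] * 50      # stand-in for a high-level segment
quiet = [1.0, -1.0] * 50       # stand-in for a low-level segment
L = 20

energy_ratio = short_time_energy_direct(loud, L)[-1] / short_time_energy_direct(quiet, L)[-1]
magnitude_ratio = short_time_magnitude(loud, L)[-1] / short_time_magnitude(quiet, L)[-1]
```

A 10:1 amplitude difference appears as 100:1 in energy but only 10:1 in magnitude, which is why level differences between segments look smaller in $M_{\hat{n}}$.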

Short-Time Energy and Magnitude: Rectangular Window

[Figure: $E_{\hat{n}}$ and $M_{\hat{n}}$ for L=51, 101, 201, 401]

Short-Time Energy and Magnitude: Hamming Window

[Figure: $E_{\hat{n}}$ and $M_{\hat{n}}$ for L=51, 101, 201, 401]

Short-Time Average ZC Rate

[Figure: waveform with zero crossings marked]

zero crossing => successive samples have different algebraic signs

• zero crossing rate is a simple measure of the ‘frequency content’ of a signal; this is especially true for narrowband signals (e.g., sinusoids)

• a sinusoid at frequency F0 with sampling rate FS has FS/F0 samples per cycle, with two zero crossings per cycle, giving an average zero crossing rate of

z1 = (2 crossings/cycle) × (F0/FS cycles/sample)
z1 = 2 F0/FS crossings/sample (i.e., z1 is proportional to F0)
zM = M (2 F0/FS) crossings/(M samples)

Sinusoid Zero Crossing Rates

Assume the sampling rate is FS = 10,000 Hz

1. F0 = 100 Hz sinusoid has FS/F0 = 10,000/100 = 100 samples/cycle; or z1 = 2/100 crossings/sample, or z100 = 2/100 * 100 = 2 crossings/10 msec interval

2. F0 = 1000 Hz sinusoid has FS/F0 = 10,000/1000 = 10 samples/cycle; or z1 = 2/10 crossings/sample, or z100 = 2/10 * 100 = 20 crossings/10 msec interval

3. F0 = 5000 Hz sinusoid has FS/F0 = 10,000/5000 = 2 samples/cycle; or z1 = 2/2 crossings/sample, or z100 = 2/2 * 100 = 100 crossings/10 msec interval

Zero Crossings for Sinusoids

[Figure: 100 Hz sinewave (ZC = 9) and the same 100 Hz sinewave with a DC offset of 0.75 (ZC = 8), plotted over 200 samples]

Zero Crossings for Noise

[Figure: random Gaussian noise (ZC = 252) and the same noise with a DC offset of 0.75 (ZC = 122), plotted over 250 samples]

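ZC Rate Definitions

$$Z_{\hat{n}} = \frac{1}{2 L_{\text{eff}}} \sum_{m=-\infty}^{\infty} \big| \operatorname{sgn}(x[m]) - \operatorname{sgn}(x[m-1]) \big|\, w[\hat{n}-m]$$

$$\operatorname{sgn}(x[n]) = \begin{cases} 1, & x[n] \ge 0 \\ -1, & x[n] < 0 \end{cases}$$

• simple rectangular window:

$$w[n] = \begin{cases} 1, & 0 \le n \le L-1 \\ 0, & \text{otherwise} \end{cases} \qquad L_{\text{eff}} = L$$

• same form for $Z_{\hat{n}}$ as for $E_{\hat{n}}$ or $M_{\hat{n}}$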
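A direct implementation of this definition with a rectangular window is sketched below and checked on a 1000 Hz sinusoid sampled at 10 kHz. The function names are hypothetical, and the phase offset of 0.3 rad is an assumption added so that no sample lands exactly on zero, where floating-point rounding would make the sign ambiguous.

```python
import math


def sgn(v):
    """sgn(v) = 1 for v >= 0, -1 otherwise."""
    return 1 if v >= 0 else -1


def zero_crossing_rate(x, n_hat, L):
    """(1 / 2L) * sum over m = n_hat-L+1 .. n_hat of |sgn(x[m]) - sgn(x[m-1])|."""
    total = 0
    for m in range(n_hat - L + 1, n_hat + 1):
        if m >= 1:                     # x[m-1] must exist
            total += abs(sgn(x[m]) - sgn(x[m - 1]))
    return total / (2.0 * L)


fs, f0 = 10000.0, 1000.0
x = [math.sin(2.0 * math.pi * f0 * n / fs + 0.3) for n in range(500)]

z1 = zero_crossing_rate(x, n_hat=499, L=400)   # crossings per sample
z_100 = z1 * 100                               # crossings per 10 msec (100 samples)
```

Each sign change contributes 2 to the sum, which is why the normalization is 1/(2L) rather than 1/L.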

ZC Normalization

• The formal definition of $Z_{\hat{n}}$ is:

$$Z_{\hat{n}} = z_1 = \frac{1}{2L} \sum_{m=\hat{n}-L+1}^{\hat{n}} \big| \operatorname{sgn}(x[m]) - \operatorname{sgn}(x[m-1]) \big|$$

which is interpreted as the number of zero crossings per sample.

• For most practical applications, we need the rate of zero crossings per fixed interval of M samples, which is

$$z_M = z_1 \cdot M = \text{rate of zero crossings per } M \text{ sample interval}$$

• Thus, for an interval of τ sec., corresponding to M samples, we get

$$z_M = z_1 \cdot M; \qquad M = \tau F_S = \tau / T$$

ZC Normalization

• For a 1000 Hz sinewave as input, using a 40 msec window length (L), with various values of sampling rate (FS), we get the following:

FS (Hz)    L (samples)    z1 (crossings/sample)    M (samples)    zM (crossings per M samples)
8000       320            1/4                      80             20
10000      400            1/5                      100            20
16000      640            1/8                      160            20

• Thus we see that the normalized (per interval) zero crossing rate, zM, is independent of the sampling rate and can be used as a measure of the dominant energy in a band.
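The table entries follow from z1 = 2 F0/FS and M = τ FS. The short sketch below recomputes them; it assumes, as the M column implies, an interval of τ = 10 msec, and the function name is hypothetical.

```python
def zc_per_interval(f0, fs, tau):
    """Return (z1, M, zM) for a sinusoid at f0 Hz sampled at fs Hz over tau seconds."""
    z1 = 2.0 * f0 / fs        # crossings per sample
    M = tau * fs              # samples in the interval
    return z1, M, z1 * M      # zM = z1 * M, crossings per interval


rows = [(fs,) + zc_per_interval(1000.0, fs, 0.010) for fs in (8000.0, 10000.0, 16000.0)]
```

Because z1 scales as 1/FS and M scales as FS, their product zM = 2 F0 τ does not depend on the sampling rate.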

ZC Rate Distributions

[Figure: distributions of ZC rates for voiced and unvoiced speech; axis marked at 1 kHz, 2 kHz, 3 kHz, 4 kHz]

• for voiced speech, energy is mainly below 1.5 kHz
• for unvoiced speech, energy is mainly above 1.5 kHz
• mean ZC rate for unvoiced speech is 49 per 10 msec interval
• mean ZC rate for voiced speech is 14 per 10 msec interval

Unvoiced speech: the dominant energy component is at about 2.5 kHz

Voiced speech: the dominant energy component is at about 700 Hz

ZC Rates for Speech

• 15 msec windows
• 100/sec sampling rate on ZC computation

[Figure: ZC rates for speech]

Short-Time Energy, Magnitude, ZC

[Figure: short-time energy, magnitude, and ZC rate]

Issues in ZC Rate Computation

• for the zero crossing rate to be accurate, need zero DC in the signal => need to remove offsets, hum, noise => use a bandpass filter to eliminate DC and hum

• can quantize the signal to 1 bit for computation of the ZC rate

• can apply the concept of ZC rate to bandpass filtered speech to give a ‘crude’ spectral estimate in narrow bands of speech (kind of gives an estimate of the strongest frequency in each narrow band of speech)

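Summary of Simple Time Domain Measures

[Figure: $\hat{x}(n)$ → linear filter → $x(n)$ → $T[\,\cdot\,]$ → $T[x(n)]$ → lowpass filter $w(n)$ → $Q_{\hat{n}}$]

$$Q_{\hat{n}} = \sum_{m=-\infty}^{\infty} T(x[m])\, w[\hat{n}-m]$$

1. Energy:

$$E_{\hat{n}} = \sum_{m=\hat{n}-L+1}^{\hat{n}} x^2[m]\, w[\hat{n}-m]$$

• can downsample $E_{\hat{n}}$ at a rate commensurate with the window bandwidth

2. Magnitude:

$$M_{\hat{n}} = \sum_{m=\hat{n}-L+1}^{\hat{n}} |x[m]|\, w[\hat{n}-m]$$

3. Zero Crossing Rate:

$$Z_{\hat{n}} = z_1 = \frac{1}{2L} \sum_{m=\hat{n}-L+1}^{\hat{n}} \big| \operatorname{sgn}(x[m]) - \operatorname{sgn}(x[m-1]) \big|$$

where $\operatorname{sgn}(x[m]) = 1$ for $x[m] \ge 0$ and $-1$ for $x[m] < 0$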

Short-Time Autocorrelation

- for a deterministic signal, the autocorrelation function is defined as:

$$\Phi[k] = \sum_{m=-\infty}^{\infty} x[m]\, x[m+k]$$

- for a random or periodic signal, the autocorrelation function is:

$$\Phi[k] = \lim_{L \to \infty} \frac{1}{2L+1} \sum_{m=-L}^{L} x[m]\, x[m+k]$$

- if $x[n] = x[n+P]$, then $\Phi[k] = \Phi[k+P]$ => the autocorrelation function preserves periodicity

- properties of $\Phi[k]$:

1. $\Phi[k]$ is even, $\Phi[k] = \Phi[-k]$
2. $\Phi[k]$ is maximum at $k = 0$, $|\Phi[k]| \le \Phi[0],\ \forall k$
3. $\Phi[0]$ is the signal energy, or power (for random signals)

Periodic Signals

• for a periodic signal we have (at least in theory) Φ[P]=Φ[0], so the period of a periodic signal can be estimated as the location of the first maximum of Φ[k] at a non-zero lag

– this means that the autocorrelation function is a good candidate for speech pitch detection algorithms

– it also means that we need a good way of measuring the short-time autocorrelation function for speech signals

Short-Time Autocorrelation

- a reasonable definition for the short-time autocorrelation is:

$$R_{\hat{n}}[k] = \sum_{m=-\infty}^{\infty} x[m]\, w[\hat{n}-m]\, x[m+k]\, w[\hat{n}-k-m]$$

1. select a segment of speech by windowing
2. compute the deterministic autocorrelation of the windowed speech

- symmetry:

$$R_{\hat{n}}[k] = R_{\hat{n}}[-k] = \sum_{m=-\infty}^{\infty} x[m]\, x[m-k] \big[ w[\hat{n}-m]\, w[\hat{n}+k-m] \big]$$

- define a filter of the form

$$h_k[\hat{n}] = w[\hat{n}]\, w[\hat{n}+k]$$

- this enables us to write the short-time autocorrelation in the form:

$$R_{\hat{n}}[k] = \sum_{m=-\infty}^{\infty} x[m]\, x[m-k]\, h_k[\hat{n}-m]$$

- the value of $R_{\hat{n}}[k]$ at time $\hat{n}$ for the $k$-th lag is obtained by filtering the sequence $x[\hat{n}]\, x[\hat{n}-k]$ with a filter with impulse response $h_k[\hat{n}]$

Short-Time Autocorrelation

$$R_n[k] = \sum_{m=-\infty}^{\infty} \big[ x[m]\, w[n-m] \big] \big[ x[m+k]\, w[n-k-m] \big]$$

$$R_n[k] = \sum_{m=0}^{L-1-k} \big[ x[n+m]\, w'[m] \big] \big[ x[n+m+k]\, w'[k+m] \big]$$

[Figure: window positions, with edges marked at n-L+1 and n+k-L+1]

=> $L - k$ points are used to compute $R_n[k]$
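The finite-sum form of the short-time autocorrelation can be sketched as below, together with a very simple peak picker for the pitch period. Everything here is illustrative: the function names, the `min_lag` search bound, and the sinusoidal test signal with a period of 50 samples (standing in for voiced speech) are assumptions, not the pitch detector used to produce the lecture's figures.

```python
import math


def short_time_autocorrelation(x, n, L, max_lag, window=None):
    """R_n[k] = sum over m = 0 .. L-1-k of (x[n+m] w[m]) (x[n+m+k] w[m+k])."""
    w = window if window is not None else [1.0] * L    # rectangular by default
    seg = [x[n + m] * w[m] for m in range(L)]          # windowed segment
    return [sum(seg[m] * seg[m + k] for m in range(L - k))
            for k in range(max_lag + 1)]


def pitch_period_from_autocorrelation(r, min_lag):
    """Lag of the largest autocorrelation value among lags >= min_lag."""
    return max(range(min_lag, len(r)), key=lambda k: r[k])


period = 50                                            # samples per cycle of the test signal
x = [math.sin(2.0 * math.pi * n / period) for n in range(500)]

r = short_time_autocorrelation(x, n=0, L=401, max_lag=150)
estimated_period = pitch_period_from_autocorrelation(r, min_lag=20)
```

Only L - k products contribute at lag k, so the peak at twice the period is smaller than the peak at the period itself; this taper is what the simple peak picker relies on here.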

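Short-Time Autocorrelation

[Figure: short-time autocorrelation example]

• autocorrelation peaks occur at k=72, 144, ... => 140 Hz pitch (a lag of 72 samples would correspond to 10,000/72 ≈ 139 Hz if the sampling rate is 10 kHz, as in the earlier examples)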

• Φ(P) Φ(P)

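Voiced (female) L=401 (magnitude)

[Figure: time axis t = nT, frequency axis F/Fs; pitch period T0 = N0 T and fundamental frequency F0 = 1/T0 marked]

Voiced (female) L=401 (log mag)

[Figure: time axis t = nT, frequency axis F/Fs; pitch period T0 and F0 = 1/T0 marked]

Voiced (male) L=401

[Figure: pitch period T0 and harmonic 3F0 = 3/T0 marked]

Unvoiced L=401

[Figures: two unvoiced examples]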

Effects of Window Size

[Figure: short-time autocorrelation for L=401, L=251, L=125]

• choice of L, window duration

– small L, so that the pitch period is almost constant in the window

– large L, so that clear periodicity is seen in the window

• as k increases, the number of window points decreases, reducing the accuracy and size of Rn(k) for large k => have a taper

of the type R(k)=1-k/L, |k|

Modified Autocorrelation

• want to solve the problem of a differing number of samples for each different k term in $R_n[k]$, so modify the definition as follows:

$$\hat{R}_n[k] = \sum_{m=-\infty}^{\infty} x[m]\, w_1[n-m]\, x[m+k]\, w_2[n-m-k]$$

- where $w_1$ is a standard L-point window, and $w_2$ is an extended window of duration $L + K$ samples, where $K$ is the largest lag of interest

- we can rewrite the modified autocorrelation as:

$$\hat{R}_n[k] = \sum_{m=-\infty}^{\infty} x[n+m]\, \hat{w}_1[m]\, x[n+m+k]\, \hat{w}_2[m+k]$$

- where $\hat{w}_1[m] = w_1[-m]$ and $\hat{w}_2[m] = w_2[-m]$

- for rectangular windows we choose the following:

$$\hat{w}_1[m] = 1,\ 0 \le m \le L-1; \qquad \hat{w}_2[m] = 1,\ 0 \le m \le L-1+K$$

- giving

$$\hat{R}_n[k] = \sum_{m=0}^{L-1} x[n+m]\, x[n+m+k], \qquad 0 \le k \le K$$

- always use L samples in the computation of $\hat{R}_n[k]\ \forall k$
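With rectangular windows the modified autocorrelation reduces to a fixed-length sum, sketched below. The function name and the sinusoidal test signal (period 50 samples, L = 400 so that the window spans whole cycles) are assumptions made for illustration.

```python
import math


def modified_autocorrelation(x, n, L, K):
    """R_hat_n[k] = sum over m = 0 .. L-1 of x[n+m] x[n+m+k], for 0 <= k <= K."""
    if n + L + K > len(x):
        raise ValueError("need L + K samples starting at index n")
    return [sum(x[n + m] * x[n + m + k] for m in range(L))
            for k in range(K + 1)]


period = 50
x = [math.sin(2.0 * math.pi * i / period) for i in range(600)]

r_hat = modified_autocorrelation(x, n=0, L=400, K=150)
```

Because every lag uses the same L products, the values at k = 50 and k = 100 match the value at k = 0 for this periodic input; the price is that L + K samples are needed rather than L.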

Examples of Modified Autocorrelation

[Figure: the two window spans, ending at L-1 and L+K-1]

- $\hat{R}_n[k]$ is a cross-correlation, not an auto-correlation
- $\hat{R}_n[k] \ne \hat{R}_n[-k]$
- $\hat{R}_n[k]$ will have a strong peak at $k = P$ for periodic signals and will not fall off for large $k$

[Figure: examples of modified autocorrelation]

Examples of Modified AC

[Figure: modified autocorrelations for a fixed value of L=401]

[Figure: modified autocorrelations for values of L=401, 251, 125]

Summary

• short-time analysis of speech signals

– short-time energy (short-time log energy)
– short-time average magnitude
– short-time zero crossing rate
– short-time autocorrelation
– modified short-time autocorrelation
