
Journal of New Music Research 0929-8215/01/3001-013$16.00
2001, Vol. 30, No. 1, pp. 13–22 © Swets & Zeitlinger

Flexible Wavelets for Music Signal Processing*

Gianpaolo Evangelista

Swiss Federal Institute of Technology, Audio-Visual Communication Laboratory, Lausanne, Switzerland

Abstract

Musical signals require sophisticated time-frequency techniques for their representation. In the ideal case, each element of the representation is able to capture a distinct feature of the signal and can be assigned either a perceptual or an objective meaning. Wavelet transforms constitute a remarkable advance in this field and have several advantages over Gabor expansions or short-time Fourier methods. However, application of conventional wavelet bases to musical signals produces disappointing results for at least two reasons: (1) the frequency resolution of dyadic wavelets is one octave, too poor for any meaningful acoustic decomposition, and (2) pseudoperiodicity or pitch information of voiced sounds is not exploited. Fortunately, the definition of the wavelet transform can be extended in several directions, allowing for the design of bases with arbitrary frequency resolution and for adaptation to time-varying pitch characteristics in signals with harmonic or even inharmonic structure of the frequency spectrum. In this paper we discuss refined wavelet methods that are applicable to musical signal analysis and synthesis. Flexible wavelet transforms are obtained by means of frequency warping techniques.

* “Sound examples are available in the JNMR Electronic Appendix (EA), please see ”

Introduction

The analysis and processing of most features of sound require mixed time and frequency characterization. Gabor expansions and the Short-Time Fourier Transform (STFT) were among the first techniques to be applied to audio signals in order to display their time-varying frequency spectrum characteristics. The derived spectrogram is still in use in many applications. While the human hearing sense is particularly gifted at classifying acoustic patterns according to both their time of occurrence and the brightness of their transitory frequency spectrum, our mathematical models, having to cope with the uncertainty principle, cannot achieve indefinitely accurate resolution in both time and frequency domains. Our perception of sound is likely based on a non-linear frequency scale, as reflected by cochlear functional models, psychoacoustic theories and conventional tone classification, e.g., the tempered scale in western music. This observed behavior contrasts with the frequency resolution of the STFT, which is uniform on any portion of the frequency axis. In audio signal processing the choice of the representation is crucial for deriving elegant and efficient algorithms for sound manipulation, special effects, coding, feature detection and synthesis. An important goal is to achieve a close match between our perception and the mathematical realization and efficiency of the representation.

The introduction of wavelets and multiresolution approximation geared the analysis tools towards variable time-frequency resolution with the construction of integral transforms and expansion bases adjusted on non-uniform time-frequency grids (Daubechies, 1992; Mallat, 1998). Higher frequencies are represented at higher time resolution and lower frequencies at lower time resolution, while keeping the uncertainty product constant. On the one hand, the integral wavelet transform is redundant but does not introduce limitations on either resolution other than those dictated by the uncertainty principle. On the other hand, the dyadic wavelet bases are easy to generate and fast algorithms for computing the expansion coefficients are available, but they constrain the frequency resolution to one octave (Evangelista, 1991; 1997). Wavelet bases based on finer rational frequency resolution factors lead to difficult design procedures in order to satisfy regularity constraints (Blu, 1998). Furthermore, the resolution factor is in practice limited to ratios of small integers. In contrast, frequency warping techniques allow for arbitrary choice of analysis bands. This is an important point if we want to adapt the signal representation either to perceptual scales, e.g., the Bark scale, or to objective bands associated with a single feature. However, computation of the transform based on general warping maps is not efficient unless one is able to derive an associated discrete-time signal processing structure (Evangelista & Cavaliere, 1998d).

Accepted: 30 May, 2001

Correspondence: G. Evangelista, Swiss Federal Institute of Technology, Audio-Visual Communication Laboratory, IN-LCAV Ecublens, CH-1015 Lausanne, Switzerland. E-mail: gianpaolo.evangelista@epfl.ch

In this paper, we present an approach to flexible wavelet transforms where the frequency resolution of the basis elements can be designed at will. This is achieved by discrete-time iterated frequency warping. We consider the important and unique case where each elementary frequency warping stage is computationally realizable with arbitrary precision. The main ingredient is the Laguerre transform, whose computation is achieved by means of a chain of digital allpass filters, i.e., a dispersive delay line. Iterated Laguerre warping is applied to wavelets, leading to wavelets whose bandwidths can be assigned by selecting a set of parameters (Evangelista & Cavaliere, 1998c). Interestingly, these wavelets are still based on a dyadic scheme, which is a natural setting for iterated band splitting.

Another issue in musical signal processing is that the frequency spectrum of voiced signals peaks at uniformly or non-uniformly spaced frequencies. While exact periodicity is rarely encountered in real-life signals, exploitation of the pitch information or of pseudoperiodicity is an asset for their representation. The Fourier transform is particularly suited to revealing periodicity; however, the STFT represents pseudoperiodic signals as a superposition of partials whose dynamics are captured with uniform time resolution. It follows that transients are blurred if the selected time resolution is too poor, while frequency resolution is at risk when the analysis window is too short.

In previous papers (Evangelista, 1993; 1994) the author suggested methods for exploiting the pitch information in wavelet representations by introducing the Pitch-Synchronous Wavelet Transform (PSWT). The signal is first converted into a sequence of variable-length vectors, each containing the samples of one period of the signal; then the sequences of components are analyzed by means of an array of wavelet transforms. This representation is able to capture period-to-period fluctuations of the signal by means of basis elements that are comb-like in the frequency domain. Periodic behavior is trapped in a narrow comb adjusted on the harmonic grid, and transients are represented at several scales by multiresolution basis elements whose Fourier transforms are organized in ensembles of sidebands of the harmonics. Faster transitions are represented, at fine time resolution, by larger bands lying far from the harmonics. Conversely, slow transitions and quasi-stationary dynamics are represented, at coarser time resolution, by narrower sidebands lying closer to the harmonics. In particular, this technique allows for separation of perceptually and physically distinct phenomena such as the bow noise and the harmonic resonant component in a violin tone. Importantly, this separation is achieved by means of a complete and orthogonal set, and the sum of the components exactly reconstructs the original signal.

The PSWT technique is amenable to further generalization, essentially by introducing alternate methods for forming the vector signal, e.g., based on intermediate Gabor-like or STFT representations, which can be formalized in the multiwavelets framework.

While the PSWT works efficiently when the frequencies of the partials are arranged on a harmonic grid, the frequency spectrum of many sounds, such as drums and piano tones in the low register, has an inherently inharmonic structure. From a physical model point of view, this phenomenon may be explained by the fact that, when the stiffness of the material is not negligible, dispersive wave propagation occurs within the medium. Furthermore, coupled 2D modes in drums have a peculiar distribution of eigenfrequencies. A partial solution to this problem is provided by frequency warping techniques that allow for redistribution of eigenfrequencies into harmonics. Inharmonic signals are first regularized by means of a frequency map and then analyzed by means of the PSWT, leading to the Frequency Warped PSWT (Evangelista & Cavaliere, 1998c).

The paper is organized as follows. In section 2 the properties of one-step frequency warping maps of continuous-time signals and wavelets are examined. Methods for frequency warping discrete-time signals and the Laguerre transform are illustrated in section 3. These methods are applied to the definition of both discrete-time and continuous-time warped wavelets in section 4.

Frequency warping and wavelets

Frequency warping

Frequency warping is obtained by mapping the frequency axis $\omega$ by means of a suitable real function $\Omega(\omega)$. Given a continuous-time signal $x(t)$ with Fourier Transform (FT)

$$X(\omega) = \int_{-\infty}^{+\infty} x(t)\, e^{-j\omega t}\, dt$$

we form the warped signal

$$\tilde{x}(t) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} X(\Omega(\omega))\, e^{j\omega t}\, d\omega$$

i.e., we let

$$\tilde{X}(\omega) = X(\Omega(\omega)) = \int_{-\infty}^{+\infty} x(t)\, e^{-j\Omega(\omega) t}\, dt \tag{1}$$

Thus, at any frequency $\bar{\omega}$ the FT of the warped signal has the same magnitude as the FT of the signal at frequency $\Omega(\bar{\omega})$, i.e., the frequency spectrum of the warped signal is a deformed version of that of the original signal. Clearly, for the warping operation to be well defined, the map $\Omega(\omega)$ must satisfy a number of requirements. In this paper we are interested in reversible warping operations defined by a strictly increasing, invertible map $\Omega$ of the real axis onto itself. The map is also assumed to have odd parity: $\Omega(-\omega) = -\Omega(\omega)$. In this way frequency ordering is preserved and warping a real signal yields a real signal. Notice that, as defined in Equation 1, frequency warping is not energy preserving since narrow frequency bands may be mapped into broader bands with equal peak amplitude. In order to preserve energy in any frequency interval one needs to multiply the right hand side of Equation 1 by a suitable amplitude scaling function or, more generally, define a suitable measure of the frequency axis. In the sequel we will assume that the map is almost everywhere differentiable, e.g., that $\Omega(\omega)$ is absolutely continuous, and define a scaled warping operation on the signal $x(t)$ as the one producing the FT

$$\hat{X}(\omega) = \sqrt{\Omega'(\omega)}\; X(\Omega(\omega)) \tag{2}$$

where the derivative $\Omega'(\omega)$ is positive almost everywhere since the map is assumed to be strictly increasing. In that case one obtains energy preservation in any band. In fact, a frequency band with support $[\omega_0, \omega_1]$ is mapped into a frequency band with support $[\Omega^{-1}(\omega_0), \Omega^{-1}(\omega_1)]$ and

$$\frac{1}{2\pi} \int_{\Omega^{-1}(\omega_0)}^{\Omega^{-1}(\omega_1)} \left| \hat{X}(\omega) \right|^2 d\omega = \frac{1}{2\pi} \int_{\omega_0}^{\omega_1} \left| X(\omega) \right|^2 d\omega$$

From a perceptual point of view this property is important since the scaled warped signal is properly equalized, while in the unscaled version some parts of the frequency spectrum are boosted. From a mathematical point of view scaled warping is associated with an orthogonal operator $\mathbf{W}$:

$$\hat{x}(t) = [\mathbf{W}x](t) = \int_{-\infty}^{+\infty} W(t,\tau)\, x(\tau)\, d\tau$$

with real kernel

$$W(t,\tau) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \sqrt{\Omega'(\omega)}\; e^{j(\omega t - \Omega(\omega)\tau)}\, d\omega$$

and inverse operator $\mathbf{W}^{-1}$ defined by the kernel

$$W^{-1}(t,\tau) = W(\tau,t)$$

Finally, we remark that, by exploiting duality of the FT, one can define time warping operations in a similar fashion.
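The energy-preservation property of scaled warping can be checked numerically. The following is a minimal sketch under stated assumptions: the quadratic power-law map, the Gaussian test spectrum and all function names are illustrative choices, not taken from the paper. It compares the band energy of the original spectrum with that of the scaled and unscaled warped spectra over the corresponding warped band.

```python
import math

GAMMA = 2.0  # assumed exponent of the illustrative power-law map


def warp(w):
    """Odd, strictly increasing map Omega(w) with Omega(pi) = pi (illustrative)."""
    return math.copysign(math.pi * (abs(w) / math.pi) ** GAMMA, w)


def warp_deriv(w):
    """Derivative Omega'(w) of the map above."""
    return GAMMA * (abs(w) / math.pi) ** (GAMMA - 1.0)


def warp_inv(v):
    """Inverse map Omega^{-1}(v)."""
    return math.copysign(math.pi * (abs(v) / math.pi) ** (1.0 / GAMMA), v)


def spectrum(w):
    """Hypothetical test spectrum X(w): a Gaussian bump centred at w = 1."""
    return math.exp(-0.5 * (w - 1.0) ** 2)


def band_energy(f, lo, hi, steps=4000):
    """Midpoint-rule estimate of (1 / 2 pi) * integral of |f(w)|^2 over [lo, hi]."""
    h = (hi - lo) / steps
    total = sum(abs(f(lo + (i + 0.5) * h)) ** 2 for i in range(steps))
    return total * h / (2.0 * math.pi)


def unscaled_warped(w):
    """Spectrum of the warped signal without amplitude scaling: X(Omega(w))."""
    return spectrum(warp(w))


def scaled_warped(w):
    """Spectrum of the scaled warped signal: sqrt(Omega'(w)) * X(Omega(w))."""
    return math.sqrt(warp_deriv(w)) * spectrum(warp(w))


w0, w1 = 0.5, 2.0  # band edges on the original frequency axis
e_original = band_energy(spectrum, w0, w1)
e_scaled = band_energy(scaled_warped, warp_inv(w0), warp_inv(w1))
e_unscaled = band_energy(unscaled_warped, warp_inv(w0), warp_inv(w1))
print(e_original, e_scaled, e_unscaled)
```

With these choices the scaled version reproduces the original band energy up to quadrature error, while the unscaled version does not, which is the equalization effect described in the text.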

Simple warped wavelets

In (Baraniuk & Jones, 1993; 1995) unitary warping operators were exploited to define warped wavelets. The construction may be summarized as follows: the signal is unwarped by means of the inverse warping operator $\mathbf{W}^{-1}$, then the expansion coefficients on a dyadic wavelet basis are computed. Reconstruction is achieved by applying the warping operator $\mathbf{W}$ to the wavelet expansion of the unwarped signal. In other words, given an orthogonal and complete dyadic wavelet set $\{\psi_{n,m}(t)\}_{n,m \in \mathbb{Z}}$, where

$$\psi_{n,m}(t) = 2^{-n/2}\, \psi_{0,0}(2^{-n} t - m) \tag{3}$$

one computes the expansion coefficients

$$\hat{x}_{n,m} = \langle \mathbf{W}^{-1}x, \psi_{n,m} \rangle$$

and reconstructs the signal as follows

$$x(t) = \sum_{n,m} \hat{x}_{n,m}\, [\mathbf{W}\psi_{n,m}](t) \tag{4}$$

Since the frequency warping operator is orthogonal we have

$$\hat{x}_{n,m} = \langle \mathbf{W}^{-1}x, \psi_{n,m} \rangle = \langle x, \mathbf{W}\psi_{n,m} \rangle$$

and Equation 4 may be considered as the expansion on the warped wavelet basis $\{[\mathbf{W}\psi_{n,m}](t)\}_{n,m \in \mathbb{Z}}$. Indeed this basis is orthogonal since

$$\langle \mathbf{W}\psi_{n',m'}, \mathbf{W}\psi_{n,m} \rangle = \langle \psi_{n',m'}, \mathbf{W}^{-1}\mathbf{W}\psi_{n,m} \rangle = \delta_{n',n}\, \delta_{m',m}$$

and complete in view of unitary equivalence.

The FT of the warped wavelets

$$\hat{\psi}_{n,m}(t) \equiv [\mathbf{W}\psi_{n,m}](t)$$

is related to the FT of the dyadic wavelets as follows

$$\hat{\Psi}_{n,m}(\omega) = \sqrt{\Omega'(\omega)}\; \Psi_{n,m}(\Omega(\omega)) = \sqrt{2^{n}\, \Omega'(\omega)}\; \Psi_{0,0}\!\left(2^{n}\Omega(\omega)\right) e^{-j 2^{n} m \Omega(\omega)} \tag{5}$$

from which we see that warped wavelets are not simply generated by dilating and translating a mother wavelet as in dyadic wavelets (Eq. 3). Rather, the “translated wavelets” are generated by generally non-rational allpass filtering $e^{-j 2^{n} m \Omega(\omega)}$. Scaling depends on the warping map $\Omega(\omega)$ as well. However, frequency warped wavelets have the remarkable property that their frequency resolution, i.e., their essential frequency support, may be arbitrarily assigned by proper choice of the map $\Omega(\omega)$. Indeed, if the cut-off frequencies of dyadic wavelets are fixed at $2^{-n}\pi$, the cut-off frequencies of the warped wavelets are

$$\omega_n = \Omega^{-1}(2^{-n}\pi)$$

For our purposes we just need to add the remark that genuine scale-$a$ wavelets, $0 < a < 1$, may be generated by selecting as warping map a function $\Omega(\omega)$ satisfying the following property ($a$-homogeneity):

$$\Omega(a\omega) = \frac{1}{2}\, \Omega(\omega) \tag{6}$$


such as

$$\Omega(\omega) = \pi\, \frac{\omega}{|\omega|} \left| \frac{\omega}{\pi} \right|^{-\log_a 2} \tag{7}$$

reported in Figure 1. Repeated application of Equation 6 shows that

$$\Omega(a^{-n}\omega) = 2^{n}\, \Omega(\omega) \tag{8}$$

By differentiating both sides of Equation 8 with respect to $\omega$ one obtains

$$\Omega'(a^{-n}\omega) = (2a)^{n}\, \Omega'(\omega) \tag{9}$$

By substituting Equations 8 and 9 in Equation 5 we obtain

$$\hat{\Psi}_{n,m}(\omega) = a^{-n/2}\, \hat{\Psi}_{0,m}(a^{-n}\omega)$$

which shows that warped wavelets generated by an $a$-homogeneous warping map are obtained by dilating the family of mother wavelets $\hat{\Psi}_{0,m}(\omega)$, $m \in \mathbb{Z}$, while wavelets at fixed scale level $n$ are allpass related:

$$\hat{\Psi}_{n,m}(\omega) = \hat{\Psi}_{n,0}(\omega)\, e^{-jm\Omega(a^{-n}\omega)}$$

Without loss of generality we can assume that $\Omega(\pi) = \pi$, so that the cut-off frequencies of the scale-$a$ warped wavelets are fixed at

$$\omega_n = a^{n}\pi$$

If $a = 1/2$ then the map (Eq. 7) reverts to the identical map $\Omega(\omega) = \omega$ and the warped wavelets coincide with the dyadic wavelets. If $a > 1/2$ then the warped wavelets achieve a finer frequency resolution than the dyadic wavelets.

Fig. 1. Example of warping map for power-of-$a$ wavelet cut-off choice.
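The relation between the $a$-homogeneous power map and the cut-off frequencies can be illustrated with a few lines of code. This is a small sketch assuming the power-law form of the map with exponent $-\log_a 2$; the function names and the choice $a = 2^{-1/3}$ (three bands per octave) are illustrative.

```python
import math


def power_warp(w, a):
    """a-homogeneous power map: pi * sign(w) * |w / pi| ** (-log_a 2)."""
    gamma = -math.log(2.0) / math.log(a)  # exponent -log_a(2), positive for 0 < a < 1
    return math.copysign(math.pi * (abs(w) / math.pi) ** gamma, w)


def warped_cutoffs(a, levels):
    """Cut-off frequencies w_n = a**n * pi for n = 1..levels."""
    return [a ** n * math.pi for n in range(1, levels + 1)]


a = 2.0 ** (-1.0 / 3.0)  # illustrative choice: three bands per octave
cutoffs = warped_cutoffs(a, 6)
# Mapping each warped cut-off through the power map should give the dyadic
# cut-offs 2**(-n) * pi.
images = [power_warp(w, a) for w in cutoffs]
print(cutoffs)
print(images)
```

For $a = 1/2$ the exponent equals 1 and the map reduces to the identity, which recovers the dyadic case.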

Discrete-time frequency warping and Laguerre transform

The continuous-time warped wavelet expansion illustrated in the previous section is applicable to continuous-time signals and requires warping of the signal, a task that is computationally difficult to perform. An important property of dyadic wavelet expansions is that they have a fast algorithm to iteratively compute the expansion coefficients even in the continuous-time case. In this section we consider frequency warping of discrete-time signals and relate this to an orthogonal transform that can be computed by means of a chain of rational allpass filters.

Since the Discrete-Time Fourier Transform (DTFT) of a sequence $x(n)$ is periodic, reversible frequency warping of discrete-time signals requires an invertible frequency map of the interval $[-\pi, \pi]$ onto itself. As we did in the continuous-time case, we will assume that $\theta(\omega)$ is almost everywhere differentiable, has odd parity and maps the point $\pi$ into itself. As discussed in the previous section, we are interested in procedures for unwarping the signal, producing the signal $\hat{x}(n)$ whose DTFT is

$$\hat{X}(\omega) = \sqrt{\frac{d\theta^{-1}}{d\omega}}\; X(\theta^{-1}(\omega)) \tag{10}$$

It follows that the discrete-time unwarping operator $\mathbf{W}^{-1}$ is orthogonal and has kernel

$$W^{-1}(n,k) = \frac{1}{2\pi} \int_{-\pi}^{+\pi} \sqrt{\frac{d\theta^{-1}}{d\omega}}\; e^{j(n\omega - k\theta^{-1}(\omega))}\, d\omega = \frac{1}{2\pi} \int_{-\pi}^{+\pi} \sqrt{\theta'(\omega)}\; e^{j(k\omega - n\theta(\omega))}\, d\omega$$

The unwarped signal is

$$\hat{x}(n) = [\mathbf{W}^{-1}x](n) = \sum_{k} W^{-1}(n,k)\, x(k) = \sum_{k} \lambda_n(k)\, x(k) \tag{11}$$

where we defined

$$\lambda_n(k) \equiv W^{-1}(n,k) \tag{12}$$

Notice that Equation 11 is in the form of a scalar product of the signal with the sequence $\lambda_n(k)$. By the orthogonality of the operator

$$W^{-1}(n,k) = W(k,n)$$

and the set $\{\lambda_n(k)\}_{n \in \mathbb{Z}}$ is orthogonal and complete. In fact:

$$\sum_{k} \lambda_n(k)\, \lambda_{n'}(k) = \sum_{k} W^{-1}(n,k)\, W(k,n') = \delta_{n,n'}$$

and

$$\sum_{n} \lambda_n(k)\, \lambda_n(k') = \sum_{n} W^{-1}(n,k)\, W(k',n) = \delta_{k,k'}$$

Thus, the discrete-time unwarped signal is given by the sequence of expansion coefficients

$$\hat{x}(n) = \langle x, \lambda_n \rangle \tag{13}$$

of the signal over the orthogonal basis $\{\lambda_n(k)\}_{n \in \mathbb{Z}}$, and the signal $x(n)$ is recovered from $\hat{x}(n)$ by the expansion series

$$x(n) = \sum_{k} \hat{x}(k)\, \lambda_k(n)$$

Fig. 2. Structure for computing discrete-time signal unwarping by means of the Laguerre transform.

In turn, if the signal is causal, i.e., if $x(k) = 0$ for $k < 0$, then the scalar product (Eq. 13) may be computed by convolving the time-reversed signal $y(-k) = x(k)$ with $\lambda_n(k)$ and sampling the result at time zero:

$$\langle x, \lambda_n \rangle = \sum_{k=0}^{\infty} x(k)\, \lambda_n(k) = \sum_{k=0}^{\infty} y(0 - k)\, \lambda_n(k)$$

Furthermore, from Equations 12 and 2 it is easy to recognize that the DTFT of $\lambda_n(k)$ satisfies the following recurrence:

$$\Lambda_n(\omega) = \Lambda_{n-1}(\omega)\, e^{-j\theta(\omega)}$$

with

$$\Lambda_0(\omega) = \sqrt{\theta'(\omega)} \tag{14}$$

Hence

$$\Lambda_n(\omega) = \Lambda_0(\omega)\, e^{-jn\theta(\omega)}$$

where the term $e^{-jn\theta(\omega)}$ can be assimilated to the response of an allpass filter. However, for digital implementations one needs to constrain $-\theta(\omega)$ to be the phase of a causal and stable, rational allpass filter. For reversible warping we need to constrain $\theta(\omega)$ to a one-to-one mapping of the normalized frequency interval $[-\pi, +\pi]$. Furthermore, we want to map a real signal into a real warped signal, thus $\theta(\omega)$ must have odd parity. It is easy to show that these conditions are verified if and only if $-\theta(\omega)$ is the phase of a first-order real allpass filter with transfer function

$$A(z) = \frac{z^{-1} - b}{1 - b z^{-1}}, \quad -1 < b < 1$$
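The conditions on the phase of the first-order allpass filter can be probed numerically. The sketch below is only an illustration: the value of $b$, the grid and the function name are assumptions. It evaluates the frequency response on the open interval $(0, \pi)$ and takes the negated phase as the candidate warping map.

```python
import cmath
import math


def allpass_response(w, b):
    """Frequency response A(e^{jw}) of A(z) = (z^-1 - b) / (1 - b z^-1)."""
    z_inv = cmath.exp(-1j * w)
    return (z_inv - b) / (1.0 - b * z_inv)


b = 0.6  # illustrative value of the allpass parameter, |b| < 1
grid = [math.pi * (i + 0.5) / 200 for i in range(200)]  # points inside (0, pi)
magnitudes = [abs(allpass_response(w, b)) for w in grid]
# Negated phase: the candidate warping map theta(w) sampled on the grid.
theta = [-cmath.phase(allpass_response(w, b)) for w in grid]
print(min(magnitudes), max(magnitudes))
print(theta[0], theta[-1])
```

On this grid the magnitude stays at one and the negated phase increases monotonically from near 0 to near $\pi$, in line with the one-to-one and odd-parity requirements.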

The corresponding basis set $\lambda_n(k)$ is recognized to be the discrete Laguerre basis (Broome, 1965; Evangelista & Cavaliere, 1998d; Oppenheim et al., 1971), orthogonal and complete over the set of finite energy causal signals, with

$$\theta(\omega) = \omega + 2\arctan\frac{b\sin\omega}{1 - b\cos\omega}, \quad -1 < b < 1 \tag{15}$$

This one-parameter family of warping curves (Eq. 15) is shown in Figure 3. In this family, for any allowed value of the parameter $b$, the curves corresponding to $b$ and to $-b$ are inverse of each other. The curve for $b = 0$ is the identical map $\theta(\omega) = \omega$. Furthermore,

$$\theta'(\omega) = \frac{1 - b^2}{1 - 2b\cos\omega + b^2}$$

Fig. 3. Family of warping curves associated with the Laguerre transform (labeled curves: $b = 0.9$ and $b = -0.9$).

In computing the Laguerre transform of a causal signal, the function in Equation 14 can be replaced by a causal realization of $\sqrt{\theta'(\omega)}$, namely

$$\Lambda_0(\omega) = \frac{\sqrt{1 - b^2}}{1 - b\, e^{-j\omega}}$$

Laguerre warping is the unique one-to-one warping that can be numerically computed by means of rational filters. A digital structure implementing Laguerre warping is shown in Figure 2. There, the input signal is time reversed, filtered by $\Lambda_0(\omega)$ and fed to a dispersive delay line consisting of a chain of allpass filters. The warped signal is obtained by collecting the samples present at the output of each filter at time $n = 0$. Although the computational structure in principle requires an infinite number of filter sections, by a simple argument on the maximum group delay, one can show (Evangelista & Cavaliere, 1998d) that, to a good approximation, the Laguerre transform of a finite-length $N$ signal can be computed by means of

$$M \approx N\, \frac{1 + |b|}{1 - |b|}$$

allpass sections. As already observed, the inverse Laguerre transform can be computed using the same structure of Figure 2 provided that we reverse the sign of the Laguerre parameter $b$.

Fig. 4. Frequency warping harmonic partials to obtain inharmonic partials.

Fig. 5. Two-channel warped filter bank structure.

Fig. 6. Equivalent downsampled warped filtering structures.
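The dispersive delay line structure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names, the test signal and the safety margin on the number of sections are hypothetical, and the transform is truncated to a finite number of coefficients. The time-reversed input is filtered by the causal realization of $\sqrt{\theta'(\omega)}$ and then by a chain of first-order allpass sections, and the sample at the output of each stage at time zero is kept.

```python
import math


def laguerre_transform(x, b, num_coeffs):
    """Truncated Laguerre transform of a non-empty causal finite-length sequence x.

    The time-reversed input is filtered by sqrt(1 - b^2) / (1 - b z^-1) and then
    by a chain of allpass sections (z^-1 - b) / (1 - b z^-1); the sample found at
    the output of each stage at time zero is one transform coefficient.
    """
    gain = math.sqrt(1.0 - b * b)
    stage = []
    state = 0.0
    for sample in reversed(x):  # time reversal: the last sample fed is x[0]
        state = gain * sample + b * state
        stage.append(state)
    coeffs = [stage[-1]]
    for _ in range(1, num_coeffs):
        nxt = []
        u_prev = 0.0
        v_prev = 0.0
        for u in stage:  # allpass section: v[t] = -b u[t] + u[t-1] + b v[t-1]
            v = -b * u + u_prev + b * v_prev
            nxt.append(v)
            u_prev, v_prev = u, v
        stage = nxt
        coeffs.append(stage[-1])
    return coeffs


def suggested_sections(length, b):
    """Rule of thumb N (1 + |b|) / (1 - |b|); |b| is used so negative b is covered."""
    return math.ceil(length * (1.0 + abs(b)) / (1.0 - abs(b)))


b = 0.3
x = [math.sin(0.7 * k) * 0.9 ** k for k in range(16)]  # hypothetical test signal
m = 3 * suggested_sections(len(x), b)  # generous margin, an arbitrary choice
coeffs = laguerre_transform(x, b, m)
# Inverse transform: same structure with the sign of b reversed.
x_rec = laguerre_transform(coeffs, -b, len(x))
print(max(abs(p - q) for p, q in zip(x, x_rec)))
```

Because the basis is orthogonal, the energy of the coefficients should match that of the signal once enough sections are used; the margin of three times the rule of thumb is only there to make the truncation error negligible in this small example.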

Frequency warping is per se an interesting effect from a musical point of view. In fact, by frequency warping a periodic signal one obtains an inharmonic distribution of the partials, as shown in Figure 4, typical of bells, drums or piano tones in the low register. Therefore, the frequency spectrum of an initial sound is morphed into that of another sound, possibly belonging to a different instrument class. Time-varying versions of the warping algorithm have been devised (Evangelista, 2000b). Further approximations via overlap-add of the Short-Time Laguerre Transform allow us to compute the effect in real time (Evangelista, 2000b). Sound examples obtained by frequency warping can be found at http://lcavwww.epfl.ch/~gianevan/research/fwex.html.
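The effect on the partials can be illustrated by passing a harmonic series through a Laguerre warping curve. The sketch below assumes the closed form of the curve given in Eq. 15; the fundamental, the value of $b$ and the function names are illustrative. Whether the curve for $b$ or its inverse (the curve for $-b$) applies depends on whether the signal is warped or unwarped, so the code simply evaluates the curve for a chosen parameter.

```python
import math


def laguerre_map(w, b):
    """Laguerre warping curve theta(w) = w + 2 arctan(b sin w / (1 - b cos w))."""
    return w + 2.0 * math.atan2(b * math.sin(w), 1.0 - b * math.cos(w))


def warp_partials(fundamental, count, b):
    """Map the harmonic frequencies k * fundamental (rad/sample) through theta."""
    return [laguerre_map(k * fundamental, b) for k in range(1, count + 1)]


w0 = 0.05 * math.pi  # illustrative fundamental, in radians per sample
partials = warp_partials(w0, 10, b=-0.3)
# Ratios to the first warped partial; they equal 1, 2, 3, ... only when b = 0.
ratios = [p / partials[0] for p in partials]
print(ratios)
```

With a negative parameter the ratios grow faster than the integers, i.e., the upper partials are progressively stretched, which is qualitatively the kind of inharmonicity mentioned in the text.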

Warped filter banks and iterated warped wavelets

Two-channel warped filter bank

In the classical construction of dyadic wavelets (Daubechies, 1992) a two-channel critically sampled filter bank serves as building block. This is based on two quadrature mirror filters (QMF): $H(z)$, low-pass, and $G(z)$, high-pass. These filters are half-band with cut-off frequency $\omega_c = \pi/2$. In (Evangelista & Cavaliere, 1998d) the structure of Figure 5 was considered as a building block. There, the signal is frequency unwarped by means of the Laguerre transform (LT) prior to filtering. The operators $[\downarrow]$ and $[\uparrow]$ denote, respectively, downsampling (decimation) by 2 and upsampling (zero-insertion) by 2. Unwarping the signal is equivalent to warping the filters and to unwarping the downsampling operators (Evangelista & Cavaliere, 1998d), as depicted in Figure 6. Thus, on the one hand the pass-band of the filters is altered and the cut-off frequency is moved to $\omega_c = \theta^{-1}(\pi/2)$; on the other hand the equivalent unwarped downsampling operator, which embeds unwarping and conventional downsampling, resets the band of the filtered signal to half-band prior to downsampling. By combining the Laguerre transform with an orthogonal filter bank one obtains an orthogonal perfect reconstruction structure whose bands can be arbitrarily allocated by fixing the parameter $b$.

The structure of Figure 5 can be generalized to include an ordinary discrete wavelet transform structure in place of the two-channel filter bank. This provides a first form of discrete-time warped wavelets. However, the analysis bands cannot be arbitrarily selected since they are determined by transforming the cut-off frequencies $2^{-n}\pi$ according to a single curve in the Laguerre family. As shown in the next section, fully arbitrary bandwidth allocation is achieved by iterating the full warped filter bank structure.

Frequency warping may be successfully employed <strong>for</strong><br />

trans<strong>for</strong>ming inharmonic sounds into harmonic ones or viceversa.<br />

When used in conjunction with the PSWT one obtains<br />

the Pitch-Synchronous Frequency Warped Wavelet Trans<strong>for</strong>m<br />

(PSFWWT) useful <strong>for</strong> decomposing inharmonic instrument<br />

sounds into regular pseudo-periodic components plus


<strong>Flexible</strong> wavelets <strong>for</strong> music signal processing 19<br />

...<br />

Fig. 9. Synthesis filter bank <strong>for</strong> computing the inverse<br />

discrete-time frequency warped wavelet trans<strong>for</strong>m.<br />

Fig. 7. Frequency domain characteristics of pitch-synchronous frequency warped wavelets.<br />

Fig. 8. Analysis filter bank for computing the discrete-time frequency warped wavelet transform.<br />

fluctuations. The magnitude FT of the basis elements is reported in Figure 7. The PSFWWT decomposition makes it possible, e.g., to resolve the hammer noise in a piano tone. An account of this method is given in (Evangelista & Cavaliere, 1998c) and (Testa et al., 1997), where the Laguerre warping family is seen to closely match the dispersion curves provided by both experimental data and a physical model based on a 4th order PDE.<br />

Discrete-time warped wavelets with arbitrary band allocation<br />

A discrete-time warped wavelet decomposition may be obtained by iterating the two-channel warped filter bank. As in the ordinary wavelet transform, identical warped filter bank structures are nested in the low-pass branch. The corresponding analysis and synthesis structures are reported in Figures 8 and 9, respectively. Each warping stage is characterized by a different value $b_k$ of the Laguerre parameter. The warped wavelets are the impulse responses of the lower branches of the synthesis structure for any value of the shift. One can show (Evangelista & Cavaliere, 1998c) that the DTFT of the level $n$ warped wavelet has the form<br />

$$\Psi_{n,m}(\omega) = \Lambda_{n,0}\big(\Omega_{n-1}(\omega)\big)\, G\!\left(\tfrac{1}{2}\Omega_n(\omega)\right) \Phi_{n-1,0}(\omega)\, e^{-jm\Omega_n(\omega)} \tag{16}$$<br />

where<br />

$$\Phi_{n,0}(\omega) = \prod_{k=1}^{n} \left\{ \Lambda_{k,0}\big(\Omega_{k-1}(\omega)\big)\, H\!\left(\tfrac{1}{2}\Omega_k(\omega)\right) \right\} \tag{17}$$<br />

is the DTFT of the warped scaling sequence,<br />

$$\Lambda_{k,0}(\omega) = \frac{\sqrt{1-b_k^2}}{1-b_k e^{-j\omega}}$$<br />

is a causal realization of $\sqrt{\theta_k'(\omega)}$ and<br />

$$\Omega_k(\omega) = \vartheta_k \circ \vartheta_{k-1} \circ \cdots \circ \vartheta_1(\omega) \tag{18}$$<br />

with<br />

$$\vartheta_k(\omega) = 2\theta_k(\omega).$$<br />
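As a rough numerical illustration of Equations 17 and 18, the following sketch evaluates the DTFT of the warped scaling sequence. It rests on several assumptions that are not stated in this section: $\theta_k$ is taken to be the phase of the first-order allpass with parameter $b_k$, i.e. $\theta(\omega) = \omega + 2\arctan\big(b\sin\omega/(1-b\cos\omega)\big)$, which is consistent with $|\Lambda_{k,0}(\omega)|^2 = \theta_k'(\omega)$; the convention $\Omega_0(\omega) = \omega$ is used; and the Haar low-pass filter stands in for $H$. All function names are hypothetical.<br />

```python
import cmath
import math


def theta(omega, b):
    """Assumed Laguerre warping map: phase of a first-order allpass with parameter b."""
    return omega + 2.0 * math.atan(b * math.sin(omega) / (1.0 - b * math.cos(omega)))


def laguerre_lambda0(omega, b):
    """Lambda_{k,0}(omega) = sqrt(1 - b^2) / (1 - b e^{-j omega})."""
    return math.sqrt(1.0 - b * b) / (1.0 - b * cmath.exp(-1j * omega))


def haar_lowpass(omega):
    """Haar low-pass filter with H(0) = sqrt(2); an illustrative choice only."""
    return (1.0 + cmath.exp(-1j * omega)) / math.sqrt(2.0)


def warped_scaling_dtft(omega, bs, H=haar_lowpass):
    """Eq. 17: product over k of Lambda_{k,0}(Omega_{k-1}(omega)) * H(Omega_k(omega) / 2)."""
    result = 1.0 + 0.0j
    warped = omega  # Omega_0(omega) = omega (assumed convention)
    for b in bs:
        lam = laguerre_lambda0(warped, b)   # uses Omega_{k-1}
        warped = 2.0 * theta(warped, b)     # Omega_k = vartheta_k(Omega_{k-1}), Eq. 18
        result *= lam * H(0.5 * warped)
    return result
```

With all Laguerre parameters set to zero the warping map is the identity and the product reduces to the ordinary dyadic case, which gives a simple sanity check on the sketch.<br />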

Notice that, due to the presence of the frequency dependent delay term $e^{-jm\Omega_n(\omega)}$ in Equation 16, wavelets pertaining to the same level $n$ are obtained from each other by an allpass filtering operation, rather than by an elementary time shift as in ordinary dyadic wavelets. The cut-off frequency of the level $n$ warped scaling sequence is given by<br />

$$\omega_n = \Omega_n^{-1}(\pi)$$<br />

which is the cut-off frequency of the narrowest band filter $H\!\left(\tfrac{1}{2}\Omega_n(\omega)\right)$ in the product (Eq. 17).<br />
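The cut-off relation $\omega_n = \Omega_n^{-1}(\pi)$ can be checked numerically with a short sketch. It assumes, as is standard for first-order allpass maps, that $\theta_k(\omega) = \omega + 2\arctan\big(b_k\sin\omega/(1-b_k\cos\omega)\big)$ and that the inverse of the map with parameter $b$ is the same map with parameter $-b$. The function names are hypothetical.<br />

```python
import math


def theta(omega, b):
    """Assumed Laguerre warping map: phase of a first-order allpass with parameter b."""
    return omega + 2.0 * math.atan(b * math.sin(omega) / (1.0 - b * math.cos(omega)))


def big_omega(omega, bs):
    """Omega_n = vartheta_n o ... o vartheta_1, with vartheta_k = 2 * theta_k (Eq. 18)."""
    for b in bs:
        omega = 2.0 * theta(omega, b)
    return omega


def big_omega_inverse(x, bs):
    """Undo the stages in reverse order: vartheta_k^{-1}(x) = theta_{-b_k}(x / 2)."""
    for b in reversed(bs):
        x = theta(0.5 * x, -b)
    return x


def cutoff_frequency(bs):
    """Cut-off frequency of the level-n warped scaling sequence, Omega_n^{-1}(pi)."""
    return big_omega_inverse(math.pi, bs)
```

For instance, `cutoff_frequency([0.0, 0.0])` corresponds to two unwarped dyadic stages and gives the familiar quarter-band cut-off.<br />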

Given a choice of the cut-off frequencies $\omega_n < \omega_{n-1} < \cdots < \omega_1$, there exists a unique solution for the Laguerre parameters $b_k$, $k = 1, \ldots, n$, obtained by the following recurrence<br />

$$b_1 = \tan\left(\frac{\pi}{4} - \frac{\omega_1}{2}\right), \qquad b_k = \tan\left(\frac{\pi}{4} - \frac{\Omega_{k-1}(\omega_k)}{2}\right) \tag{19}$$<br />

Consequently, each choice of the cut-off frequencies is uniquely associated with a corresponding set of parameters. The iterated warped filter bank is a perfect reconstruction orthogonal structure equivalently described by a discrete-time warped wavelet basis. Cut-off frequencies may be arranged on a perceptual scale, such as the Bark scale, leading to perceptually organized orthogonal transforms (Evangelista & Cavaliere, 1998a; 1998b). The example reported in Figure 10 displays the magnitude Fourier transform of frequency warped wavelets and scaling sequence achieving 1/3-octave frequency resolution. Sound examples of ordinary wavelet and frequency warped wavelet analysis can be found at http://lcavwww.epfl.ch/~gianevan/research/fwex.html.<br />

Fig. 10. Magnitude Fourier transform of 1/3-octave frequency warped wavelets and scaling sequence obtained by letting $\omega_n = a^n \pi$, with $a = 2^{-1/3}$.<br />
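The parameter recurrence of Equation 19 is easy to try out. The sketch below computes the Laguerre parameters for the 1/3-octave cut-offs $\omega_n = a^n\pi$, $a = 2^{-1/3}$, used in Figure 10. It assumes the closed form $\theta(\omega) = \omega + 2\arctan\big(b\sin\omega/(1-b\cos\omega)\big)$ for the Laguerre map and the convention $\Omega_0(\omega) = \omega$; the function names and the choice of twelve levels are illustrative only.<br />

```python
import math


def theta(omega, b):
    """Assumed Laguerre warping map: phase of a first-order allpass with parameter b."""
    return omega + 2.0 * math.atan(b * math.sin(omega) / (1.0 - b * math.cos(omega)))


def big_omega(omega, bs):
    """Omega_k = vartheta_k o ... o vartheta_1, with vartheta_k = 2 * theta_k (Eq. 18)."""
    for b in bs:
        omega = 2.0 * theta(omega, b)
    return omega


def laguerre_parameters(cutoffs):
    """Recurrence (19): b_k = tan(pi/4 - Omega_{k-1}(omega_k) / 2), cut-offs in decreasing order."""
    bs = []
    for w in cutoffs:
        x = big_omega(w, bs)  # Omega_{k-1}(omega_k); equals omega_1 when bs is empty
        bs.append(math.tan(math.pi / 4.0 - 0.5 * x))
    return bs


# 1/3-octave band allocation as in the Figure 10 caption.
a = 2.0 ** (-1.0 / 3.0)
cutoffs = [math.pi * a ** n for n in range(1, 13)]
bs = laguerre_parameters(cutoffs)
```

By construction each stage places the half-band edge of its filter at the requested frequency, so `big_omega(cutoffs[k - 1], bs[:k])` should return $\pi$ up to rounding. Dyadic cut-offs $\pi/2^n$ give all parameters equal to zero, recovering the ordinary wavelet filter bank.<br />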

Continuous-time warped wavelets<br />

Expansion coefficients over ordinary continuous-time wavelet bases may be iteratively computed by means of an infinite cascade of two-channel filter banks. The discrete-time frequency warped wavelets illustrated in the previous section can be extended to continuous-time bases. This extension requires generalization of many of the mathematical results illustrated in Daubechies (1992) and Mallat (1998). The scaling function of ordinary dyadic wavelets is obtained by means of a limit process involving rescaling of the discrete-time scaling function:<br />

$$\Phi^{c.t.}_{0,0}(\omega) = \lim_{n\to\infty} \frac{1}{\sqrt{2^n}}\, \Phi^{d.t.}_{n,0}\!\left(\frac{\omega}{2^n}\right)$$<br />

where $\Phi^{c.t.}_{0,0}(\omega)$ and $\Phi^{d.t.}_{n,0}(\omega)$ respectively denote the Fourier transform of the level 0 continuous-time scaling function and the DTFT of the level $n$ scaling sequence (discrete-time). Since (Mallat, 1998)<br />

$$\Phi^{d.t.}_{n,0}(\omega) = \prod_{k=0}^{n-1} H(2^k\omega)$$<br />

then<br />

$$\Phi^{c.t.}_{0,0}(\omega) = \prod_{k=1}^{\infty} \frac{1}{\sqrt{2}}\, H\!\left(\frac{\omega}{2^k}\right).$$<br />
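The limit process for the ordinary dyadic case can be verified on the simplest example. The sketch below assumes the Haar low-pass filter $H(\omega) = (1 + e^{-j\omega})/\sqrt{2}$, normalized so that $H(0) = \sqrt{2}$, for which the infinite product is known in closed form, $e^{-j\omega/2}\,\sin(\omega/2)/(\omega/2)$. The function names and the truncation at 40 factors are illustrative choices.<br />

```python
import cmath
import math


def haar_lowpass(omega):
    """Haar low-pass filter with H(0) = sqrt(2); an illustrative choice only."""
    return (1.0 + cmath.exp(-1j * omega)) / math.sqrt(2.0)


def scaling_function_ft(omega, H=haar_lowpass, terms=40):
    """Truncated infinite product: prod_{k=1}^{terms} H(omega / 2^k) / sqrt(2)."""
    result = 1.0 + 0.0j
    for k in range(1, terms + 1):
        result *= H(omega / 2.0 ** k) / math.sqrt(2.0)
    return result


def rescaled_scaling_sequence_dtft(omega, n, H=haar_lowpass):
    """2^{-n/2} * Phi_dt_{n,0}(omega / 2^n), with Phi_dt_{n,0}(w) = prod_{k=0}^{n-1} H(2^k w)."""
    w = omega / 2.0 ** n
    result = 2.0 ** (-0.5 * n) + 0.0j
    for k in range(n):
        result *= H(2.0 ** k * w)
    return result


def haar_closed_form(omega):
    """Fourier transform of the indicator function of [0, 1), valid for omega != 0."""
    half = 0.5 * omega
    return cmath.exp(-1j * half) * math.sin(half) / half
```

Both the rescaled discrete-time product and the truncated infinite product approach the closed form as the number of factors grows, which is the statement of the two displayed identities for this particular filter.<br />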

The iterated frequency warped counterpart of this construction is much more complex and requires a deep study of the properties of the iterated map (Eq. 18) and of the convergence of the parameter selection algorithm (Eq. 19). One can show that, in spite of the intricate analytic form of the iterates $\Omega_k(\omega)$, enforcing the following exponential cut-off condition:<br />

$$\omega_n = a^n\pi, \qquad n = 1, 2, \ldots$$<br />

guarantees that the parameter sequence $b_k$ obtained by Equation 19 simply converges to<br />

$$b_\infty = \frac{1-2a}{1+2a}.$$<br />

At the same time, we have<br />

$$\lim_{n\to\infty} \frac{\Omega_n^{-1}(\omega)}{a^n} = \Gamma^{-1}(\omega)$$<br />

where $\Gamma^{-1}(\omega)$ is an increasing and continuously differentiable eigenfunction of the composition-by-$\vartheta_\infty^{-1}$ operator with eigenvalue $a$, i.e.,<br />

$$\Gamma^{-1} \circ \vartheta_\infty^{-1} = a\,\Gamma^{-1} \tag{20}$$<br />

Since, under general assumptions, the eigenfunction of the composition operator is unique up to a multiplicative constant (Shapiro, 1993) (Evangelista, 2000a) and since $\Gamma^{-1}(\pi) = \pi$, then<br />

$$\Gamma^{-1}(\omega) = \lim_{n\to\infty} \frac{\pi\,\gamma_n^{-1}(\omega)}{\gamma_n^{-1}(\pi)}$$<br />

where<br />

$$\gamma_n^{-1}(\omega) = \underbrace{\vartheta_\infty^{-1} \circ \cdots \circ \vartheta_\infty^{-1}}_{n\ \text{times}}(\omega).$$<br />

Thus, $\Gamma^{-1}(\omega)$ is also proportional to the limit iterate of the map $\vartheta_\infty^{-1}$. Furthermore, from Equation 20 we obtain<br />

$$\vartheta_\infty^{-1}(\omega) = \Gamma\big(a\,\Gamma^{-1}(\omega)\big)$$<br />

and<br />

$$\gamma_n^{-1}(\omega) = \Gamma\big(a^n\,\Gamma^{-1}(\omega)\big).$$<br />
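The convergence of the Laguerre parameters and the eigenfunction relation of Equation 20 can both be explored numerically. The sketch below makes the same assumptions as before about the closed form of the Laguerre map $\theta$ and about its inverse (the same map with parameter $-b$). It approximates $\Gamma^{-1}$ by a finite number of iterates of $\vartheta_\infty^{-1}$. The function names and the iteration counts are hypothetical choices, not taken from the cited works.<br />

```python
import math


def theta(omega, b):
    """Assumed Laguerre warping map: phase of a first-order allpass with parameter b."""
    return omega + 2.0 * math.atan(b * math.sin(omega) / (1.0 - b * math.cos(omega)))


def laguerre_parameters(cutoffs):
    """Recurrence (19) for cut-offs given in decreasing order."""
    bs = []
    for w in cutoffs:
        x = w
        for b in bs:  # Omega_{k-1}(omega_k)
            x = 2.0 * theta(x, b)
        bs.append(math.tan(math.pi / 4.0 - 0.5 * x))
    return bs


def limit_parameter(a):
    """b_inf = (1 - 2a) / (1 + 2a)."""
    return (1.0 - 2.0 * a) / (1.0 + 2.0 * a)


def vartheta_inf_inverse(omega, a):
    """Inverse of vartheta_inf = 2 * theta_{b_inf}, i.e. theta_{-b_inf}(omega / 2)."""
    return theta(0.5 * omega, -limit_parameter(a))


def gamma_n_inverse(omega, a, n):
    """n-fold iterate of vartheta_inf^{-1}."""
    for _ in range(n):
        omega = vartheta_inf_inverse(omega, a)
    return omega


def eigenfunction(omega, a, n=60):
    """Finite-n approximation of Gamma^{-1}(omega) = lim pi * gamma_n^{-1}(omega) / gamma_n^{-1}(pi)."""
    return math.pi * gamma_n_inverse(omega, a, n) / gamma_n_inverse(math.pi, a, n)
```

For $a = 1/2$ the limit parameter is zero and the approximation returns $\omega$ itself, as expected for ordinary dyadic wavelets. For other values of $a$, comparing `eigenfunction(vartheta_inf_inverse(w, a), a)` with `a * eigenfunction(w, a)` gives a direct numerical check of Equation 20.<br />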

This allows us to define the Fourier transform of the continuous-time frequency warped scaling function as follows:<br />

$$\tilde{\Phi}^{c.t.}_{0,0}(\omega) = \prod_{k=0}^{\infty} \frac{1}{\sqrt{2}}\, H\!\left(\tfrac{1}{2}\gamma_k^{-1}(\omega)\right)$$<br />

Under rather general assumptions, this construction leads to the definition of asymptotically scale $a$ continuous-time warped wavelets. The expansion coefficients over the resulting basis can be iteratively computed by means of the structure in Figure 8. For more details on this extension, the mathematically inclined reader is referred to (Evangelista, 2000a).<br />

The warped wavelet decomposition leads to an unconventional tiling of the time-frequency plane shown in Figure 11. Due to the frequency dependent delay of each basis element, the time-frequency localization zones are characterized by curved boundaries.<br />

Fig. 11. Tiling the time-frequency plane by means of frequency warped wavelets.<br />
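One way to get a feel for the curved tile boundaries is to look at the group delay contributed by the shift term $e^{-jm\Omega_n(\omega)}$ of Equation 16, which is $m\,\Omega_n'(\omega)$. The sketch below evaluates this quantity by the chain rule, assuming the closed form $\theta(\omega) = \omega + 2\arctan\big(b\sin\omega/(1-b\cos\omega)\big)$ and its derivative $\theta'(\omega) = (1-b^2)/(1 - 2b\cos\omega + b^2)$. It ignores the phase of the remaining factors in Equation 16, and the function names are hypothetical.<br />

```python
import math


def theta(omega, b):
    """Assumed Laguerre warping map: phase of a first-order allpass with parameter b."""
    return omega + 2.0 * math.atan(b * math.sin(omega) / (1.0 - b * math.cos(omega)))


def theta_prime(omega, b):
    """Derivative of the Laguerre map, equal to |Lambda_{k,0}(omega)|^2."""
    return (1.0 - b * b) / (1.0 - 2.0 * b * math.cos(omega) + b * b)


def big_omega(omega, bs):
    """Omega_n = vartheta_n o ... o vartheta_1, with vartheta_k = 2 * theta_k (Eq. 18)."""
    for b in bs:
        omega = 2.0 * theta(omega, b)
    return omega


def shift_delay(omega, bs, m=1):
    """Group delay m * dOmega_n/domega of the shift term, computed by the chain rule."""
    slope = 1.0
    for b in bs:
        slope *= 2.0 * theta_prime(omega, b)  # derivative of vartheta_k at the current point
        omega = 2.0 * theta(omega, b)         # advance to Omega_k(omega)
    return m * slope
```

With all parameters equal to zero the delay is the constant $m\,2^n$ of ordinary dyadic wavelets; with nonzero parameters it varies with frequency, which is what bends the boundaries of the localization zones.<br />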

Conclusions<br />

In this paper we explored the panorama of wavelet transforms obtained by means of frequency warping techniques, with particular attention to their realizability in computationally efficient structures based on rational discrete-time allpass filters and two-channel filter banks. The flexibility of the transform is exploited for the representation of music signals in perceptually or objectively meaningful components, allowing for adaptation to perceptual scales or for orthogonal separation of transients and noise in harmonic or inharmonic sounds via pitch-synchronous methods. The refinable frequency resolution of frequency warped wavelets is a fundamental characteristic for the analysis of any musical sound.<br />

Frequency warping emerges as an appealing technique in musical signal processing for the creation of new effects and for the analysis and synthesis of features of sounds by means of adapted representations.<br />

References<br />

Baraniuk, R.G. & Jones, D.L. (1993). Warped wavelet bases: Unitary equivalence and signal processing. Proc. ICASSP’93, IEEE, III, 320–323.<br />

Baraniuk, R.G. & Jones, D.L. (1995). Unitary equivalence: A new twist on signal processing. IEEE Trans. on Signal Processing, 43, 2269–2282.<br />

Blu, T. (1998). A new design algorithm for two-band orthonormal filter banks and orthonormal rational wavelets. IEEE Trans. on Signal Processing, 46, 1494–1504.<br />

Broome, P.W. (1965). Discrete orthonormal sequences. J. Assoc. Comput. Machinery, 12, 151–168.<br />

Daubechies, I. (1992). Ten Lectures on Wavelets. CBMS-NSF Reg. Conf. Series in Appl. Math. Philadelphia, PA: SIAM.<br />

Evangelista, G. (1991). Wavelet transforms that we can play. In: G. De Poli, A. Piccialli, & C. Roads (Eds.), Representations of Musical Signals (pp. 119–136). Cambridge, MA: MIT Press.<br />

Evangelista, G. (1993). Pitch synchronous wavelet representations of speech and music signals. IEEE Trans. on Signal Processing, 41, 3313–3330. Special issue on Wavelets and Signal Processing.<br />

Evangelista, G. (1994). Comb and multiplexed wavelet transforms and their applications to signal processing. IEEE Trans. on Signal Processing, 42, 292–303.<br />

Evangelista, G. (1997). Wavelet representations of musical signals. In: C. Roads, S.T. Pope, A. Piccialli, & G. De Poli (Eds.), Musical Signal Processing (pp. 127–153). Amsterdam: Swets & Zeitlinger.<br />

Evangelista, G. (2000b). Time and frequency warping of musical signals. In: COST-G6 (Ed.), Digital Audio Effects (DAFx) (in preparation). Chichester, England: Wiley.<br />

Evangelista, G. (2001). Dyadic warped wavelets. Advances in Imaging and Electron Physics, 117, 73–171.<br />

Evangelista, G. & Cavaliere, S. (1998a). Arbitrary bandwidth wavelet sets. Proc. of ICASSP’98, III, 1801–1804.<br />

Evangelista, G. & Cavaliere, S. (1998b). Auditory modeling via frequency warped wavelet transform. Proc. of EUSIPCO98, I, 117–120.<br />

Evangelista, G. & Cavaliere, S. (1998c). Discrete frequency warped wavelets: Theory and applications. IEEE Trans. on Signal Processing, 46, 874–885. Special issue on Theory and Applications of Filter Banks and Wavelets.<br />

Evangelista, G. & Cavaliere, S. (1998d). Frequency warped filter banks and wavelet transforms: A discrete-time approach via Laguerre expansion. IEEE Trans. on Signal Processing, 46, 2638–2650.<br />

Mallat, S. (1998). A Wavelet Tour of Signal Processing. Boston, MA: Academic Press.<br />

Oppenheim, A.V., Johnson, D.H., & Steiglitz, K. (1971). Computation of spectra with unequal resolution using the Fast Fourier Transform. Proc. IEEE, 59, 299–301.<br />

Shapiro, J.H. (1993). Composition operators and classical function theory. Universitext: Tracts in Mathematics. New York: Springer-Verlag.<br />

Testa, I., Evangelista, G., & Cavaliere, S. (1997). A physical model of stiff strings. Proceedings of the Institute of Acoustics (Internat. Symp. on Music and Acoustics, ISMA’97), 19: Part 5, 219–224.<br />
