Flexible Wavelets for Music Signal Processing*
Flexible Wavelets for Music Signal Processing*
Flexible Wavelets for Music Signal Processing*
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Journal of New <strong>Music</strong> Research 0929-8215/01/3001-013$16.00<br />
2001, Vol. 30, No. 1, pp. 13–22 © Swets & Zeitlinger<br />
<strong>Flexible</strong> <strong>Wavelets</strong> <strong>for</strong> <strong>Music</strong> <strong>Signal</strong> <strong>Processing*</strong><br />
Gianpaolo Evangelista<br />
Swiss Federal Institute of Technology, Audio-Visual Communication Laboratory, Lausanne, Switzerland<br />
Abstract<br />
<strong>Music</strong>al signals require sophisticated time-frequency techniques<br />
<strong>for</strong> their representation. In the ideal case, each<br />
element of the representation is able to capture a distinct<br />
feature of the signal and can be attached either a perceptual<br />
or an objective meaning. Wavelet trans<strong>for</strong>ms constitute a<br />
remarkable advance in this field and have several advantages<br />
over Gabor expansions or short-time Fourier methods. However,<br />
application of conventional wavelet bases on musical<br />
signals produces disappointing results <strong>for</strong> at least two<br />
reasons: (1) the frequency resolution of dyadic wavelets is<br />
one-octave, too poor <strong>for</strong> any meaningful acoustic decomposition<br />
and (2) pseudoperiodicity or pitch in<strong>for</strong>mation of<br />
voiced sounds is not exploited. Fortunately, the definition of<br />
wavelet trans<strong>for</strong>m can be extended in several directions,<br />
allowing <strong>for</strong> the design of bases with arbitrary frequency resolution<br />
and <strong>for</strong> adaptation to time-varying pitch characteristic<br />
in signals with harmonic or even inharmonic structure of<br />
the frequency spectrum. In this paper we discuss refined<br />
wavelet methods that are applicable to musical signal analysis<br />
and synthesis. <strong>Flexible</strong> wavelet trans<strong>for</strong>ms are obtained<br />
by means of frequency warping techniques.<br />
Introduction<br />
* “Sound examples are available in the JNMR Electronic Appendix<br />
(EA), please see ”<br />
The analysis and processing of most features of sound requires<br />
mixed time and frequency characterization. Gabor expansions<br />
and the Short-Time Fourier Trans<strong>for</strong>m (STFT) were among<br />
the first techniques to be applied to audio signals in order to<br />
display their time-varying frequency spectrum characteristics.<br />
The derived spectrogram is still in use in many applications.<br />
While the human hearing sense is particularly gifted at classifying<br />
acoustic patterns according to both their time of occurrence<br />
and brightness of their transitory frequency spectrum,<br />
our mathematical models, having to cope with the uncertainty<br />
principle, cannot achieve indefinitely accurate resolution in<br />
both time and frequency domains. Our perception of sound<br />
is likely based on a non-linear frequency scale, as reflected<br />
by both cochlear functional models, psychoacoustic theories<br />
and conventional tone classification, e.g., tempered scale, in<br />
western music. This observed behavior is contrasting with the<br />
frequency resolution of the STFT, which is uni<strong>for</strong>m on any<br />
portion of the frequency axis. In audio signal processing the<br />
choice of the representation is crucial <strong>for</strong> deriving elegant and<br />
efficient algorithms <strong>for</strong> sound manipulation, special effects,<br />
coding, feature detection and synthesis. An important goal is<br />
to achieve a close match between our perception and the mathematical<br />
realization and efficiency of the representation.<br />
The introduction of wavelets and multiresolution approximation<br />
geared the analysis tools towards variable timefrequency<br />
resolution with the construction of integral<br />
trans<strong>for</strong>ms and expansion bases adjusted on non-uni<strong>for</strong>m<br />
time-frequency grids (Daubechies, 1992) (Mallat, 1998).<br />
Higher frequencies are represented at higher time resolution<br />
and lower frequencies at lower time resolution, while keeping<br />
the uncertainty product constant. On one hand, the integral<br />
wavelet trans<strong>for</strong>m is redundant but it does not introduce limitations<br />
on either resolution other than those dictated by the<br />
uncertainty principle. On the other hand the dyadic wavelet<br />
bases are easy to generate and fast algorithms <strong>for</strong> computing<br />
the expansion coefficients are available but they constrain<br />
the frequency resolution to one octave (Evangelista, 1991;<br />
1997). Wavelet bases based on finer rational frequency<br />
resolution factors lead to difficult design procedures in order<br />
to satisfy regularity constraints (Blu, 1998). Furthermore,<br />
the resolution factor is in practice limited to ratios of small<br />
integers. In contrast, frequency warping techniques allow<br />
<strong>for</strong> arbitrary choice of analysis bands. This is an important<br />
Accepted: 30 May, 2001<br />
Correspondence: G. Evangelista, Swiss Federal Institute of Technology, Audio-Visual Communication Laboratory, IN-LCAV Ecublens,<br />
CH-1015 Lausanne, Switzerland. E-mail: gianpaolo.evangelista@epfl.ch
14 G. Evangelista<br />
point if we want to adapt the signal representation either<br />
to perceptual scales, e.g., Bark scale, or to objective bands<br />
associated to a single feature. However, computation of<br />
the trans<strong>for</strong>m based on general warping maps is not efficient<br />
unless one is able to derive an associated discrete-time<br />
signal processing structure (Evangelista & Cavaliere,<br />
1998d).<br />
In this paper, we present an approach to flexible wavelet<br />
trans<strong>for</strong>ms where the frequency resolution of the basis elements<br />
can be designed at will. This is achieved by discretetime<br />
iterated frequency warping. We consider the important<br />
and unique case where each elementary frequency warping<br />
stage is computationally realizable with arbitrary precision.<br />
The main ingredient is the Laguerre trans<strong>for</strong>m whose computation<br />
is achieved by means of a chain of digital allpass<br />
filters or dispersive delay line. Iterated Laguerre warping is<br />
applied to wavelets leading to wavelets whose bandwidths<br />
can be assigned by selecting a set of parameters (Evangelista<br />
& Cavaliere, 1998c). Interestingly enough, these wavelets are<br />
still based on a dyadic scheme, which is a natural setting <strong>for</strong><br />
iterated band splitting.<br />
Another issue in musical signal processing is that the<br />
frequency spectrum of voiced signals peaks at uni<strong>for</strong>mly or<br />
non-uni<strong>for</strong>mly spaced frequencies. While exact periodicity<br />
is rarely encountered in real-life signals, exploitation of<br />
the pitch in<strong>for</strong>mation or of pseudoperiodicity is an asset <strong>for</strong><br />
their representation. The Fourier trans<strong>for</strong>m is particularly<br />
gifted at revealing periodicity, however the STFT represents<br />
pseudoperiodic signals as a superposition of partials whose<br />
dynamics is captured with uni<strong>for</strong>m time resolution. It follows<br />
that transients are blurred if the selected time-resolution is<br />
too poor or frequency resolution is at risk when the analysis<br />
window is too short.<br />
In previous papers (Evangelista, 1993; 1994) the author<br />
suggested methods <strong>for</strong> exploiting the pitch in<strong>for</strong>mation<br />
in wavelet representations by introducing the Pitch-<br />
Synchronous Wavelet Trans<strong>for</strong>m (PSWT). The signal is first<br />
converted into a sequence of variable-length vectors each<br />
containing the samples of one period of the signal, then the<br />
sequences of components are analyzed by means of an array<br />
of wavelet trans<strong>for</strong>ms. This representation is able to capture<br />
period-to-period fluctuations of the signal by means of basis<br />
elements that are comb-like in the frequency domain. Periodic<br />
behavior is trapped in a narrow comb adjusted on the<br />
harmonic grid and transients are represented at several scales<br />
by multiresolution basis elements whose Fourier trans<strong>for</strong>ms<br />
are organized in ensembles of sidebands of the harmonics.<br />
Faster transitions are represented, at fine time resolution, by<br />
larger bands lying far from the harmonics. Vice-versa, slow<br />
transitions and quasi-stationary dynamics are represented, at<br />
coarser time resolution, by narrower sidebands lying closer<br />
to the harmonics. In particular, this technique allows <strong>for</strong> separation<br />
of perceptually and physically distinct phenomena<br />
such as the bow noise and the harmonic resonant component<br />
in a violin tone. Importantly enough, this separation is<br />
achieved by means of a complete and orthogonal set and the<br />
sum of the components exactly reconstructs the original<br />
signal.<br />
The PSWT technique is amenable to further generalization,<br />
essentially by introducing alternate methods <strong>for</strong><br />
<strong>for</strong>ming the vector signal, e.g., based on intermediate Gaborlike<br />
or STFT representations, which can be <strong>for</strong>malized in the<br />
multiwavelets framework.<br />
While the PSWT works efficiently when the frequencies<br />
of the partials are arranged on a harmonic grid, the frequency<br />
spectrum of many sounds, such as drums and piano tones in<br />
the low register, has an inherently inharmonic structure.<br />
From a physical model point of view, this phenomenon may<br />
be explained by the fact that, when the stiffness of the material<br />
is not negligible, dispersive wave propagation occurs<br />
within the medium. Furthermore, coupled 2D modes in<br />
drums have a peculiar distribution of eigenfrequencies. A<br />
partial solution of this problem is provided by frequency<br />
warping techniques that allow <strong>for</strong> redistribution of eigenfrequencies<br />
into harmonics. Inharmonic signals are first regularized<br />
by means of a frequency map and then analyzed by<br />
means of PSWT, leading to the Frequency Warped PSWT<br />
(Evangelista & Cavaliere, 1998c).<br />
The paper is organized as follows. In section 2 the properties<br />
of one-step frequency warping maps of continuoustime<br />
signals and wavelets are examined. Methods <strong>for</strong><br />
frequency warping discrete-time signals and the Laguerre<br />
trans<strong>for</strong>m are illustrated in section 3. These methods are<br />
applied to the definition of both discrete-time and continuous-time<br />
warped wavelets in section 4.<br />
Frequency warping and wavelets<br />
Frequency warping<br />
Frequency warping is obtained by mapping the frequency<br />
axis w by means of a suitable real function W(w). Given a<br />
continuous-time signal x(t) with Fourier Trans<strong>for</strong>m (FT)<br />
we <strong>for</strong>m the warped signal<br />
i.e., we let<br />
Ú<br />
+•<br />
jw<br />
t<br />
X( w)= x() t e - dt<br />
-•<br />
1 +•<br />
jwt<br />
˜x()= t Ú X( W( w)<br />
) e dw<br />
2p -•<br />
Thus, at any frequency w – the FT of the warped signal has the<br />
same magnitude as the FT of the signal at frequency W(w – ),<br />
i.e., the frequency spectrum of the warped signal is a<br />
de<strong>for</strong>med version of that of the original signal. Clearly, <strong>for</strong><br />
the warping operation to be well defined, the map W(w) must<br />
satisfy a number of requirements. In this paper we are inter-<br />
j w t<br />
˜X( w)= X( ( w)<br />
)= x() t e - W<br />
W<br />
( )<br />
dt<br />
Ú<br />
+•<br />
-•<br />
(1)
<strong>Flexible</strong> wavelets <strong>for</strong> music signal processing 15<br />
ested in reversible warping operations defined by a strictly<br />
increasing, invertible, map W of the real axis onto itself. The<br />
map is also assumed to have odd parity: W(-w) = -W(w). In<br />
this way frequency ordering is preserved and warping a real<br />
signal yields a real signal. Notice that, as defined in Equation<br />
1, frequency warping is not energy preserving since<br />
narrow frequency bands may be mapped into broader bands<br />
with equal peak amplitude. In order to preserve energy in any<br />
frequency interval one needs to multiply the right hand side<br />
of Equation 1 by a suitable amplitude scaling function or,<br />
more generally, define a suitable measure of the frequency<br />
axis. In the sequel we will assume that the map is almost<br />
everywhere differentiable, e.g., that W(w) is absolutely continuous,<br />
and define a scaled warped operation on the signal<br />
x(t) as the one producing the FT<br />
where the derivative W¢(w) is positive almost everywhere<br />
since the map is assumed to be strictly increasing. In that case<br />
one obtains energy preservation in any band. In fact, a frequency<br />
band with support [w 0 ,w 1 ] is mapped into a frequency<br />
band with support [W -1 (w 0 ), W -1 (w 1 )] and<br />
1<br />
2p<br />
Ú<br />
From a perceptual point of view this property is important<br />
since the scaled warped signal is properly equalized while in<br />
the unscaled version some parts of the frequency spectrum<br />
are boosted. From a mathematical point of view scaled<br />
warping is associated with an orthogonal operator :<br />
with real kernel<br />
-1<br />
W ( w1)<br />
-1<br />
W ( w0<br />
)<br />
ˆX( w)= W¢( w) X( W( w)<br />
)<br />
2 1 w1<br />
2<br />
ˆX( w) dw<br />
= X w dw<br />
2p<br />
Ú ( )<br />
w0<br />
xt ˆ ()= [ x]()= t Wt ( , t) x( t)<br />
dt<br />
1 +•<br />
j( wt- W( w)<br />
t)<br />
Wt ( ,t)= ¢( w) e dw<br />
2p<br />
Ú W<br />
-•<br />
and inverse operator -1 defined by the kernel<br />
Ú<br />
W - 1<br />
( t, t )= W ( t,<br />
t )<br />
Finally, we remark that, by exploiting duality of the FT one<br />
can define time warping operations in a similar fashion.<br />
Simple warped wavelets<br />
In (Baraniuk & Jones, 1993; 1995) unitary warping operators<br />
were exploited to define warped wavelets. The construction<br />
may be summarized as follows: the signal is<br />
unwarped by means of the inverse warping operator -1 ,<br />
then the expansion coefficients on a dyadic wavelet basis<br />
are computed. Reconstruction is achieved by applying the<br />
warping operator to the wavelet expansion of the<br />
unwarped signal. In other words, given an orthogonal and<br />
complete dyadic wavelet set<br />
+•<br />
-•<br />
(2)<br />
where<br />
y<br />
one computes the expansion coefficients<br />
-<br />
xˆ = 1 x,<br />
y<br />
and reconstructs the signal as follows<br />
xt ()= Â xˆ nm , [ y<br />
nm , ]() t<br />
Since the frequency warping operator is orthogonal we have<br />
-1<br />
xˆ = x, y = x,<br />
y<br />
nm , nm , nm ,<br />
and Equation 4 may be considered as the expansion on the<br />
warped wavelet basis<br />
Indeed this basis is orthogonal since<br />
-1<br />
y , y = y , y = d d<br />
n¢ , m¢ n, m n¢ , m¢<br />
n, m n¢ , n m¢<br />
, m<br />
and complete in view of unitary equivalence.<br />
The FT of the warped wavelets<br />
ˆ<br />
nm , t y<br />
nm ,<br />
y<br />
{ ()} Œ<br />
-n<br />
2 -n<br />
()= t 2 y 2 t-m<br />
nm , 00 ,<br />
nm , nm ,<br />
{[ y nm , ]() t }<br />
nm , Œ<br />
is related to the FT of the dyadic wavelets as follows<br />
ˆ<br />
nm , ( w)= ¢( w) nm , ( ( w)<br />
)<br />
n<br />
n n j2<br />
mW<br />
w<br />
= 2 W¢( w) Y00<br />
, 2 W( w)<br />
e<br />
Y W Y W<br />
from which we see that warped wavelets are not simply generated<br />
by dilating and translating a mother wavelet as in<br />
dyadic wavelets (Eq. 3). Rather, the “translated wavelets” are<br />
generated by generally non-rational allpass filtering e -j2n mW(w) .<br />
Scaling depends on the warping map W(w) as well. However,<br />
frequency warped wavelets have the remarkable property that<br />
their frequency resolution, i.e., their essential frequency<br />
support, may be arbitrarily assigned by proper choice of the<br />
map W(w). Indeed, if the cut-off frequencies of dyadic<br />
wavelets are fixed at 2 -n p, the cut-off frequencies of the<br />
warped wavelets are<br />
w<br />
y nm , t<br />
nm ,<br />
nm , Œ<br />
()∫ [ ]() t<br />
n<br />
- -n<br />
= W 1 ( 2 p)<br />
For our purposes we just need to add the remark that genuine<br />
scale a wavelets, 0 < a < 1, may be generated by selecting as<br />
warping map a function W(w) satisfying the following property<br />
(a-homogeneity):<br />
<br />
1<br />
W( aw)= W( w)<br />
2<br />
( )<br />
( ) - ( )<br />
(3)<br />
(4)<br />
(5)<br />
(6)
16 G. Evangelista<br />
3<br />
2.5<br />
2<br />
1.5<br />
1<br />
0.5<br />
0<br />
0 0.5 1 1.5 2 2.5 3<br />
Fig. 1. Example of warping map <strong>for</strong> power of a wavelet cut-off<br />
choice.<br />
such as<br />
reported in Figure 1. Repeated application of Equation 6<br />
shows that<br />
By deriving both sides of Equation 8 with respect to w one<br />
obtains<br />
By substituting Equations 8 and 9 in Equation 5 we<br />
obtain<br />
Yˆ<br />
which shows that warped wavelets generated by an a-homogeneous<br />
warping map are obtained by dilating the family of<br />
mother wavelets Ŷ 0,m (w), m Œ , while wavelets at fixed scale<br />
level n are allpass related:<br />
Yˆ<br />
W<br />
W( w)=<br />
pw<br />
w<br />
W( a -n<br />
w)= 2<br />
n W( w)<br />
-n<br />
n<br />
¢( a w)= ( 2a) W¢( w)<br />
-n<br />
Yˆ<br />
n<br />
( w)= a<br />
2 -<br />
( a w)<br />
nm , 0,<br />
m<br />
n<br />
jmW<br />
a w<br />
( w)= Yˆ<br />
( w) e<br />
- ( -<br />
)<br />
nm , n,<br />
0<br />
-<br />
2 log a<br />
Without loss of generality we can assume that<br />
W(p) = p<br />
so that the cut-off frequencies of the scale a warped wavelets<br />
are fixed at<br />
w n = a n p<br />
w<br />
p<br />
(7)<br />
(8)<br />
(9)<br />
If a = 1 – 2 then the map (Eq. 7) reverts to the identical map<br />
W(w) = w and the warped wavelets coincide with the<br />
dyadic wavelets. If a > 1 – 2 then the warped wavelets achieve a<br />
finer frequency resolution than the dyadic wavelets.<br />
Discrete-time frequency warping and<br />
Laguerre trans<strong>for</strong>m<br />
The continuous-time warped wavelet expansion illustrated in<br />
the previous section is applicable to continuous-time signals<br />
and requires warping of the signal, a task that is computationally<br />
difficult to per<strong>for</strong>m. An important property of dyadic<br />
wavelet expansions is that they have a fast algorithm to<br />
iteratively compute the expansion coefficients even in the<br />
continuous-time case. In this section we consider frequency<br />
warping of discrete-time signals and relate this to an orthogonal<br />
trans<strong>for</strong>m that can be computed by means of a chain of<br />
rational allpass filters.<br />
Since the Discrete-Time Fourier Trans<strong>for</strong>m (DTFT) of a<br />
sequence x(n) is periodic, reversible frequency warping of<br />
discrete-time signals requires an invertible frequency map of<br />
the interval [-p, p] onto itself. As we did in the continuoustime<br />
case, we will assume that q (w) is almost everywhere<br />
differentiable, has odd parity and maps the point p into itself.<br />
As discussed in the previous section, we are interested in procedures<br />
<strong>for</strong> unwarping the signal, producing the signal xˆ(n)<br />
whose DTFT is<br />
It follows that the discrete-time unwarping operator -1 is<br />
orthogonal and has kernel<br />
-1<br />
+ d<br />
-<br />
-1<br />
1 p q<br />
1<br />
jn ( w- kq ( w)<br />
)<br />
W ( n,<br />
k)=<br />
Ú e dw<br />
2p<br />
-p<br />
dw<br />
1 + p<br />
jk ( w- nq( w)<br />
)<br />
= Ú q¢( w)<br />
e dw<br />
2p<br />
-p<br />
The unwarped signal is<br />
xn ˆ ()= [ - 1 -<br />
x]()= n W 1 ( nk , ) xk ( )= l ( kxk ) ( )<br />
where we defined<br />
(10)<br />
(11)<br />
(12)<br />
Notice that Equation 11 is in the <strong>for</strong>m of a scalar product of<br />
the signal with the sequence l n (k). By the orthogonality of<br />
the operator<br />
and the set {l n (k)} nŒ is orthogonal and complete. In fact:<br />
and<br />
Â<br />
k<br />
-1<br />
dq<br />
-1<br />
ˆX ( w)= X q ( w)<br />
dw<br />
Â<br />
k<br />
-1<br />
l n ()∫ k W ( n,<br />
k)<br />
W - 1<br />
( n, k )= W ( k,<br />
n )<br />
Â<br />
( )<br />
Â<br />
-1<br />
l () k l ()= k W ( n, k) W( k, n¢<br />
)= d ,<br />
n n¢<br />
n n¢<br />
k<br />
k<br />
n
<strong>Flexible</strong> wavelets <strong>for</strong> music signal processing 17<br />
Â<br />
n<br />
Â<br />
-1<br />
l () k l ( k¢<br />
)= W ( n, k) W( k¢<br />
, n)=<br />
d ,<br />
n n k k¢<br />
k<br />
Thus, the discrete-time unwarped signal is given by the<br />
sequence of expansion coefficients<br />
xn ˆ ()= x,<br />
ln<br />
(13)<br />
of the signal over the orthogonal basis {l n (k)} nŒ and the<br />
signal x(n) is recovered from xˆ(n) by the expansion series<br />
Fig. 2. Structure <strong>for</strong> computing discrete-time signal unwarping by<br />
means of the Laguerre trans<strong>for</strong>m.<br />
xn ()= Â xk ˆ () l k () n<br />
k<br />
In turn, if the signal is causal, i.e., if x(k) = 0 <strong>for</strong> k < 0,<br />
then the scalar product (Eq. 13) may be computed by convolving<br />
the time-reversed signal y(-k) = x(k) by l n (k) and<br />
sampling the result at n = 0:<br />
•<br />
•<br />
x,ln = Â x( k) ln( k)= Â y( 0 -k) ln( k)<br />
k = 0<br />
k = 0<br />
Furthermore, from Equations 12 and 2 it is easy to recognize<br />
that the DTFT of l n (k) satisfies the following recurrence:<br />
3<br />
2.5<br />
2<br />
1.5<br />
b =.9<br />
with<br />
Hence<br />
L<br />
n<br />
( w)= L ( w)<br />
e<br />
n-1<br />
L 0 ( w)= q¢( w)<br />
- jqw<br />
( )<br />
(14)<br />
1<br />
0.5<br />
b =-.9<br />
L<br />
n<br />
jnqw<br />
( w)= L ( w) e<br />
- ( )<br />
0<br />
where the term e -jnq(w) can be assimilated to the response of<br />
an allpass filter. However, <strong>for</strong> digital implementations one<br />
needs to constrain -q(w) to be the phase of a causal and<br />
stable, rational allpass filter. For reversible warping we need<br />
to constrain q (w) to a one-to-one mapping of the normalized<br />
frequency interval [-p, +p]. Furthermore, we want to map a<br />
real signal into a real warped signal, thus q(w) must have<br />
odd parity. It is easy to show that these conditions are verified<br />
if and only if -q(w) is the phase of a first-order real<br />
allpass filter with transfer function<br />
-1<br />
z - b<br />
Az ()= , - 1< b < 1<br />
-1<br />
1 - bz<br />
The corresponding basis set l n (k) is recognized to be the<br />
discrete Laguerre basis (Broome, 1965) (Evangelista &<br />
Cavaliere, 1998d) (Oppenheim et al., 1971), orthogonal and<br />
complete over the set of finite energy causal signals, with<br />
b sinw<br />
qw ( )= w+<br />
2 arctan , - 1< b < 1 (15)<br />
1 - b cosw<br />
This one-parameter family of warping curves (Eq. 15) is<br />
shown in Figure 3. In this family, <strong>for</strong> any allowed value of<br />
the parameter b, the curves corresponding to b and to -b are<br />
inverse of each other. The curve <strong>for</strong> b = 0 is the identical map<br />
q(w) = w. Furthermore,<br />
0<br />
0 0.5 1 1.5 2 2.5 3<br />
Fig. 3. Family of warping curves associated to the Laguerre<br />
trans<strong>for</strong>m.<br />
2<br />
1 - b<br />
q¢( w)=<br />
1- 2bcosw<br />
+ b<br />
In computing the Laguerre trans<strong>for</strong>m of a causal signal, the<br />
function in Equation 14 can be replaced by a causal realization<br />
of q¢( w)<br />
, namely<br />
1 - b<br />
1 - be j<br />
2<br />
L 0 ( w)=<br />
- w<br />
Laguerre warping is the unique one-to-one warping that can<br />
be numerically computed by means of rational filters. A<br />
digital structure implementing Laguerre warping is shown in<br />
Figure 2. There, the input signal is time reversed, filtered by<br />
L 0 (w) and fed to a dispersive delay line consisting of a chain<br />
of allpass filters The warped signal is obtained by collecting<br />
the samples present at the output of each filter at time n = 0.<br />
Although the computational structure in principle requires<br />
an infinite number of filter sections, by a simple argument<br />
on the maximum group delay, one can show (Evangelista<br />
& Cavaliere, 1998d) that, to a good approximation, the<br />
2
18 G. Evangelista<br />
3<br />
2.5<br />
2<br />
Fig. 5.<br />
Two-channel warped filter bank structure.<br />
1.5<br />
1<br />
0.5<br />
0<br />
0 0.5 1 1.5 2 2.5 3<br />
Fig. 6.<br />
unwarped downsampler<br />
Equivalent downsampled warped filtering structures.<br />
Fig. 4. Frequency warping harmonic partials to obtain inharmonic<br />
partials.<br />
Laguerre trans<strong>for</strong>m of a finite-length N signal can be computed<br />
by means of<br />
1 + b<br />
M ª N<br />
1 - b<br />
allpass sections. As already observed, the inverse Laguerre<br />
trans<strong>for</strong>m can be computed using the same structure of<br />
Figure 2 provided that we reverse the sign of the Laguerre<br />
parameter b.<br />
Frequency warping is per se an interesting effect from a<br />
musical point of view. In fact, by frequency warping a periodic<br />
signal one obtains an inharmonic distribution of the partials,<br />
as shown in Figure 4, typical of bells, drums or piano<br />
tones in the low-register. There<strong>for</strong>e, the frequency spectrum<br />
of an initial sound is morphed into another sound, possibly<br />
belonging to a different instrument class. Time-varying<br />
versions of the warping algorithm have been devised<br />
(Evangelista, 2000b). Further approximations via overlapadd<br />
of the Short-Time Laguerre Trans<strong>for</strong>m allow us to compute<br />
the effect in real time (Evangelista, 2000b). Sound<br />
examples obtained by frequency warping can be visited at<br />
http://lcavwww.epfl.ch/~gianevan/research/fwex.html.<br />
Warped filter banks and iterated<br />
warped wavelets<br />
Two-channel warped filter bank<br />
In the classical construction of dyadic wavelets (Daubechies,<br />
1992) a two-channel critically sampled filter bank serves as<br />
building block. This is based on two quadrature mirror filters<br />
(QMF) H(z), low-pass, and G(z), high-pass. These filters are<br />
half-band with cut-off frequency w c = p – 2<br />
. In (Evangelista &<br />
Cavaliere, 1998d) the structure of Figure 5 was considered<br />
as a building block. There, the signal is frequency unwarped<br />
by means of Laguerre trans<strong>for</strong>m (LT) prior to filtering. The<br />
operators [Ø] and [≠] denote, respectively, downsampling<br />
(decimation) by 2 and upsampling (zero-insertion) by 2.<br />
Unwarping the signal is equivalent to warping the filters and<br />
to unwarping the downsampling operators (Evangelista &<br />
Cavaliere, 1998d), as depicted in Figure 6. Thus, on one<br />
hand the pass-band of the filters is altered and the cut-off<br />
frequency is moved to w c = q -1 ( p – 2<br />
), on the other hand the<br />
equivalent unwarped downsampling operator, which embeds<br />
unwarping and conventional downsampling, resets the band<br />
of the filtered signal to half-band prior to downsampling. By<br />
combining the Laguerre trans<strong>for</strong>m with an orthogonal filter<br />
bank one obtains an orthogonal perfect reconstruction structure<br />
whose bands can be arbitrarily allocated by fixing the<br />
parameter b.<br />
The structure of Figure 5 can be generalized to include an<br />
ordinary discrete wavelet trans<strong>for</strong>m structure in place of the<br />
two-channel filter bank. This provides a first <strong>for</strong>m of discretetime<br />
warped wavelets. However, the analysis bands cannot<br />
be arbitrarily selected since they are determined by trans<strong>for</strong>ming<br />
the cut-off frequencies 2 -n p according to a single<br />
curve in the Laguerre family. As shown in the next section,<br />
fully arbitrary bandwidth allocation is achieved by iterating<br />
the full warped filter bank structure.<br />
Frequency warping may be successfully employed <strong>for</strong><br />
trans<strong>for</strong>ming inharmonic sounds into harmonic ones or viceversa.<br />
When used in conjunction with the PSWT one obtains<br />
the Pitch-Synchronous Frequency Warped Wavelet Trans<strong>for</strong>m<br />
(PSFWWT) useful <strong>for</strong> decomposing inharmonic instrument<br />
sounds into regular pseudo-periodic components plus
<strong>Flexible</strong> wavelets <strong>for</strong> music signal processing 19<br />
...<br />
Fig. 9. Synthesis filter bank <strong>for</strong> computing the inverse<br />
discrete-time frequency warped wavelet trans<strong>for</strong>m.<br />
jmWn<br />
Ynm , ( w)= Ln, ( Wn ( w)<br />
)<br />
Ê 1<br />
G Wn( w)<br />
ˆ<br />
0 -1 Fn-10<br />
, ( w)<br />
e<br />
Ë 2 ¯<br />
where<br />
n<br />
1<br />
Fn, 0( Ï<br />
w)= Lk,<br />
0( Wk 1( w)<br />
)<br />
Ê<br />
H Wk( w)<br />
ˆ¸<br />
’ Ì<br />
-<br />
Ó<br />
Ë 2 ¯<br />
˝<br />
˛<br />
k = 1<br />
is the DTFT of the warped scaling sequence,<br />
- ( w )<br />
(16)<br />
(17)<br />
Fig. 7. Frequency domain characteristics of pitchsynchronous frequency<br />
warped wavelets.<br />
Fig. 8. Analysis filter bank <strong>for</strong> computing the discrete-time frequency<br />
warped wavelet trans<strong>for</strong>m.<br />
fluctuations. The magnitude FT of the basis elements is<br />
reported in Figure 7. The PSFWWT decomposition allows,<br />
e.g., to resolve the hammer noise in a piano tone. An account<br />
of this method is given in (Evangelista & Cavaliere, 1998c)<br />
(Testa et al., 1997), where the Laguerre warping family is seen<br />
to close match the dispersion curves provided by both experimental<br />
data and physical model based on a 4th order PDE.<br />
Discrete-time warped wavelets with arbitrary<br />
band allocation<br />
A discrete-time warped wavelet decomposition may be<br />
obtained by iterating the two-channel warped filter bank.<br />
Similarly to the ordinary wavelet trans<strong>for</strong>m, identical warped<br />
filter bank structures are nested in the low-pass branch. The<br />
corresponding analysis and synthesis structures are reported<br />
in Figures 8 and 9, respectively. Each warping stage is characterized<br />
by a different value b k of the Laguerre parameter.<br />
The warped wavelets are the impulse responses of the lower<br />
branches of the synthesis structure <strong>for</strong> any value of the shift.<br />
One can show (Evangelista & Cavaliere, 1998c) that the<br />
DTFT of the level n warped wavelet has the <strong>for</strong>m<br />
...<br />
2<br />
1-<br />
bk<br />
L k ,0( w)=<br />
- jw<br />
1-<br />
be k<br />
is a causal realization of qk¢( w)<br />
and<br />
W k ( w)= J k oJ k - o...<br />
1 oJ1( w)<br />
(18)<br />
with<br />
Jk( w)= 2qk( w)<br />
Notice that, due to the presence of the frequency dependent<br />
delay term e -jmW n(w)<br />
in Equation 16, wavelets pertaining to the<br />
same level n are obtained from each other by an allpass filtering<br />
operation, rather than by elementary time shift as in<br />
ordinary dyadic wavelets. The cut-off frequency of the level<br />
n warped scaling sequence is given by<br />
-<br />
wn<br />
= W 1<br />
n ( p)<br />
which is the cut-off frequency of the narrowest band filter<br />
H( – 1 2 W n (w)) in the product (Eq. 17).<br />
Given a choice of the cut-off frequencies w n < w n-1 < ...<br />
< w 1 , there exists a unique solution <strong>for</strong> the Laguerre parameters<br />
b k , k = 1,..., n, obtained by the following recurrence<br />
Ê p w1<br />
ˆ<br />
b1<br />
= tan -<br />
Ë 4 2 ¯<br />
Ê p Wk-1( wk)<br />
ˆ<br />
b k = tan -<br />
(19)<br />
Ë 4 2 ¯<br />
Consequently, to any choice of the cut-off frequencies is<br />
uniquely associated a corresponding set of parameters. The<br />
iterated warped filter bank is a perfect reconstruction orthogonal<br />
structure equivalently described by a discrete-time<br />
warped wavelet basis. Cut-off frequencies may be arranged<br />
on a perceptual scale, such as Bark scale, leading to perceptually<br />
organized orthogonal trans<strong>for</strong>ms (Evangelista &<br />
Cavaliere, 1998a; 1998b). The example reported in Figure<br />
10 displays the magnitude Fourier trans<strong>for</strong>m of frequency<br />
warped wavelets and scaling sequence achieving 1/3 of<br />
octave frequency resolution. Sound examples of ordinary
20 G. Evangelista<br />
7.7<br />
6.6<br />
5.5<br />
4.4<br />
3.3<br />
2.2<br />
1.1<br />
0<br />
0 0.5 1 1.5 2 2.5 3<br />
Fig. 10. Magnitude Fourier trans<strong>for</strong>m of 1/3-octave frequency<br />
warped wavelets and scaling sequence obtained by letting w n = a n p,<br />
with a = 2 - – 1 3<br />
.<br />
wavelet and frequency warped wavelet analysis can be found<br />
at http://lcavwww.epfl.ch/~gianevan/research/fwex.html.<br />
Continuous-time warped wavelets<br />
Expansion coefficients over ordinary continuous-time<br />
wavelet bases may be iteratively computed by means of an<br />
infinite cascade of two-channel filter banks. The discretetime<br />
frequency warped wavelets illustrated in the previous<br />
section can be extended to continuous-time bases. This<br />
extension requires generalization of many of the mathematical<br />
results illustrated in Daubechies (1992) and Mallat<br />
(1998) The scaling function of ordinary dyadic wavelets is<br />
obtained by means of a limit process involving rescaling of<br />
the discrete-time scaling function:<br />
where F0,0(w) c.t. and Fn,0(w) d.t. respectively denote the Fourier<br />
trans<strong>for</strong>m of the level 0 continuous-time scaling function and<br />
the DTFT of the level n scaling sequence (discrete-time).<br />
Since (Mallat 1998)<br />
then<br />
F<br />
( w)=<br />
lim<br />
n-1<br />
dt ..<br />
F 00 , w ’<br />
k = 0<br />
k<br />
( )= H( 2 w)<br />
•<br />
ct ..<br />
F 00 , ( w)=<br />
’<br />
k = 1<br />
1 dt .. Ê w ˆ<br />
F , n<br />
2 Ë 2 ¯<br />
ct ..<br />
00 ,<br />
nƕ<br />
n n 0<br />
Ê w<br />
H<br />
Ë<br />
k<br />
2<br />
2<br />
ˆ<br />
¯<br />
The iterated frequency warped counterpart of this construction<br />
is much more complex and requires a deep study of<br />
properties the iterated map (Eq. 18) and of the convergence<br />
of the parameter selection algorithm (Eq. 19). One can<br />
show that, in spite of the intricate analytic <strong>for</strong>m of the<br />
iterates W k (w), en<strong>for</strong>cing the following exponential cut-off<br />
condition:<br />
w n = a n p, n = 1, 2, . . .<br />
guarantees that the parameter sequence b k obtained by Equation<br />
10, simply converges to<br />
At the same time, we have<br />
where G -1 (w) is an increasing and continuously differentiable<br />
eigenfunction of the composition-by-J •<br />
-1<br />
operator with<br />
eigenvalue a, i.e.,<br />
G -1 ° J • -1 = aG -1 (20)<br />
Since, under general assumptions, the eigenfunction of the<br />
composition operator is unique up to a multiplicative constant<br />
(Shapiro, 1993) (Evangelista, 2000a) and since G -1 (p)<br />
= p, then<br />
-1<br />
G - 1 pg n ( w)<br />
( w)=<br />
lim<br />
n Æ• -1<br />
g ( p)<br />
where<br />
Thus, G -1 (w) is also proportional to the limit iterate of the<br />
map J • -1 . Furthermore, from Equation 20 we obtain<br />
and<br />
-1<br />
Wn<br />
( w) -<br />
lim<br />
nƕ<br />
n =<br />
1<br />
G ( w )<br />
a<br />
-1 -1 -1<br />
g n ( w)= J•<br />
oLoJ•<br />
( w)<br />
1442443<br />
-<br />
J 1 -<br />
• ( w)= G aG<br />
1 ( w)<br />
-<br />
g 1 n -<br />
( w)= G a G<br />
1 ( w)<br />
n<br />
1 a<br />
b• = - 2<br />
1+<br />
2 a<br />
.<br />
n times<br />
( )<br />
( )<br />
This allows us to define the Fourier trans<strong>for</strong>m of the<br />
continuous-time frequency warped scaling function as<br />
follows:<br />
Ê 1<br />
H g<br />
Under rather general assumptions, this construction leads<br />
to the definition of asymptotically scale a continuous-time<br />
warped wavelets. The expansion coefficients over the resulting<br />
basis can be iteratively computed by means of the struc-<br />
-1<br />
•<br />
k<br />
˜ ct ..<br />
Ë 2<br />
F 00 , ( w)=<br />
’<br />
k = 0 2<br />
( w)<br />
ˆ<br />
¯<br />
n
<strong>Flexible</strong> wavelets <strong>for</strong> music signal processing 21<br />
3<br />
2.5<br />
2<br />
1.5<br />
1<br />
0.5<br />
0<br />
0 20 40 60 80 100 120<br />
Fig. 11.<br />
Tiling the time-frequency plane by means of frequency warped wavelets.<br />
ture in Figure 8. For more details on this extension, the<br />
mathematically inclined reader is referred to (Evangelista,<br />
2000a).<br />
The warped wavelet decomposition leads to an unconventional<br />
tiling of the time-frequency plane shown in Figure<br />
11. Due to the frequency dependent delay of each basis<br />
element, the time-frequency localization zones are characterized<br />
by curved boundaries.<br />
Conclusions<br />
In this paper we explored the panorama of wavelet trans<strong>for</strong>ms<br />
obtained by means of frequency warping techniques,<br />
with particular attention to their realizability in computationally<br />
efficient structures based on rational discrete-time<br />
allpass filters and two-channel filter banks. The flexibility of<br />
the trans<strong>for</strong>m is exploited <strong>for</strong> the representation of music<br />
signals in perceptually or objectively meaningful components,<br />
allowing <strong>for</strong> adaptation to perceptual scales or <strong>for</strong><br />
orthogonal separation of transients and noise in harmonic or<br />
inharmonic sounds via pitch-synchronous methods. The<br />
refinable frequency resolution of frequency warped wavelets<br />
is a fundamental characteristic <strong>for</strong> the analysis of any musical<br />
sound.<br />
Frequency warping emerges as an appealing technique in<br />
musical signal processing <strong>for</strong> the creation of new effects and<br />
<strong>for</strong> the analysis and synthesis of features of sounds by means<br />
of adapted representations.<br />
References<br />
Baraniuk, R.G. & Jones, D.L. (1993). Warped wavelets<br />
bases: Unitary equivalence and signal processing. Proc.<br />
ICASSP’93, IEEE, III, 320–323.<br />
Baraniuk, R.G. & Jones, D.L. (1995). Unitary equivalence: A<br />
new twist on signal processing. IEEE Trans. <strong>Signal</strong> Processing,<br />
43, 2269–2282.<br />
Blu, T. (1998). A new design algorithm <strong>for</strong> two-band orthonormal<br />
filter banks and orthonormal rational wavelets. IEEE<br />
Trans. On <strong>Signal</strong> Processing, 46, 1494–1504.<br />
Broome, P.W. (1965). Discrete orthonormal sequences. J. Assoc.<br />
Comput. Machinery, 12, 151–168.<br />
Daubechies, I. (1992). Ten Lectures on <strong>Wavelets</strong>. CBMS-<br />
NSF Reg. Conf. Series in Appl. Math. Philadelphia, PA.:<br />
SIAM.<br />
Evangelista, G. (1991). Wavelet trans<strong>for</strong>ms that we can play. In:<br />
G. De Poli, A. Piccialli, & C. Roads (Eds.), Representations<br />
of <strong>Music</strong>al <strong>Signal</strong>s (pp. 119–136). Cambridge, MA: MIT<br />
Press.<br />
Evangelista, G. (1993). Pitch synchronous wavelet representations<br />
of speech and music signals. IEEE Trans. on <strong>Signal</strong><br />
Processing, 41, 3313–3330. Special issue on <strong>Wavelets</strong> and<br />
<strong>Signal</strong> Processing.<br />
Evangelista, G. (1994). Comb and multiplexed wavelet trans<strong>for</strong>ms<br />
and their applications to signal processing. IEEE<br />
Trans. on <strong>Signal</strong> Processing, 42, 292–303.<br />
Evangelista, G. (1997). Wavelet representations of musical<br />
signals. In: C. Roads, S.T. Pope, A. Piccialli, & G. De Poli
22 G. Evangelista<br />
(Eds.), <strong>Music</strong>al <strong>Signal</strong> Processing (pp. 127–153).<br />
Amsterdam: Swets & Zeitlinger.<br />
Evangelista, G. (2000b). Time and frequency warping of<br />
musical signals. In: COST-G6 (Ed.), Digital Audio Effects<br />
(DAFx) (in preparation). Chichester, England: Wiley.<br />
Evangelista, G. (2001). Dyadic warped wavelets. Advances in<br />
Imaging and Electron Physics, 117, 73–171.<br />
Evangelista, G. & Cavaliere, S. (1998a). Arbitrary bandwidth<br />
wavelet sets. Proc. of ICASSP’98, III, 1801–1804.<br />
Evangelista, G. & Cavaliere, S. (1998b). Auditory modeling via<br />
frequency warped wavelet trans<strong>for</strong>m. Proc. of EUSIPCO98,<br />
I, 117–120.<br />
Evangelista, G. & Cavaliere, S. (1998c). Discrete frequency<br />
warped wavelets: Theory and applications. IEEE Trans. on<br />
<strong>Signal</strong> Processing, 46, 874–885. Special issue on Theory and<br />
Applications of Filter Banks and <strong>Wavelets</strong>.<br />
Evangelista, G. & Cavaliere, S. (1998d). Frequency warped filter<br />
banks and wavelet trans<strong>for</strong>ms: A discrete-time approach via<br />
Laguerre expansion. IEEE Trans. on <strong>Signal</strong> Processing, 46,<br />
2638–2650.<br />
Mallat, S. (1998). A Wavelet Tour of <strong>Signal</strong> Processing. Boston,<br />
MA.: Academic Press.<br />
Oppenheim, A.V., Johnson, D.H., & Steiglitz, K. (1971). Computation<br />
of spectra with unequal resolution using the Fast<br />
Fourier Trans<strong>for</strong>m. Proc. IEEE, 59, 299–301.<br />
Shapiro, J.H. (1993). Composition operators and classical function<br />
theory. Universitext: Tracts in Mathematics. New York:<br />
Springer-Verlag.<br />
Testa, I., Evangelista, G., & Cavaliere, S. (1997). A physical<br />
model of stiff strings. Proceedings of the Institute of<br />
Acoustics (Internat. Symp. on <strong>Music</strong> and Acoustics,<br />
ISMA’97), 19: Part 5, 219–224.