
Denoising and Analysis of 2D NMR Spectra for Metabolomic Profiling Studies

Simon Poulding 

Dissertation submitted for the MSc in Mathematics with Modern Applications, 

Department of Mathematics, University of York, UK. 

August 2006


Acknowledgements 

I would like to acknowledge the support and assistance of the following: Dr Julie Wilson for 

her extensive input, suggestions and comments, as well as the provision of hardware and software; 

Dr Adrian Charlton and Dr James Donarski of the Central Science Laboratories for the acquisition 

and provision of NMR data sets, demonstrating the principles of NMR spectroscopy, assisting with 

the use of Bruker Topspin software, and commenting on the objectives and results; and Dr Jason 

Levesley for his guidance on the content and structure of the dissertation. 



Contents

Acknowledgements
1. Introduction
1.1. Project Objectives
1.2. Document Structure
2. Statistical Analysis and Reduction of t1-Noise
2.1. The Phase-Cycled HSQC Experiment and Sources of t1-Noise
2.2. Initial Analysis
2.3. Noise Separation
2.4. Correlation of t1-Noise Traces
2.5. Complex Correlation of t1-Noise Traces
2.6. Denoising Algorithm
2.7. Results and Discussion
2.8. Comparison to Other t1-Noise Reduction Techniques
3. Automated Peak Picking Using a Genetic Algorithm
3.1. Peak Shape
3.2. Peak Width
3.3. Peak Fit Metric
3.4. A Priori Knowledge Encapsulated in the Genetic Algorithm
3.5. Suitability of Genetic Algorithms As The Optimisation Technique
3.6. Identification of Convoluted Peak Regions
3.7. Genetic Algorithm Representation, Operators and Objective Function
3.8. Technical Implementation
4. Combined Denoising and Peak Picking Process
4.1. Implementation Overview
4.2. Processing Steps
5. Two-Dimensional Adaptive Binning
5.1. Overview of One-Dimensional Adaptive Binning
5.2. Objective for Two-Dimensional Adaptive Binning Research
5.3. Two-Dimensional Adaptive Binning Method
6. Conclusion
6.1. Evaluation of Project Objectives
6.2. Further Investigation
Appendix A. Pulse Fourier Transform NMR
A.1. Nuclear Magnetic Moment
A.2. Pulse NMR
A.3. Relaxation
A.4. Chemical Shift
A.5. Spin-Spin Coupling
A.6. Signal Detection and Processing
A.7. Multi-Dimensional NMR
A.8. NMR Sensitivity
Appendix B. Wavelet Analysis
B.1. Continuous Wavelet Transform
B.2. Discrete Wavelet Transform
B.3. Scaling Functions
B.4. Fast Wavelet Transform
B.5. Pyramid Algorithm
B.6. Wavelet Construction and Families
B.7. Denoising and Smoothing
B.8. Non-Decimating (Translation Invariant) Transform
B.9. Two-Dimensional Discrete Wavelet Transforms
Appendix C. Genetic Algorithm Overview
C.1. Evolutionary Algorithms
C.2. Steady-State Genetic Algorithms
C.3. Representation
C.4. Operators
C.5. Objective Function
Appendix D. Experimental Methods
Appendix E. Code Structure
E.1. Denoising and Peak Picking
E.2. Two-Dimensional Adaptive Binning
References


1. Introduction 

1.1. Project Objectives. The analysis of the metabolome—the set of the chemical compounds, 

or metabolites, synthesised by a cell[10]—can provide extremely useful information about biological 

samples. For example, the comparison of metabolic profiles can elucidate information about gene 

function[19], distinguish samples from different genetic lines[6, 7], and identify marker metabolites 

for disease states[3]. 

Nuclear Magnetic Resonance (NMR) is one of the techniques used for analysing the metabolome. 

It measures the magnetic resonance frequencies of particular nuclei in a sample, and since the 

resonance frequency is modified by the chemical environment of the nucleus in question[9], each 

metabolite has a different, yet characteristic, set of resonance frequencies. The NMR spectrum of 

a sample thus provides information about its metabolic profile. 

Two-dimensional NMR experiments analyse the relationship between two different nuclei in 

metabolites, and the extra dimensionality in the data, compared to one-dimensional experiments, 

can further distinguish metabolites. Since metabolites are organic molecules, an appropriate two-dimensional
NMR experiment for metabolomic profiling is ¹H–¹³C Heteronuclear Single Quantum
Coherence (HSQC): it identifies hydrogen and carbon atoms connected by a single bond.

However, the most sensitive type of HSQC experiments—suitable for detecting the very low 

concentrations of some compounds in metabolomic samples—suffer from artefacts called t1-noise 

that can obscure some of the peaks in the sample and hinder peak identification[22, 17]. In 

addition, the process of picking peaks in the spectrum, and especially of distinguishing small 

peaks from both the t1-noise and general noise, is relatively manual and relies on the knowledge of 

the experimenter. This makes the process time-consuming and open to subjective interpretation. 

These problems motivate the first two objectives of the project. The first is to explore techniques
for the reduction of the t1-noise through analysis of the 2D NMR data set. The second is to
automate the picking of peaks and the distinguishing of peaks from noise artefacts.

A further objective concerns the comparison of metabolic profiles. The frequencies of peaks in 

an NMR spectrum can change depending on factors such as the pH or temperature of the sample, 

and the shift in frequency is different for each peak[14]. Therefore, direct comparison of spectral 

peaks by matching frequency coordinates is not appropriate. One alternative is to use ‘binning’: 

the spectrum is partitioned into equal-sized intervals and the total spectral intensity within each 

bin is calculated. Profiles are then compared using the total intensities for corresponding bins, 

with the assumption that shifted peaks are contained within the same bin (for example, as used in

[11]). However, binning does not take account of the actual distribution of peaks in the spectrum, 

and so, for example, the shifting of peaks located on the bin boundaries can limit its effectiveness. 

A refinement of binning, termed ‘adaptive binning’ and described in [6], has proved successful for 

the comparison of one-dimensional NMR spectra of metabolic samples. This method, leveraging 

wavelet techniques, assigns both bin location and size based on the distribution of peaks across 

a number of experimental spectra. The third objective in this project is to assess the use of this 

technique on two-dimensional NMR spectra. 
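The limitation of equal-width binning described above can be illustrated with a minimal sketch. The function name `uniform_bins`, the bin width and the two toy spectra are assumptions made purely for illustration; they are not taken from the project code or from [11].

```python
def uniform_bins(ppm, intensity, start, bin_width, n_bins):
    """Sum spectral intensity within equal-width bins along a ppm axis."""
    totals = [0.0] * n_bins
    for x, y in zip(ppm, intensity):
        index = int((x - start) // bin_width)
        if 0 <= index < n_bins:          # ignore points outside the binned range
            totals[index] += y
    return totals

# Hypothetical 1D spectra sampled on the same ppm axis.
ppm = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75]
spectrum_a = [0, 0, 0, 10, 0, 0, 0, 0]   # peak at 0.75 ppm
spectrum_b = [0, 0, 0, 0, 10, 0, 0, 0]   # the same peak shifted to 1.00 ppm

# Two bins of width 1 ppm: [0, 1) and [1, 2).
bins_a = uniform_bins(ppm, spectrum_a, 0.0, 1.0, 2)
bins_b = uniform_bins(ppm, spectrum_b, 0.0, 1.0, 2)
print(bins_a, bins_b)
```

In this toy case a small shift moves the peak across the bin boundary at 1 ppm, so the two profiles appear entirely different even though they contain the same peak. This is the situation that adaptive binning is intended to avoid.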

1.2. Document Structure. Subsequent sections of this document describe: 

• the reduction of t1-noise by the statistical analysis of 2D data sets; 

• automated peak picking; 

• the processing, incorporating both the above steps, to establish spectra free of noise and 

artefacts; 

• adaptive binning in 2D NMR spectra. 

The theoretical background and practical application of the NMR experiments used to acquire 

the 2D NMR spectra are summarised in Appendix A. Appendix B reviews the mathematics of the 

wavelet analysis techniques used in this project. Further appendices provide a short overview of 

genetic algorithms (used here to implement automated peak picking); describe the experimental 

methods used to acquire and process the NMR data sets; and outline the structure of the code 

created for this project. 



2. Statistical Analysis and Reduction of t1-Noise 

As discussed in the introduction, the 2D HSQC NMR experiment for the isotopes ¹H and ¹³C
is a powerful technique for profiling the metabolome. The phase-cycled version of HSQC is more
sensitive than the alternative method of gradient-selection [22] and therefore better for the detection

of metabolites at low concentration. However, phase-cycled HSQC suffers from artefacts known 

as t1-noise that can hinder the identification of peaks. 

This section describes the sources of t1-noise in phase-cycled HSQC, describes analysis of the 

structure of the noise, and—based on the results of the analysis—proposes an algorithm for noise 

reduction through processing of 2D NMR data sets. 

2.1. The Phase-Cycled HSQC Experiment and Sources of t1-Noise. An overview of two-dimensional
NMR is given in appendix A. The 2D HSQC experiment uses a specific pulse sequence
that identifies the resonance frequencies of ¹H and ¹³C nuclei where the atoms are connected by a
single chemical bond. A series of FIDs (see section A.6) are acquired that measure the resonance
frequency of ¹H nuclei. For each FID, a timing parameter, t1, in the pulse sequence is changed,
and the nature of the sequence is such that the phase of each of the ¹H frequencies in the FID
‘evolves’ with an angular frequency that is the resonance frequency of the ¹³C nucleus to which
it is attached via a single bond. The 2D frequency spectrum derived from processing the FIDs is
plotted with the ¹H frequency on the horizontal F2 axis, and ¹³C on the vertical F1 axis.

In naturally occurring carbon, 99% of the atoms are the isotope ¹²C and only 1% are ¹³C
[14, 13]. Since ¹²C is not a magnetic nucleus (see section A.1.1), it has no resonance frequency
and therefore provides no additional information in a 2D HSQC experiment. Owing to its high
abundance, the resonance signals detected from ¹H bonded to ¹²C would overwhelm the desired
signals from the ¹H–¹³C bonds and so are suppressed by the experimental procedure. In phase-cycled
HSQC, two FIDs are acquired at each t1 value with part of the pulse sequence modified for
one of the FIDs. The change in the pulse sequence reverses the phase of the signal resulting from
¹H–¹²C bonds, but leaves the ¹H–¹³C signal unchanged. Adding the two signals together leaves
only the desired ¹H–¹³C resonance frequencies[21].

However, instrumental imperfections result in incomplete cancellation of the undesired ¹²C

signals. These imperfections change on each FID acquisition, and include[17]: 

• inconsistent rotation of the bulk magnetic moment caused by the radio frequency pulse, 

owing to the variation in field strength or pulse timing (see section A.2.2); 

• inconsistent phase of the radio frequency pulse; 

• inconsistent timing between the acquisition of successive FIDs. 

(These are distinguished from other instrumental imperfections that show variation both during 

the acquisition of a single FID as well as between successive acquisitions.) 
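The effect of such imperfections on the cancellation can be sketched with a toy numerical model. This is an illustration only, not a simulation of the pulse sequence: the function `summed_pair`, the relative signal sizes and the size of the phase error are all assumptions chosen for this example.

```python
import cmath

def summed_pair(signal_13c, signal_12c, gain_error=0.0, phase_error=0.0):
    """Sum of the two acquisitions at one t1 value in a toy phase-cycling model."""
    first = signal_13c + signal_12c
    # Second acquisition: the 1H-12C term is phase-reversed, but an assumed
    # instrumental imperfection slightly perturbs its amplitude and phase.
    imperfect_12c = -(1.0 + gain_error) * cmath.exp(1j * phase_error) * signal_12c
    second = signal_13c + imperfect_12c
    return first + second

wanted = 1.0 + 0.0j      # 1H-13C contribution (roughly 1% natural abundance)
unwanted = 99.0 + 0.0j   # 1H-12C contribution

ideal = summed_pair(wanted, unwanted)                      # perfect cancellation
flawed = summed_pair(wanted, unwanted, phase_error=0.01)   # 0.01 rad phase error
residual = abs(flawed - 2.0 * wanted)                      # uncancelled 12C signal
print(ideal, residual)
```

Under these assumed values, a phase error of only 0.01 rad leaves a residual of about the same size as the wanted signal, because the unwanted contribution is roughly one hundred times larger. If the error changes from one acquisition to the next, the residual varies with t1, which is consistent with the description of t1-noise that follows.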

After the first Fourier transform in the F2 axis (see section A.7), the change in peak phase 

angle with t1 includes components resulting from the incomplete cancellation of unwanted signals. 

These components have a wide range of frequencies, and so after the second Fourier transform 

are seen as a ridge of signal intensity parallel to the F1 axis. The noise occurs at the F2 resonance
frequency of the ¹H nucleus in the ¹H–¹²C bond, which will coincide with the F2 frequency of the
¹H–¹³C bond, so the t1-noise ridge is associated with large peaks in the spectrum.

An example of t1-noise in an HSQC spectrum is shown in Figure 1. The experimental methods
for this, and the other spectra used in this project, are described in appendix D.

Note the spectrum contains other sources of noise, although in the HSQC spectra used in this 

project they are less intense than the t1-noise. One form occurs uniformly at all frequencies and 

is termed thermal noise. It is also instrumental in nature, and caused by background noise in 

the receiver coil[17]. In some cases, noise ridges parallel to the F2 axis can occur: this ‘t2-noise’ 

results from limitations of the signal processing hardware[17]. 

An alternative to phase-cycling is gradient-selection where the stable magnetic field, B0, varies
steadily along the length of the sample. However, the gradient-selected HSQC experiment is a factor of √2
less sensitive than the equivalent phase-cycled version, and the sensitivity difference is larger when
using the standard experimental techniques[22]. This motivates the desire to minimise the t1-noise
in phase-cycled HSQC.

Figure 1. Example of t1-noise in an HSQC spectrum of sucrose. (Only a small
section of the F2 range is shown.) The t1-noise is seen as ‘ridges’ parallel to the
F1 axis at the F2 frequencies of intense peaks.

2.2. Initial Analysis. The approach taken is to analyse the structure of the t1-noise in order 

to identify features that distinguish the noise from peaks (particularly small peaks) that may be 

convoluted with the noise. The analysis of the noise structure is described in this section. 

2.2.1. Data. For the initial analysis, the real–real spectrum—i.e. the real part of the Fourier 

transform in both dimensions corresponding to the absorption signal (see section A.6.5)—was 

downloaded as a text file from the Bruker Topspin software used to acquire and process the 

spectrum. A C++ MEX function was used to load this data into MATLAB as a matrix.

2.2.2. Visual Inspection. The structure of the t1-noise shown in Figure 1 is typical of the ridges 

seen across a number of spectra. Firstly, periodic behaviour is evident along the direction parallel 

to the F1 axis. Secondly, there are usually two distinct lines of peaks in each ridge either side of 

a central ‘trough’ where the t1-noise has less intensity. These lines are termed noise maxima lines 

in this project to avoid confusion with the terminology ‘peak’. 

Figure 2 shows an F1 trace (a 1D section through a 2D spectrum parallel to the F1 axis at a
constant F2 value) along a maxima line in a t1-noise ridge of an HSQC glycine spectrum. (The

glycine spectrum is used for this and the next figure since it shows a single intense peak and so 

the associated t1-noise ridge is not affected by equivalent ridges from any nearby peaks.) The plot 

of the trace intensity also suggests that the noise contains periodic components. 

Figure 3 shows the interquartile range (iqr) of F1 traces for a range of F2 frequencies across the 

same glycine t1-noise ridge. The iqr is used here as an estimate of the noise intensity (this concept 

is developed in section 2.3.2) and confirms the presence of the two maxima lines in the ridge. 
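The iqr profile across a ridge can be computed trace by trace. The following is a minimal sketch, assuming the spectrum is held as a list of rows in which the row index runs along F1 and the column index along F2; the function names and the tiny example matrix are hypothetical.

```python
import statistics

def iqr(values):
    """Interquartile range: the difference between the 75% and 25% quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def iqr_profile(spectrum):
    """Return the iqr of each F1 trace, i.e. of each column of the matrix."""
    n_cols = len(spectrum[0])
    return [iqr([row[j] for row in spectrum]) for j in range(n_cols)]

# Hypothetical 5 x 2 real-real spectrum: column 0 is quiet, column 1 is noisy.
spectrum = [
    [0.0, 10.0],
    [1.0, -10.0],
    [-1.0, 20.0],
    [0.0, -20.0],
    [1.0, 10.0],
]
profile = iqr_profile(spectrum)
print(profile)
```

Plotting such a profile against the F2 value of each column gives the kind of curve shown in Figure 3, with larger values where an F1 trace runs along a t1-noise ridge.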

Figure 2. F1 trace through an HSQC spectrum of glycine at F2 = 3.447 ppm
showing the periodic components of the t1-noise. The peak associated with this
ridge is at F1 = 41.30 ppm; it reaches an intensity of 6.6 × 10^6 and so is truncated
in this plot. (Axes: intensity against F1 (¹³C) / ppm.)

Figure 3. Interquartile range of the intensity along F1 traces at F2 frequencies
across the t1-noise ridge in an HSQC spectrum of glycine. (Axes: iqr(intensity)
against F2 (¹H) / ppm.)

2.2.3. Fourier Analysis. The periodic components indicated above suggest that Fourier analysis 

of the trace along t1-noise ridges may yield information about the structure of the noise. 

An NMR spectrum shows intensity as a function of frequency. Here, Fourier analysis is used 

to quantify periodic components as the NMR resonance frequency changes, so the trace is being 

considered as if it were a signal that varies with time. For this analysis, an arbitrary ‘time’ unit 

of one is assumed between each discrete frequency datum in the trace. 
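As an illustration of this convention, the sketch below computes the energy at each period (measured in data points) for a trace, using a direct discrete Fourier transform. It is a minimal example under assumed conditions: the function name `power_by_period` is hypothetical, the mean is removed before transforming, and a synthetic sinusoid stands in for a real t1-noise trace.

```python
import cmath
import math

def power_by_period(trace):
    """Return (period, energy) pairs, with period in units of data points."""
    n = len(trace)
    mean = sum(trace) / n
    centred = [v - mean for v in trace]   # remove the zero-frequency component
    result = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centred))
        result.append((n / k, abs(coeff) ** 2))
    return result

# Synthetic trace with a single periodic component of period 8 data points.
trace = [math.sin(2 * math.pi * t / 8) for t in range(64)]
period, energy = max(power_by_period(trace), key=lambda pair: pair[1])
print(period)
```

For a real trace the energy would be spread over several periods, as in the power spectra discussed next. A fast Fourier transform would normally replace the direct sum for traces of realistic length.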

Figure 4(a) shows the power spectrum for the section of the t1-noise trace shown in figure 2 

at higher frequencies than the peak, but not including the peak itself. This, and the other power 

spectra in this figure, were taken using a trace along the higher frequency noise maxima line of 

each t1-noise ridge. Part (b) of the figure shows the power spectra for similar trace sections in 

the sucrose spectrum shown in Figure 1 corresponding to the two most intense peaks and another 

peak at higher F2 frequency. The amplitude of the noise differs for each peak, so the energy is 

normalised, using the mean energy of each, in order to facilitate comparison. Part (c) shows the 

power spectra for two sections of the t1-noise trace associated with the most intense peak in the 

sucrose HSQC spectrum, one section at F1 frequencies higher than the peak, the other, lower. 

The power spectra show concentration of the energy at particular periods in the t1-noise, confirming
the periodicity. There are a number of prominent periods, and there is significant energy
at a range of periods. Part (b) of the figure shows significant similarity in power spectra for the
different t1-noise ridges in the same HSQC spectrum, when considered over the same range of F1
frequencies. By comparing parts (a) and (b), taken over the same F1 range, it can be seen that
the frequency components of the noise differ between spectra.¹ Part (c) of the figure shows that
the frequency components differ between sections of the same t1-noise ridge.

Figure 4. (a) Power spectrum of a section of the t1-noise trace in a glycine
HSQC spectrum (F2 = 3.447 ppm, F1 = 157.0 − 97.60 ppm). (b) Normalised
power spectra of sections (F1 = 157.0 − 97.60 ppm) of t1-noise traces of a sucrose
HSQC spectrum. (c) Power spectra for sections of the trace (F2 = 3.707 ppm) at
frequencies higher (F1 = 157.0 − 97.60 ppm) and lower (F1 = 55.38 − 3.768 ppm)
than the associated intense peak in the same sucrose spectrum. (Each panel plots
energy, normalised in (b), against period; the traces in (b) are at F2 = 5.302 ppm,
3.707 ppm and 3.5637 ppm.)

2.2.4. Continuous Wavelet Transform. The possible change in periodic behaviour with location 

motivates the use of the Continuous Wavelet Transform (CWT), as described in section B.1.

¹ The two spectra were taken during the same experimental run with the same spectrometer and processing
configuration.


Figure 5. Pseudocolour plot of the Continuous Wavelet Transform (using the
Mexican Hat wavelet) of a trace through an HSQC spectrum of glycine (F2 =
3.447 ppm, F1 = 157.0 − 97.60 ppm). Lighter shades correspond to the largest
absolute values of the transform. (Axes: scale a, from 1 to 55, against position b.)

The CWT was performed using the ‘Mexican Hat’ wavelet (section B.1.6). This wavelet function 

was chosen for the CWT since it is symmetrical and has a shape (see Figure 39) that is similar 

to the peaks in the noise, both of which make the interpretation of the wavelet coefficients more 

straightforward. Figure 5 shows a pseudocolour plot of the CWT of a t1-noise trace. 

By comparison with the CWT of a simple periodic signal given in Figure 40, it can be seen 

that the CWT of the t1-noise trace provides evidence of periodic components in the signal. The 

presence of maxima (the lightest shades) at a range of scale values suggests that the signal contains

a number of periodic components at different frequencies. Although there is significant periodic 

behaviour with position, such as regularly alternating bands of light and dark, the nature of this 

behaviour does vary across the CWT plot, confirming that the frequency components are localised. 

2.2.5. Conclusion. Both Fourier analysis and the Continuous Wavelet Transform show that the 

t1-noise signal has a relatively large number of periodic components and that the nature of the components 

changes with location along the signal. Both the number of components and localised

behaviour would make it difficult to accurately isolate the noise from small peaks convoluted with 

the t1-noise ridge. 

However, the similar nature of the power spectra shown in Figure 4(b) indicates sections of the 

t1-noise ridges covering the same F1 frequency ranges do have very similar structure, even when 

the ridges are far from one another in the F2 dimension. This suggests analysis of the correlation 

between t1-noise traces would be useful. 

2.3. Noise Separation. In the initial investigations using Fourier and wavelet analysis, sections 

of the trace were used that did not contain ‘genuine’ spectral peaks as they would have added spurious 

components to the signal. Measurement of correlation would be similarly affected, especially 

if a large peak were present in one trace and not the other. 

However, the 2D spectra of metabolic samples potentially have genuine peaks at many locations, 

and so any technique to reduce the noise must handle the presence of peaks, rather than restrict 

its operation to sections of the spectrum free of peaks. For this reason, and to allow more accurate 

correlation calculations based on the entire length of the trace rather than small peak-free sections, 

the next step of the analysis is to separate the noise from the genuine peaks. 

Note that the purpose is more accurately stated as the separation of components of the same 

amplitude as the noise from the significantly larger genuine peaks. Noise separation is unlikely to 

be able to distinguish small genuine peaks convoluted with the noise from the noise itself and so 

these small peaks will appear as part of the separated noise signal.

2.3.1. Noise Distribution. The noise separation techniques use a threshold to distinguish between 

noise and peak signal components. An accurate determination of the appropriate threshold depends 

upon an understanding of the distribution of the noise values. 

Figure 6. (a) is a histogram of normalised data values in t1-noise ridges.
(b) plots the experimental values against a normal probability distribution. The
data values were obtained from an HSQC spectrum of sucrose in peak-free ranges
F1 = 157.0 − 97.60 ppm and F1 = 55.38 − 3.768 ppm. t1-noise ridges were
identified by F1 traces with an interquartile range of less than 5000. (Axes:
(a) count against normalised intensity; (b) probability against data value.)

To estimate this distribution, a relatively large number of data points were assessed by considering 

areas of t1-noise ridges free from peaks. However, as indicated by Figure 3, the amplitude of 

the noise varies across a t1-noise ridge, so values are normalised by dividing by the interquartile 

range of the F1 trace. (As discussed below, this is a relatively robust estimator for the noise 

amplitude.) 

Figure 6 shows the distribution of the normalised t1-noise data points from an HSQC spectrum

of sucrose. t1-noise ridge sections were identified by taking peak-free subsets of F1, and F2 values 

where the trace interquartile range was significantly above that of the thermal noise. The histogram 

shows the shape of the normal distribution. The normal probability plot of the data is also 

indicative of a normal distribution: the plot is very linear, especially in the central region, although 

the curvature at larger negative data values suggests that the left tail of the data distribution is 

shorter than would be expected. Similar results were obtained for the noise distribution in other 

HSQC spectra. 

2.3.2. Estimators for Noise Distribution Parameters. Figure 6 suggests that the mean of the noise is

approximately zero: a calculation using the same data gives −0.0061. This might be expected 

for spectra that are accurately baselined (so that signal-free areas of the spectrum approach zero 

intensity), and if, as it appears, the nature of the noise causes the intensity to vary above and 

below the actual value of the spectrum. The analysis below makes this assumption throughout. 

The interquartile range (iqr) is used as an estimator for the standard deviation of the normal
distribution. This estimator is used as it is more robust than a direct evaluation of the

standard deviation. If the separated noise spectra included some data points from genuine peaks 

in addition to the t1-noise itself, most of the large magnitude data points will be outside the 

25–75% quartile range measured by the iqr. Although the peak-related data points may skew the 

quartile distribution slightly, they will have significantly less impact on the value of the iqr than 

on the standard deviation calculation that considers the values of all data points, especially if the 

peaks are large. 

Given this robustness, and the relatively small proportion of each signal that contains peaks in 

the 2D HSQC spectra used in this project, the iqr estimate is applied to the entire trace, including 

the genuine peaks, to give an estimate for the noise. 

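The standard deviation σ can be calculated from the iqr using the relationship:

\[ \sigma(\cdot) = Q_N \, \mathrm{iqr}(\cdot) \tag{2.1} \]

The value of the constant QN is derived as follows.

Assuming the mean is zero as above, and using the definition of the interquartile range and the
symmetry of the distribution, gives:

\[ \int_{-q/2}^{q/2} f_N(x) \, dx = \frac{1}{2} \tag{2.2} \]

where q is the interquartile range, and fN is the probability density function of the normal distribution
of mean µ and standard deviation σ, i.e.:

\[ f_N(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2/(2\sigma^2)} \tag{2.3} \]

Combining (2.2) and (2.3), with µ = 0, gives:

\[
\begin{aligned}
\frac{1}{2} &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-q/2}^{q/2} e^{-x^2/(2\sigma^2)} \, dx \\
&= \frac{1}{\sqrt{\pi}} \int_{-q/(2\sqrt{2}\sigma)}^{q/(2\sqrt{2}\sigma)} e^{-u^2} \, du
\qquad \text{(by substituting } u = x/(\sqrt{2}\sigma)\text{)} \\
&= \frac{2}{\sqrt{\pi}} \int_{0}^{q/(2\sqrt{2}\sigma)} e^{-u^2} \, du \\
&= \operatorname{erf}\!\left( \frac{q}{2\sqrt{2}\sigma} \right)
\end{aligned}
\tag{2.4}
\]

where erf(x) is the error function, defined as,

\[ \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-u^2} \, du \tag{2.5} \]

Denoting the inverse of the error function as erf⁻¹, then (2.4) gives,

\[ \sigma = \frac{q}{2\sqrt{2} \, \operatorname{erf}^{-1}\!\left(\tfrac{1}{2}\right)} \tag{2.6} \]

and thus,

\[ Q_N = \left( 2\sqrt{2} \, \operatorname{erf}^{-1}\!\left(\tfrac{1}{2}\right) \right)^{-1} \tag{2.7} \]

The value of erf⁻¹(1/2) can be estimated numerically to give QN ≈ 0.7413.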
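The numerical value of QN, and the robustness of the iqr-based estimate, can be checked with a short sketch. The bisection routine `inverse_erf`, the synthetic trace and the size of the added 'peak' are assumptions made for this illustration and do not come from the project code.

```python
import math
import random
import statistics

def inverse_erf(y, tol=1e-12):
    """Invert the error function on [0, 6] by bisection (erf is increasing)."""
    lo, hi = 0.0, 6.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erf(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The constant relating the standard deviation to the iqr, as derived above.
Q_N = 1.0 / (2.0 * math.sqrt(2.0) * inverse_erf(0.5))

def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Synthetic trace: unit-variance normal noise plus a large 'peak' that
# occupies 1% of the data points.
rng = random.Random(0)
trace = [rng.gauss(0.0, 1.0) for _ in range(2000)]
for i in range(1000, 1020):
    trace[i] += 50.0

sigma_direct = statistics.pstdev(trace)   # inflated by the peak
sigma_from_iqr = Q_N * iqr(trace)         # remains close to the true value of 1
print(round(Q_N, 4), sigma_direct, sigma_from_iqr)
```

In this synthetic case the direct standard deviation is several times larger than the true noise level, while the iqr-based estimate stays close to 1. This is the behaviour that motivates applying the iqr estimate to the entire trace, peaks included.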

2.3.3. Wavelet Noise Separation. The requirement to separate small amplitude signals from larger 

components is consistent with the properties of wavelet denoising described in section B.7.1.

The normal distribution of the noise suggests that the denoising threshold could be derived 

from the standard deviation of the noise signal as the universal threshold (see section B.7.2). 

However, this method derives a threshold that often overestimates the maximum detail coefficient 

expected from the noise in a signal of given length[1]. If used for separating the t1-noise, it is likely, 

therefore, to include values from genuine peaks and therefore adversely affect the calculations of 

correlations between t1-noise traces. 

Instead, a threshold is derived based on the standard deviation of the coefficients at a given 

wavelet decomposition level. The iqr of the detail coefficients is used to estimate the standard 

deviation using equation (2.1).² A multiple of the standard deviation is then used as the threshold

to give confidence intervals related to the normal distribution. For example, a threshold of 3 times 

the standard deviation would be expected to include 99.73% of the coefficients resulting from the 

noise. 

² Note that the assumption is made here that the detail coefficients exhibit a normal distribution when the
underlying signal has the same distribution.


If dm,n are the detail coefficients at decomposition level m, then the threshold for this level is
given by:

\[ \Lambda_m = K \, \sigma(\{d_{m,n}\}) = K \, Q_N \, \mathrm{iqr}(\{d_{m,n}\}) \tag{2.8} \]

where K is a constant multiplier of the standard deviation to be chosen (e.g. 3 for the 99.73% 

interval described above), σ(·) the standard deviation function, and iqr(·) the interquartile range 

function. 
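Equation (2.8) and the confidence interval associated with a given K can be sketched as follows. The function names and the example coefficients are hypothetical; the value of QN is the one derived in section 2.3.2.

```python
import math
import statistics

Q_N = 0.7413  # sigma = Q_N * iqr for a normal distribution (equation 2.1)

def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def level_threshold(detail_coeffs, k=3.0):
    """Threshold for one decomposition level, following equation (2.8)."""
    return k * Q_N * iqr(detail_coeffs)

def normal_coverage(k):
    """Fraction of normally distributed values lying within k standard deviations."""
    return math.erf(k / math.sqrt(2.0))

# Hypothetical detail coefficients at one decomposition level.
coeffs = [-2.0, -1.0, 0.0, 1.0, 2.0]
threshold = level_threshold(coeffs, k=3.0)
coverage = normal_coverage(3.0)
print(threshold, coverage)
```

For K = 3 the coverage evaluates to approximately 0.9973, matching the 99.73% interval quoted above. A smaller K captures less of the noise but reduces the risk of including peak signal.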

In this context, hard thresholding (see section B.7.2) has the disadvantage that any detail 

coefficients just outside the ‘confidence interval’ for the noise (defined in terms of the standard 

deviation) will be unaffected, and so any associated noise signal will be left in the ‘peak’ spectrum 

(i.e. the spectrum after denoising) rather than contribute to the t1-noise spectrum. 

Soft thresholding overcomes this problem, but has the disadvantage that all coefficients are 

modified by an amount equal in absolute value to the threshold. This has the effect of decreasing 

the height—and significantly affecting the overall volume—of the peak. The peak volume is a key 

datum for comparing spectra and so, with a view to an algorithm for minimising peak noise, a
modified form of thresholding is created. It has less effect on peak volume than soft thresholding,

but retains the ability to appropriately include noise just larger than the threshold in the t1-noise 

spectrum. 

The new method is named here as ‘gradual’ thresholding since it gradually changes from soft 

thresholding when the coefficients are close to the threshold to hard thresholding for coefficients 

larger in absolute value. The definition is: 

\[
d^{S}_{m,n} =
\begin{cases}
0 & \text{if } |d_{m,n}| < \Lambda_m \\[4pt]
\dfrac{d_{m,n}}{|d_{m,n}|} \left( |d_{m,n}| - \dfrac{\Lambda_m^2}{|d_{m,n}|} \right) & \text{otherwise}
\end{cases}
\tag{2.9}
\]

where Λm is the threshold value. By comparison with equation B.69 in appendix B, it can be
seen that this is the soft thresholding method but with the amount by which a coefficient is changed now
modified by a factor of Λm/|dm,n|.
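The three thresholding rules can be compared on individual coefficients with a minimal sketch. The hard and soft rules are written here in their usual textbook form, which is assumed to match appendix B; the function names are hypothetical.

```python
import math

def hard_threshold(d, lam):
    """Zero coefficients below the threshold; leave the others unchanged."""
    return 0.0 if abs(d) < lam else d

def soft_threshold(d, lam):
    """Zero coefficients below the threshold; shrink the others by lam."""
    if abs(d) < lam:
        return 0.0
    return math.copysign(abs(d) - lam, d)

def gradual_threshold(d, lam):
    """Gradual rule: shrink by lam * (lam / |d|), so large coefficients barely change."""
    if abs(d) <= lam:          # the rule gives 0 at |d| == lam; also guards d == 0
        return 0.0
    return math.copysign(abs(d) - lam * lam / abs(d), d)

lam = 3.0
for d in (2.0, 3.3, 30.0, -30.0):
    print(d, hard_threshold(d, lam), soft_threshold(d, lam), gradual_threshold(d, lam))
```

With a threshold of 3, a coefficient of 3.3 is left at 3.3 by the hard rule, reduced to 0.3 by the soft rule and to about 0.57 by the gradual rule. A coefficient of 30 becomes 27 under the soft rule but 29.7 under the gradual rule, which shows why the gradual rule has less effect on large peaks.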

Although the spectrum is 2D, the denoising is performed separately on each 1D trace in the F1 

direction. Applying wavelet denoising in 2D would be inappropriate since the thresholding would 

assume constant amplitude noise over regions of the spectrum with a non-zero F2 width, but, from 

above, the t1-noise amplitude varies with F2. 
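The trace-by-trace separation can be sketched as below. To keep the example self-contained it uses the Haar wavelet and three decomposition levels, which is a deliberate simplification and not the wavelet or depth used in this project; the data layout (rows indexed by F1, columns by F2), the function names and the synthetic spectrum are also assumptions.

```python
import math
import random
import statistics

Q_N = 0.7413            # sigma = Q_N * iqr for normally distributed values
ROOT2 = math.sqrt(2.0)

def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def gradual(d, lam):
    if abs(d) <= lam:
        return 0.0
    return math.copysign(abs(d) - lam * lam / abs(d), d)

def haar_forward(signal, levels):
    """Haar pyramid: returns the final approximation and details (finest first)."""
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / ROOT2 for a, b in pairs])
        approx = [(a + b) / ROOT2 for a, b in pairs]
    return approx, details

def haar_inverse(approx, details):
    approx = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt += [(a + d) / ROOT2, (a - d) / ROOT2]
        approx = rebuilt
    return approx

def separate_trace(trace, levels=3, k=3.0):
    """Split one F1 trace into a peak part and a noise part."""
    approx, details = haar_forward(trace, levels)
    # The threshold is derived separately for each decomposition level.
    kept = [[gradual(d, k * Q_N * iqr(level)) for d in level] for level in details]
    peak_part = haar_inverse(approx, kept)
    noise_part = [s - p for s, p in zip(trace, peak_part)]
    return peak_part, noise_part

def separate_spectrum(spectrum, levels=3, k=3.0):
    """Apply the 1D separation to each column (F1 trace) in turn."""
    n_rows, n_cols = len(spectrum), len(spectrum[0])
    peaks = [[0.0] * n_cols for _ in range(n_rows)]
    noise = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        p, n = separate_trace([row[j] for row in spectrum], levels, k)
        for i in range(n_rows):
            peaks[i][j], noise[i][j] = p[i], n[i]
    return peaks, noise

# Synthetic spectrum: two F1 traces with different noise amplitudes, one with a peak.
rng = random.Random(1)
spectrum = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 5.0)] for _ in range(256)]
spectrum[100][0] += 100.0
peaks, noise = separate_spectrum(spectrum)
```

Because the threshold is recomputed for every trace and every level, the two columns are thresholded according to their own noise amplitudes, which a single 2D threshold would not allow. The trace length is assumed to be divisible by two at each level.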

Figure 7 shows the spectra resulting from wavelet denoising using the three thresholding methods. 

The denoising was applied separately to each F1 trace in turn. The threshold, at each wavelet 

decomposition level, was 3 times the standard deviation in all cases; the wavelet decomposition 

was to 9 levels and used the Coiflet wavelet of order 2. (The Coiflet wavelet was chosen
since it is approximately symmetrical and, unlike the Mexican Hat wavelet used earlier, is both
orthogonal and compactly supported, enabling its use in the pyramid algorithm.)

The original (noisy) spectrum is that shown in Figure 1. The peak spectrum (a) resulting from 

hard thresholding shows some remaining large amplitude noise. Soft thresholding (b) more accurately

removes most of this noise, but does reduce the height of the peaks (compare the intense peak 

at about F2 = 3.55 ppm in each spectrum). Gradual thresholding (c) still removes most of the 

noise, but has less effect on peak height. The t1-noise spectrum in (d) shows a small amount of 

the peak signal ‘leaking’ into this spectrum. In general, choosing the thresholding value—in this 

case by changing the multiplier K in (2.8)—is a compromise between capturing most of the noise 

and avoiding leakage of the peak signal. 

Other thresholding methods were also investigated. One promising method leveraged the fact 

that noise has both positive and negative intensities, but that in a properly phase-corrected spectrum, 

genuine peaks have only positive intensities. The largest absolute value of negative data 

values therefore is related only to the noise and can be used to derive the threshold. However, this 

method is not considered in detail here since it is not extendable to the complex spectra used in 

later analysis (see section 2.5). 



(a) (b) 

(c) (d) 

Figure 7. Example of t1-noise separation using wavelet denoising in an HSQC 

spectrum of sucrose. (Only a small section of the F2 range is shown.) (a),(b) 

and (c) are the peak spectra resulting from noise separation using hard, soft and 

gradual thresholding methods respectively. (d) shows the corresponding noise 

spectrum resulting from the gradual thresholding method, using a different scale 

for the intensity. 

One disadvantage of wavelet denoising was found to be the introduction of ‘troughs’ close to 

peaks, and the introduction of small artefacts near the base of peaks. These observations can

be characterised as ‘pseudo-Gibbs’ phenomena, similar to the effects seen in Fourier transforms 

of rapidly changing signals, and poor approximation to the original signal using the modified 

coefficients owing to the wavelet shape[20]. 

The artefacts can be reduced using a non-decimating wavelet decomposition as described in B.8. 

The disadvantage is the processing time: if the decomposition is to m levels, this requires 2^m

separate implementations of the pyramid algorithm (with a shift in the signal by one unit for

each) for each trace, compared to just one for the standard wavelet decomposition. A compromise

is to perform only a proportion of the 2^m shifts[20].
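The shift-and-average procedure can be sketched as below. This is only an illustration under several assumptions: the Haar wavelet stands in for the Coiflet wavelet, hard thresholding stands in for the gradual method, shifts are circular, the trace length must be a multiple of 2^m, and taking the first `n_shifts` shifts is just one way of using a proportion of them. All names are hypothetical.

```python
import math


def haar_forward(signal, levels):
    """Pyramid algorithm with the Haar wavelet: returns (approximation, details)."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        half = len(approx) // 2
        s = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        d = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details.append(d)
        approx = s
    return approx, details


def haar_inverse(approx, details):
    """Inverse pyramid algorithm for haar_forward."""
    approx = list(approx)
    for d in reversed(details):
        rebuilt = []
        for s, dd in zip(approx, d):
            rebuilt.append((s + dd) / math.sqrt(2))
            rebuilt.append((s - dd) / math.sqrt(2))
        approx = rebuilt
    return approx


def denoise_once(signal, levels, t):
    """Standard (decimated) denoising: hard-threshold every detail coefficient."""
    approx, details = haar_forward(signal, levels)
    details = [[d if abs(d) > t else 0.0 for d in level] for level in details]
    return haar_inverse(approx, details)


def cycle_spin(signal, levels, t, n_shifts=None):
    """Average the denoised result over circular shifts of the signal."""
    n = len(signal)
    max_shifts = 2 ** levels
    if n % max_shifts:
        raise ValueError("signal length must be a multiple of 2**levels")
    n_shifts = max_shifts if n_shifts is None else min(n_shifts, max_shifts)
    total = [0.0] * n
    for shift in range(n_shifts):
        shifted = signal[shift:] + signal[:shift]
        den = denoise_once(shifted, levels, t)
        unshifted = den[n - shift:] + den[:n - shift]  # undo the shift
        total = [a + b for a, b in zip(total, unshifted)]
    return [v / n_shifts for v in total]


signal = [0.0, 0.1, -0.1, 0.0, 4.0, 4.2, 0.1, -0.2]
exact = cycle_spin(signal, levels=2, t=0.0)        # nothing removed
flat = cycle_spin([1.0] * 8, levels=2, t=0.5)      # constant signal is unchanged
partial = cycle_spin(signal, levels=2, t=0.5, n_shifts=2)  # half of the 4 shifts
```

The cost scales with the number of shifts, since each shift is one full decomposition and recomposition of the trace.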

Figure 8 shows examples of these modified techniques in the same HSQC spectrum as previously, 

but plots the results for an F1 trace. The wavelet function and threshold are the same as before. 

The troughs and artefacts (particularly the large negative ‘peaks’) can be seen in (b). Although 

the troughs are largely removed in (c), the smaller scale artefacts remain. A translation invariant 

wavelet decomposition in (d) does little to reduce them. 

2.3.4. Noise Separation by Direct Signal Thresholding. An alternative method of separation is 

to simply threshold the spectrum directly without performing a wavelet decomposition. The 

threshold value is calculated in the same way as for the wavelet noise separation, as a multiple of

the standard deviation, but derived from the iqr of the spectrum itself rather than of the detail

coefficients. The same thresholding methods are still applicable, and the advantages of using the 

gradual thresholding method still hold. 
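A minimal sketch of direct thresholding of a single real trace follows. It assumes the factor of 2.224 times the iqr (equivalent to 3 standard deviations for normally distributed noise), uses hard thresholding as a stand-in for the gradual method, and treats the noise as the remainder; the function names and example values are hypothetical.

```python
import statistics


def interquartile_range(values):
    """Interquartile range using the standard-library quartile estimate."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def direct_threshold_split(trace, k_iqr=2.224):
    """Split a real trace into (peak, noise, threshold) without any wavelet step."""
    t = k_iqr * interquartile_range(trace)
    peak = [v if abs(v) > t else 0.0 for v in trace]   # hard thresholding stand-in
    noise = [v - p for v, p in zip(trace, peak)]       # remainder is the noise
    return peak, noise, t


trace = [-1.0, 1.0, -2.0, 2.0, 0.0, 100.0, -1.0, 1.0]
peak, noise, t = direct_threshold_split(trace)
```

Because the iqr is insensitive to the single large value, the threshold stays at the scale of the noise and only the value 100.0 is assigned to the peak part.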

Figure 9 shows the results of this method on the same region of a HSQC spectrum of sucrose that 

was used for wavelet denoising in Figure 7. Equivalent parameters were used: gradual thresholding 

using a threshold value of 3 times the noise standard deviation. The peak spectrum shows few of

the artefacts seen with wavelet denoising, and the noise spectrum shows little ‘leakage’ from the

peaks.

Figure 8. Example of t1-noise separation using wavelet denoising in an HSQC

spectrum of sucrose. The F1 trace at F2 = 3.707 ppm is plotted. (a) is the

original trace. (b) is the peak signal after wavelet denoising using 9 levels of

wavelet decomposition. (c) uses 5 levels of wavelet decomposition. (d) uses a

translation invariant decomposition to 5 levels. (Each panel plots intensity,

on a scale of 10^6, against F1 (13C) / ppm.)

Figure 9. Example of t1-noise separation using direct thresholding of the signal

in an HSQC spectrum of sucrose. (Only a small section of the F2 range is shown.)

(a) is the peak spectrum; (b) is the corresponding t1-noise spectrum.

2.3.5. Conclusion. The noise separation by direct thresholding appears to perform best in this 

context, in particular avoiding the introduction of artefacts into the spectrum. The form of the

underlying signal—large flat sections with occasional positive-only peaks—appears to be unsuitable 

for wavelet denoising. Examples given in the literature [1, 23] apply wavelet denoising to 

more periodic underlying signals, for which the direct thresholding method would be unsuitable. 

Therefore the direct thresholding method is used to separate the noise in subsequent analysis. 

2.4. Correlation of t1-Noise Traces. Having separated the t1-noise, the correlation between 

F1 traces in the noise ridges can be analysed, without the large peak values affecting the results. 

If the 2D spectrum is denoted by the function Φ(f1, f2) where the F1 and F2 chemical shifts 

take discrete values f1 and f2 respectively, then an F1 trace for F2 = f2 may be denoted by 

Φf2(f1). 

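The correlation for two traces is calculated using the formula:

\[
\rho(f_2, f_2') = \frac{\sum_{f_1} \left(\Phi_{f_2}(f_1) - \bar{\Phi}_{f_2}\right)\left(\Phi_{f_2'}(f_1) - \bar{\Phi}_{f_2'}\right)}{\sqrt{\sum_{f_1} \left(\Phi_{f_2}(f_1) - \bar{\Phi}_{f_2}\right)^2 \; \sum_{f_1} \left(\Phi_{f_2'}(f_1) - \bar{\Phi}_{f_2'}\right)^2}} \tag{2.10}
\]

where $\bar{\Phi}_{f_2}$ denotes the mean of $\Phi_{f_2}(f_1)$ over the discrete values $f_1$.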
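The correlation in (2.10) is the usual sample correlation coefficient between two traces. A minimal Python sketch, with hypothetical names and toy data, is:

```python
import math


def correlation(a, b):
    """Correlation coefficient of two equal-length real traces, as in (2.10)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return numerator / denominator


a = [1.0, 2.0, 4.0, 3.0, 7.0]
b = [2.0 * x + 1.0 for x in a]   # same shape, different amplitude and offset
c = [-x for x in a]              # inverted copy

r_ab = correlation(a, b)
r_ac = correlation(a, c)
```

A rescaled copy gives a correlation of +1 and an inverted copy gives −1, which is the behaviour relied on when comparing noise maxima lines.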

Figure 10 shows the correlation of F1 traces for the section of the t1-noise spectrum separated 

from a HSQC spectrum of sucrose. The noise spectrum is that shown in Figure 9(b). 

It can be seen that the higher-frequency noise maxima line Ah is correlated well with traces at

slightly higher frequency (to the left on the horizontal axes); similarly Al is correlated well with

traces at slightly lower frequencies. Ah is strongly inversely correlated with Al (ρ is close to −1).

The same pattern holds for noise maxima lines of the second t1-noise ridge, B.

Figure 10. (a) is a pseudocolour plot of correlation of F1 traces covering the

range F2 = 3.850 − 3.505 ppm in the t1-noise of a HSQC spectrum of sucrose.

Strong positive correlations (close to +1) are the lightest shades; strong negative

correlations (close to −1) are darkest; the mid shade of grey indicates no correlation.

The labels ‘Ah’ and ‘Al’ mark the location of the noise maxima lines for

one t1-noise ridge; ‘Bh’ and ‘Bl’ similarly for a second ridge. ‘Ax’ is the centre

of the first t1-noise ridge. (b) plots the correlation relative to the trace Ah. (c)

plots the correlation relative to the trace Ax. (Axes: (a) F2 (1H) on both axes;

(b) and (c) correlation, from −1 to 1, against F2 (1H). The positions Ah, Ax, Al,

Bh and Bl are marked on the axes.)

The correlation with nearby F1 traces suggests a method of distinguishing small peaks convoluted 

with the noise. By subtracting a well-correlated trace (and adjusting for different relative 

amplitudes) from the trace under consideration, peaks present in the current trace, but not in the 

correlated traces, might become more prominent than the remaining noise. 
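The subtraction idea can be sketched for real traces as follows. The amplitude adjustment is not specified at this point in the text, so a least-squares scale factor is assumed here; the function name and the toy data are hypothetical.

```python
def subtract_scaled(trace, reference):
    """Subtract a well-correlated reference trace after matching its amplitude.

    The scale factor is the least-squares fit of `reference` to `trace`
    (an assumption; other amplitude adjustments are possible).
    """
    scale = sum(t * r for t, r in zip(trace, reference)) / sum(r * r for r in reference)
    return [t - scale * r for t, r in zip(trace, reference)]


# Toy data: the trace is twice the reference noise plus one small peak.
reference = [1.0, -2.0, 1.5, 0.2, -1.0, 2.0, -1.5, 0.5]
trace = [2.0 * r for r in reference]
trace[3] += 5.0

residual = subtract_scaled(trace, reference)
peak_index = max(range(len(residual)), key=lambda i: abs(residual[i]))
```

In the original trace the peak at index 3 is comparable in size to the noise; after subtraction it is the largest feature of the residual.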

An important observation is that Ah is also strongly correlated with Bh, and Al with Bl. 

This would significantly benefit the algorithm outlined above. Peaks are likely to extend across 

nearby traces in the same t1-noise ridge, and so would largely cancel out when two nearby traces 

are subtracted. However, peaks are less likely to be present at the same location in traces from 

different t1-noise ridges, such as Ah and Bh. 

However, it is noticeable that traces at the centre of a t1-noise ridge are strongly correlated 

with very few other traces, either in the same or other t1-noise ridges. For example, Figure 10(c) 

shows this to be the case for the trace Ax; equivalently the pseudocolour plot in (a) shows largely mid

grey shades, indicating ρ close to 0, for Ax. 

2.5. Complex Correlation of t1-Noise Traces. The relatively smooth transition of correlation, 

for example considering changes from Ah through Ax to Al for the correlation with the trace Ah, 

indicates that the noise is in some way changing its ‘phase’ across F2, from being in phase around 

Ah (compared to Ah itself) to a phase difference of π near Al. By comparison with the method of 

peak phase correction (section A.6.5), this suggests that additional information may be contained 

in the ‘imaginary’ frequency values resulting from the imaginary coefficients produced by the first 

(along F2) Fourier transform (the spectra analysed so far having consisted of only the real frequency 

values). The additional information may assist in correlating the noise at the centre of t1-noise 

ridges, such as Ax described above, which are neither in-phase nor completely out-of-phase. 

Using the additional imaginary frequency values, each data point along an F1 trace now becomes 

a complex value. (There are also additional imaginary coefficients resulting from the second Fourier 

transform, but they are not necessary for the analysis here, assuming that phase correction of the 

2D spectrum in the F1 direction takes the same form at all F2 values.) 

Before investigating the correlation using both the real and imaginary parts of the spectrum, 

it is necessary to change the way in which the data is loaded for analysis, to re-analyse the

distribution of the noise, and to modify the noise separation method. 

2.5.1. Data. To access the imaginary components, the post-processing Bruker Topspin data files 

were accessed directly, rather than using intermediate text files. The data files used were the 

(F2 real; F1 real) and (F2 imaginary; F1 real) data files: the first is identical to the real spectra 

analysed above; the second is the ‘imaginary’ spectrum. 3 The files were read using a matlab 

MEX file written in C++ to produce two matrices, one for each spectrum component. The two

matrices were combined to produce a single matrix with complex values. 

2.5.2. Noise Distribution. The distribution of the complex values in the noise was investigated 

with a view to deriving suitable thresholding parameters for noise separation. The procedure 

described in section 2.3.1 was repeated to produce a relatively large number of data points for 

analysis, this time using the full complex spectrum rather than only the real spectrum. The 

distribution of the modulus of the complex values, i.e. |Φ(f1, f2)| where Φ now represents the 

complex spectrum intensity, is analysed. 

Denoting the real and imaginary parts of the spectrum as $\Phi^{(\Re)}$ and $\Phi^{(\Im)}$, then,

\[
|\Phi| = \sqrt{\left(\Phi^{(\Re)}\right)^2 + \left(\Phi^{(\Im)}\right)^2} \tag{2.11}
\]

If the imaginary spectrum were to have the same normal distribution as the real spectrum was

found to have in section 2.3.1, and if the noise in each is independent of the other, then owing to

the relationship in (2.11), the modulus would have a Rayleigh distribution (or, equivalently, a χ

distribution with two degrees of freedom). This distribution has a probability density function of

the form:

\[
f_R(x) = \frac{x}{s^2}\, e^{-x^2/2s^2} \tag{2.12}
\]

where s is a scaling factor.

3 As described above, the other data files for (F2 real; F1 imaginary) and (F2 imaginary; F1 imaginary) are not

required for this analysis.

Figure 11. Histogram of normalised modulus of the complex intensity in t1-noise

ridges, with fitted Rayleigh and Weibull distributions (using the matlab

distribution fitting tool). The data values were obtained from an HSQC spectrum

of sucrose in peak-free ranges F1 = 157.0−97.60 ppm and F1 = 55.38−3.768 ppm.

t1-noise ridges were identified by F1 traces with a median magnitude of less than

5000/√2. (The plot shows density against the normalised modulus data, with the

Rayleigh and Weibull fits overlaid.)

Figure 11 shows the distribution of the normalised modulus of the complex spectrum. The 

modulus was normalised using the median modulus value. The fit of a Rayleigh distribution (calculated 

using the matlab distribution fitting tool) is good, but the fit to a Weibull distribution—of 

which the Rayleigh distribution is a special case—is better[15]. Although not investigated here, a 

possible reason why the fit to the Rayleigh distribution is not as good as expected is that the noise 

distributions in the real and imaginary part of the spectrum are not independent of one another. 

For simplicity, a Rayleigh distribution is assumed for the following. The slight loss of accuracy is 

not significant, since the use made of the distribution is to provide only an estimate of the thresholding

parameter for denoising. 

2.5.3. Estimators for Noise Distribution Parameters. The interquartile range is no longer suitable 

as a robust estimator for the noise distribution parameter, and instead the median of the modulus is

used. 4 As before, the contribution from a small number of large peaks will have relatively little 

effect on the median modulus. 

A constant multiplier, K, of the median modulus for use as a threshold in noise separation— 

equivalent to the same constant in (2.8)—is derived as follows. 

4 If the median of the absolute value were used for the purely real spectrum, the value would be half that of the 

interquartile range. This factor, combined with the √ 2 increase in the modulus compared to the real part suggested 

by equation (2.11), is the reason for the adjustment of the upper limit of the median magnitude in Figure 11 to 

5000/ √ 2: the equivalent limit used to identify streaks using iqr in Figure 6 was 5000. 



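If the median modulus is denoted by η, then,

\[
\int_0^{\eta} f_R(x)\,dx = \frac{1}{2} \tag{2.13}
\]

by definition of the median. Substituting from (2.12),

\[
\frac{1}{2} = \int_0^{\eta} \frac{x}{s^2}\, e^{-x^2/2s^2}\,dx
= \left[-e^{-x^2/2s^2}\right]_0^{\eta}
= 1 - e^{-\eta^2/2s^2} \tag{2.14}
\]

Thus,

\[
\frac{\eta^2}{2s^2} = \ln 2 \tag{2.15}
\]

If the proportion of data values within the upper limit Kη is $\pi_K$, then from (2.12),

\[
\pi_K = \int_0^{K\eta} \frac{x}{s^2}\, e^{-x^2/2s^2}\,dx
= \left[-e^{-x^2/2s^2}\right]_0^{K\eta}
= 1 - e^{-K^2\eta^2/2s^2} \tag{2.16}
\]

and so,

\[
-\frac{K^2\eta^2}{2s^2} = \ln(1 - \pi_K) \tag{2.17}
\]

Substituting from (2.15) gives,

\[
K^2 = -\frac{\ln(1 - \pi_K)}{\ln 2} \tag{2.18}
\]

hence,

\[
K = \sqrt{-\frac{\ln(1 - \pi_K)}{\ln 2}} \tag{2.19}
\]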

For the real spectrum with a normal noise distribution, a threshold of 3 times the standard 

deviation—or equivalently 2.224 times the iqr using (2.1)—was used to select approximately 

99.73% of the noise values. For an equivalent proportion assuming a Rayleigh distribution of 

the noise in the complex spectrum, equation (2.19) gives K ≈ 2.921. 
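The value of K can be checked numerically from (2.19). The sketch below takes the proportion within 3 standard deviations of a normal distribution from the error function and evaluates the multiplier; the function name is hypothetical.

```python
import math


def rayleigh_multiplier(proportion):
    """Multiplier K of the median modulus that contains `proportion` of
    Rayleigh-distributed values, from equation (2.19)."""
    return math.sqrt(-math.log(1.0 - proportion) / math.log(2.0))


# Proportion of a normal distribution within 3 standard deviations (about 99.73%).
p_3sigma = math.erf(3.0 / math.sqrt(2.0))

K = rayleigh_multiplier(p_3sigma)          # approximately 2.921
K_median = rayleigh_multiplier(0.5)        # the median itself corresponds to K = 1
```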

2.5.4. Noise Separation for the Complex Spectrum. The separation of t1-noise in the complex 

spectrum is achieved using the equivalent of the gradual thresholding of the direct signal that 

was used for the real spectrum. The quantity that is thresholded is the modulus of the complex 

spectrum, leaving the argument (phase) unchanged. If a trace in the complex spectrum is written 

in polar form: 

\[
\Phi_{f_2}(f_1) = r(f_1)\, e^{i\theta(f_1)} \tag{2.20}
\]

where r(f1) is the modulus and θ(f1) the argument, then the thresholding is applied to the values 

r(f1). 
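A minimal sketch of thresholding the modulus while leaving the argument unchanged is given below. It assumes a threshold of K = 2.921 times the median modulus, uses hard thresholding of r as a stand-in for the gradual method, and assigns the remaining modulus to the noise part with the same phase; the names and example values are hypothetical.

```python
import cmath
import statistics


def split_complex_trace(trace, k=2.921):
    """Split a complex trace into (peak, noise) by thresholding the modulus only."""
    t = k * statistics.median(abs(z) for z in trace)
    peak, noise = [], []
    for z in trace:
        r, theta = cmath.polar(z)
        r_peak = r if r > t else 0.0                 # hard thresholding stand-in
        peak.append(cmath.rect(r_peak, theta))       # same argument as z
        noise.append(cmath.rect(r - r_peak, theta))  # remainder, same argument
    return peak, noise


trace = [1 + 1j, -1 + 0.5j, 0.5 - 1j, 30 + 40j, -0.8 - 0.6j]
peak, noise = split_complex_trace(trace)
```

Only the value with modulus 50 exceeds the threshold, so it alone is assigned to the peak part; every other value passes to the noise part with its phase intact.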

Figure 12 shows the results of the denoising of the complex spectrum on the same section of the 

HSQC spectrum as used previously. This can be compared with denoising of the real spectrum 

shown in Figure 9. Note that since the plots for the complex spectrum show the modulus of the 

complex intensity values, there are no negative values and the shape of the peaks differs.

Wavelet denoising—thresholding of the detail coefficients—was also investigated for the complex 

spectrum. Rather than using complex wavelets, which do not have compact support[16] and are 

therefore unsuitable for the pyramid algorithm decomposition, the real and imaginary parts of a 

trace were decomposed separately using real wavelets. Complex detail coefficients were derived by 

combining the corresponding detail coefficients at the same level and position, and the modulus 

of the detail coefficient thresholded as described above. The thresholded values were then split

into real and imaginary parts to enable wavelet recomposition of the real and imaginary peak and

noise spectra using the inverse pyramid algorithm.

As with the real spectrum, wavelet denoising tended to introduce artefacts. The direct thresholding

of the spectrum was therefore used in preference.

Figure 12. Example of t1-noise separation using direct thresholding of the modulus

of the complex signal in an HSQC spectrum of sucrose. (Only a small section

of the F2 range is shown.) (a) is the modulus of the complex intensity in the peak

spectrum; (b) is the corresponding plot for the t1-noise spectrum.

2.5.5. Complex Correlation. The complex correlation between F1 traces in the complex t1-noise 

spectrum is calculated using an extension of the formula for the (real-valued) correlation (equation 

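(2.10)):

\[
\rho(f_2, f_2') = \frac{\sum_{f_1} \left(\Phi_{f_2}(f_1) - \bar{\Phi}_{f_2}\right)\left(\Phi^{*}_{f_2'}(f_1) - \bar{\Phi}^{*}_{f_2'}\right)}{\sqrt{\sum_{f_1} \left|\Phi_{f_2}(f_1) - \bar{\Phi}_{f_2}\right|^2 \; \sum_{f_1} \left|\Phi_{f_2'}(f_1) - \bar{\Phi}_{f_2'}\right|^2}} \tag{2.21}
\]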

where ∗ indicates the complex conjugate. (The same symbol, ρ, was used for the standard correlation, 

but will denote the complex correlation in the following.) When considered in polar form, the 

complex correlation can be interpreted as follows: the argument indicates the ‘phase difference’ 

between the two traces, and the modulus indicates the degree of similarity between the traces 

when the two traces are brought into phase. Note that the complex correlation is not symmetrical 

in terms of $f_2$ and $f_2'$: $\rho(f_2, f_2') = \rho(f_2', f_2)^{*}$.
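The complex correlation can be sketched in Python as below, with the conjugate taken on the second trace as in (2.21). The function name and test traces are hypothetical. The example rotates a trace by a known phase to show how the modulus and argument behave.

```python
import cmath
import math


def complex_correlation(a, b):
    """Complex correlation of two equal-length complex traces, conjugating b."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    numerator = sum(x * y.conjugate() for x, y in zip(da, db))
    denominator = math.sqrt(sum(abs(x) ** 2 for x in da) * sum(abs(y) ** 2 for y in db))
    return numerator / denominator


a = [1 + 2j, -0.5 + 0.3j, 2 - 1j, 0.1 - 0.7j]
phi = 0.7
b = [z * cmath.exp(1j * phi) for z in a]   # same trace rotated by phi

rho_ab = complex_correlation(a, b)
rho_ba = complex_correlation(b, a)
```

For a trace and its rotated copy the modulus is 1 and, with this placement of the conjugate, the argument of ρ(a, b) is −φ; swapping the arguments conjugates the result.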

Figure 13 is the equivalent of Figure 10 but using the complex correlation. The modulus of 

the complex correlation is plotted, ranging from 0 to 1, rather than the actual value used in the 

correlation of the real spectrum which had the range −1 to +1. (a) and (b) show the degree 

of correlation, irrespective of phase. In this case, it can be seen that the complex trace for the 

‘trough’ Ax at the centre of the ridge is now well-correlated with other traces in t1-noise ridge 

A, and also with traces in ridge B. Compare this with Figure 10 using only the real spectrum, 



where the same trace was poorly correlated. (c) shows that the phase of the noise in the complex 

spectrum, relative to Ax, changes smoothly across a t1-noise ridge: this can be seen both

across Ah to Al and across Bh to Bl. Other spectra showed a similar pattern of correlation and phase

change. 

In peak-free F2 ranges either side of a ridge, the argument of the complex correlation tends 

to stay close to a constant value, and the correlation between traces in these regions is strong. 

For example, the left-hand (higher frequency) end of Figure 13(c) shows the start of the constant 

phase behaviour, and the region of light shades in the lower left-hand corner of (a) shows that

the traces are correlated well with one another. The constant arguments (phases) on the higher 

and lower frequency sides differ by approximately π, i.e. the noise of the higher frequency side 

is almost exactly out-of-phase with that on the lower frequency side. One observation from this is that

t1-noise must extend some distance either side of the main ridge (although at low amplitudes), 

to enable this continued correlation: random noise would show no correlation and random phase 

with Ax. This is useful for the denoising algorithm since the t1-noise region extends significantly 

further than the width of a genuine peak. Of course, the presence of other t1-noise ridges modifies 

this behaviour, and the fluctuations in the argument (phase) of the correlation to the left of Ah, 

before reaching a region of more constant values, is indicative of weaker t1-noise ridges in this 

frequency range. 

An alternative representation of the change in the complex correlation, particularly the phase, 

is shown in Figure 14. The figure considers the complex correlation compared to traces in a 

relatively isolated t1-noise ridge to the higher F2 frequency end of the HSQC spectrum of sucrose; 

the t1-noise ridges used in the examples above are convoluted with weaker ridges nearby which 

produce a more complex pattern of phase changes than this isolated ridge. The figure shows the 

correlation with all 2048 traces across the entire F2 range of the noise spectrum, plotted on an 

Argand diagram. As can be seen in (a), relative to a trace at a frequency slightly higher than 

the noise maxima line (a position on the left ‘flank’ of the ridge), the majority of other traces 

tend to be correlated with a phase difference of 0 or of π, corresponding to being either in phase or

exactly out-of-phase. (c) shows that the ‘trough’ in the centre of the t1-noise ridge is at a phase 

difference of ±π/2 with the majority of other traces, while (b) shows that the higher frequency 

noise maxima line shows a behaviour in between these two. 

2.5.6. Discussion - Possible Use in Phase Correction. A phase correction to the NMR spectrum 

of the type described in A.6.5, made across the F2 direction has no effect on the complex correlation

if the two traces, Φf2 and Φf′2, have the same phase correction applied. However, if the

phase correction is, say, a first- or second-order change in terms of F2, rather than a constant, by

consideration of equations (A.24) and (2.21), it can be seen that the complex correlation modulus 

will be unchanged, but the argument will change by an amount equal to the difference in phase 

corrections applied to Φf2 and Φf ′ 2 . 
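This behaviour can be checked numerically. The sketch below assumes that a phase correction acts on each F1 trace as multiplication by a single phase factor that depends only on its F2 position; the traces, angles and function name are hypothetical.

```python
import cmath
import math


def complex_correlation(a, b):
    """Complex correlation with the conjugate on the second trace, as in (2.21)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    numerator = sum(x * y.conjugate() for x, y in zip(da, db))
    denominator = math.sqrt(sum(abs(x) ** 2 for x in da) * sum(abs(y) ** 2 for y in db))
    return numerator / denominator


a = [1 + 2j, -0.5 + 0.3j, 2 - 1j, 0.1 - 0.7j]
b = [0.9 + 2.2j, -0.4 + 0.1j, 2.1 - 0.8j, -0.2 - 0.9j]

alpha, beta = 0.3, 1.1   # different phase corrections for the two traces
a_corrected = [z * cmath.exp(1j * alpha) for z in a]
b_corrected = [z * cmath.exp(1j * beta) for z in b]

rho_before = complex_correlation(a, b)
rho_after = complex_correlation(a_corrected, b_corrected)
argument_change = cmath.phase(rho_after / rho_before)   # equals alpha - beta
```

When `alpha` and `beta` are equal the argument change is zero, which corresponds to the case of both traces receiving the same correction.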

This suggests a method of using the argument of the t1-noise complex correlation to calculate 

the phase correction required to an uncorrected spectrum. From the observation above, in regions 

free of peaks, the argument of the complex correlation (measured against a chosen, fixed, trace) 

is expected to be a constant value, but an unphased spectrum may show the argument phase 

changing, say, linearly with F2 instead. In this case, a first-order phase correction could be derived 

based on this linear behaviour: essentially the first-order phase correction that would make the 

t1-noise correlation argument constant in peak-free regions. 

A short investigation of this was performed, but found that the correlation argument contains 

significant fluctuations even in the peak-free regions making accurate determination of the phase 

correction difficult. A practical method would require robust isolation of the linear (or second-order

etc.) change in the correlation argument from the noisy fluctuations.

2.5.7. Conclusion. The complex correlation calculated on the full complex spectrum is a better 

choice for using similarity between F1 traces to identify noise. Owing to the ‘phase change’ in 

the t1-noise across a ridge, the central ‘trough’ is poorly correlated with other traces in the real 

spectrum, but in the complex spectrum it is well correlated with other traces in the same ridge 

and in other ridges. 

Figure 13. (a) is a pseudocolour plot of modulus of the complex correlation of

F1 traces covering the range F2 = 3.850 − 3.505 ppm in the t1-noise of a HSQC

spectrum of sucrose. Strong correlations (close to 1) are the lightest shades; weak

correlations (close to 0) are darkest. The labels ‘Ah’ and ‘Al’ mark the location of

the noise maxima lines for one t1-noise ridge; ‘Bh’ and ‘Bl’ similarly for a second

ridge. ‘Ax’ is the centre of the first t1-noise ridge. (b) and (c) plot the modulus

and argument, respectively, of the complex correlation relative to the trace Ax.

(Axes: (a) F2 (1H) on both axes; (b) modulus of complex correlation, from 0 to 1,

against F2 (1H); (c) argument of complex correlation, from −3.142 to 3.142,

against F2 (1H).)


Figure 14. Plots of the noise complex correlation represented on an Argand

diagram relative to traces of a t1-noise ridge in an HSQC spectrum of sucrose. (a)

shows the complex correlation relative to a trace at a slightly higher frequency

(F2 = 5.315 ppm) than an ‘h’ noise maxima line (i.e. on the left ‘flank’ of the

t1-noise ridge). (b) and (c) show the complex correlation relative to a higher

frequency (‘h’) noise maxima line (F2 = 5.302 ppm) and the ‘trough’ in the

centre of a t1-noise ridge (F2 = 5.289 ppm), respectively. (Each panel plots the

imaginary part against the real part of the complex correlation, both from −1 to 1.)


2.6. Denoising Algorithm. The denoising algorithm makes use of the correlation between traces 

in the complex spectrum to distinguish between t1-noise and ‘genuine’ peaks. It makes the assumption 

that a genuine peak will be present in a small number of nearby F1 traces, but that the 

noise signal is very similar in traces separated by distances larger than the peak width. 5 

2.6.1. Algorithm Description. 

(1) The complex spectrum is separated into the peak and noise components, denoted $\Phi^{(p)}(f_1, f_2)$

and $\Phi^{(n)}(f_1, f_2)$, by gradual thresholding of the spectrum as described above.

(2) For each discrete value of $f_2$, the complex correlation of the noise trace $\Phi^{(n)}_{f_2}$ with all other

traces $\Phi^{(n)}_{f_2'}$ ($f_2' \neq f_2$) is calculated using (2.21). The complex correlation is considered in

polar form, i.e.

\[
\rho(f_2, f_2') = r_{f_2}(f_2')\, e^{i\theta_{f_2}(f_2')} \tag{2.22}
\]

(3) For the $f_2$ trace under consideration, a set, M, of $f_2'$ values is chosen according to the

criteria:

Highly Correlated: The set contains only values for which the correlation is above a

threshold, that is: $r_{f_2}(f_2') \geq R^{(M)}$.

Best Correlated: The set contains (at most) the $N^{(M)}$ best correlated values, i.e. those for

which the values $r_{f_2}(f_2')$ are the largest.

Distant: $f_2'$ is outside the range $[f_2 - F^{(M)}, f_2 + F^{(M)}]$ where $F^{(M)} > 0$ is a chosen

constant.

Phase Balanced: The set contains an equal number of values for which $|\theta_{f_2}(f_2')| < \pi/2$

and $|\theta_{f_2}(f_2')| \geq \pi/2$.

The purpose of these criteria is discussed below.

(4) Using the set M, an unnormalised masking trace is derived as:

\[
\Phi^{(m')}_{f_2}(f_1) = \sum_{f_2' \in M} w\!\left[r_{f_2}(f_2')\right] \frac{\Phi^{(n)}_{f_2'}(f_1)}{\mathrm{median}\left(|\Phi^{(n)}_{f_2'}|\right)}\, e^{-i\theta_{f_2}(f_2')} \tag{2.23}
\]

The denominator, $\mathrm{median}(|\Phi^{(n)}_{f_2'}|)$, robustly normalises each trace, as discussed above,

before contribution to the mask. The factor $e^{-i\theta_{f_2}(f_2')}$ adjusts the ‘phase’ of each $f_2'$ trace

to that of $f_2$.

w(·) is a weighting function that scales the contribution of a trace to the mask depending

on the correlation. The weighting function was chosen to be the modulus of the correlation

itself, so the expression simplifies to:

\[
\Phi^{(m')}_{f_2}(f_1) = \sum_{f_2' \in M} \frac{\Phi^{(n)}_{f_2'}(f_1)}{\mathrm{median}\left(|\Phi^{(n)}_{f_2'}|\right)}\, \rho(f_2, f_2')^{*} \tag{2.24}
\]

where ∗ indicates the complex conjugate.

(5) The mask is then adjusted so that its median modulus (a measure of its signal amplitude)

is the same as that of the trace $\Phi^{(n)}_{f_2}$:

\[
\Phi^{(m)}_{f_2}(f_1) = \frac{\mathrm{median}\left(|\Phi^{(n)}_{f_2}|\right)}{\mathrm{median}\left(|\Phi^{(m')}_{f_2}|\right)}\, \Phi^{(m')}_{f_2}(f_1) \tag{2.25}
\]

(6) The adjusted masks over all $f_2$ values together form a masking spectrum $\Phi^{(m)}(f_1, f_2)$.

The denoised complex spectrum is constructed by subtracting the mask from the noise

spectrum, and adding the result back to the peak spectrum:

\[
\Phi^{(d)} = \Phi^{(p)} + \Phi^{(n)} - \Phi^{(m)} \tag{2.26}
\]

the superscript (d) indicating the denoised spectrum. The denoised real spectrum (corresponding

to the absorption signal, see A.6) that is used for the remaining analysis is

simply the real part of $\Phi^{(d)}$.

5 A quantitative assessment of peak width is discussed in section 3.2.

Note that the algorithm does not test whether the trace at a particular F2 frequency is part of 

a t1-noise ridge. This was based on the observation that the t1-noise continues either side of the 

visible ridge for some distance, albeit at a lower amplitude, and continues to show good correlation 

with other traces. Therefore the algorithm attempts to denoise across the entire F2 range. 
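The steps above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the project: traces are plain lists of complex numbers indexed by F2 position, the 'Distant' parameter is expressed in trace indices, the 'Phase Balanced' criterion is left out, and all function and parameter names are hypothetical. One point of convention needs care: with the conjugate placed on the second trace as in (2.21), a contributor that is a rotated copy of the target trace is brought back into phase by multiplying by ρ(f2, f2′); if the conjugate is instead taken on the first trace, the same rotation is ρ(f2, f2′)*, the form written in (2.24). The sketch uses the first convention, and the toy data at the end checks that the traces are in fact aligned.

```python
import cmath
import math
import statistics


def complex_correlation(a, b):
    """Complex correlation with the conjugate on the second trace (2.21)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y.conjugate() for x, y in zip(da, db))
    den = math.sqrt(sum(abs(x) ** 2 for x in da) * sum(abs(y) ** 2 for y in db))
    return num / den   # assumes neither trace is constant


def median_modulus(trace):
    return statistics.median(abs(z) for z in trace)


def select_contributors(noise, k, r_min, n_max, dist):
    """Step (3): Distant, Highly Correlated and Best Correlated criteria."""
    candidates = []
    for j, other in enumerate(noise):
        if abs(j - k) <= dist:            # too close (this also skips j == k)
            continue
        rho = complex_correlation(noise[k], other)
        if abs(rho) >= r_min:
            candidates.append((j, rho))
    candidates.sort(key=lambda item: abs(item[1]), reverse=True)
    return candidates[:n_max]


def mask_for_trace(noise, k, r_min=0.5, n_max=3, dist=1):
    """Steps (4) and (5): weighted, phase-aligned sum, then amplitude matching."""
    chosen = select_contributors(noise, k, r_min, n_max, dist)
    raw = [0j] * len(noise[k])
    for j, rho in chosen:
        # Weight by |rho| and rotate the contributor into phase with trace k.
        scale = rho / median_modulus(noise[j])
        raw = [m + scale * z for m, z in zip(raw, noise[j])]
    raw_median = median_modulus(raw)
    if not chosen or raw_median == 0.0:
        return raw                         # no usable mask: leave trace unchanged
    gain = median_modulus(noise[k]) / raw_median
    return [gain * z for z in raw]


def denoise(peak, noise, r_min=0.5, n_max=3, dist=1):
    """Step (6): denoised = peak + noise - mask, trace by trace."""
    out = []
    for k in range(len(noise)):
        mask = mask_for_trace(noise, k, r_min, n_max, dist)
        out.append([p + n - m for p, n, m in zip(peak[k], noise[k], mask)])
    return out


# Toy data: six noise traces that are scaled, rotated copies of one base trace.
base = [1 + 0.5j, -0.3 + 1.2j, 0.8 - 0.9j, -1.1 - 0.4j,
        0.2 + 0.7j, -0.6 - 1.3j, 1.4 + 0.1j, -0.9 + 0.6j]
noise = [[(1 + 0.5 * j) * cmath.exp(0.4j * j) * z for z in base] for j in range(6)]
peak = [[0j] * len(base) for _ in range(6)]
peak[2][3] = 5 + 0j

denoised = denoise(peak, noise)
```

In this toy case every noise trace is an exact rotated copy of the others, so the mask reproduces the noise and only the peak survives; real spectra would leave a residual whose size depends on how well the traces correlate.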

2.6.2. Masking Criteria. 

Highly Correlated: This criterion is straightforward. By picking only highly correlated 

traces, the intention is that the mask is very similar to the trace in question and so when 

the mask is subtracted from the noise, any remaining noise signal has low amplitude. 

Best Correlated: This criterion is intended to improve the denoising of traces with many 

well-correlated traces. By picking only the best correlated, the remaining signal when the 

mask is subtracted has the lowest amplitude. Rather than using the single best correlated 

trace, the mask is built from a number of traces so that if any of the traces should have a 

small ‘genuine’ peak in it, the effect of this peak is minimised. 

Distant: The ‘Distant’ criterion avoids the removal of small ‘genuine’ peaks: the objective 

of the algorithm is to minimise the noise but retain any small peaks convoluted with it. A 

trace is typically best correlated with its immediate neighbours. Each of these neighbours

would contain the peak signal, and so without the distant criterion, the mask would also 

include the peak signal. When subtracted from the noise spectrum, the mask would remove 

the peak (or significantly reduce its intensity). The ‘Distant’ criterion therefore avoids the 

use of near neighbours in deriving the mask. 

Phase Balanced: The ‘Phase Balanced’ criterion was introduced when it was found that 

small ‘genuine’ peaks in the spectra of metabolomic samples were occasionally removed 

by the algorithm in particular circumstances. If the trace in question has a peak at a 

particular F1 frequency, and a number of other ridges have a peak at the same frequency, 

then many of the traces contributing to the mask might contain a peak signal at this 

frequency. (The other ridges are too far away from the trace in question for the ‘Distant’ 

criterion to operate.) When the mask derived from these traces is subtracted from the 

noise, the peak is reduced or removed. 

By reference to Figure 14(a), it can be seen that for many traces, their well-correlated 

traces are either in phase or out-of-phase by π. The effect of phase-balancing is

to pick an equal number of traces that are in-phase and (almost) exactly out-of-phase to 

form the mask. Since each trace is phase-adjusted by this phase difference when forming 

the mask (see equation (2.23)), this means that approximately half the traces are multiplied 

by a phase adjustment factor close to −1. If many of the traces contributing to the 

mask contain peaks, the multiplication of about half the traces by −1 can result in the 

cancellation of peak signal when the mask is formed, resulting in less reduction to genuine 

peaks in the denoised spectrum. This effect is only partial, being dependent on the exact 

choice of the traces to form the mask and the similarity of peak signals in these traces. 

In addition, the criterion works less well for traces where the correlation phase pattern 

is more complex. Nevertheless, tests show that this criterion does limit the reduction of 

peak intensity in these particular circumstances. 

Figure 15 illustrates phase-balancing using a hypothetical small ‘genuine’ peak convoluted 

with the ridge A. When the mask for trace Bh is constructed, Ah and Al are 

a possible pair of phase-balanced contributors. The contribution to the real part of the 

mask from the peak in (b) is shown in (c) where the signal has been phase adjusted by 

the phase of the complex correlation relative to trace Bh. The contributions from Ah and 

Al cancel one another, so the peak signal does not appear in the real part of the mask. 

Since it is the real part of the spectrum that will be used for analysis after denoising, this 

means that the t1-noise at Bh will be removed, correctly, by the noise signals at Ah and Al

(which are ‘amplified’ by the phase adjustment by bringing the phase of the noise signals

into alignment) and not by the peak signal (which is cancelled out by the adjustment).

Figure 15. An illustration of the ‘Phase Balanced’ criterion. (a) shows the phase

angle of the complex correlation of the t1-noise relative to Bh. (b) is a small

hypothetical peak convoluted with the t1-noise ridge A. (c) is the real component

of the contribution to the mask from the peak in (b) after adjustment by the

phase of the noise relative to Bh. If Ah and Al are a phase-balanced pair chosen

for the mask, the contributions from each to the real spectrum mask cancel one

another. (Each panel is plotted against F2 (1H).)

2.6.3. Choice of Denoising Parameters. All the criteria, and the parameters used to control them, 

are a balance between removing as much t1-noise as possible while retaining the intensity of small 

‘genuine’ peaks embedded in the noise. For example, a small ‘distant’ parameter, F (M), results 

in better noise reduction but can lead to a reduction in peak intensity. Similarly, the ‘Phase 

Balanced’ criterion prevents the reduction of small peaks in some circumstances, but for traces 

with few well-correlated traces, which tends to be the case for the ‘troughs’ at the centre of t1-noise 

ridges, the balancing requirement limits the number of traces included in the mask, often resulting 

in poorer noise reduction. 

The appropriate parameter values were chosen by experimentation. Note that some are dependent 

on the settings of the NMR experiment itself: for example, the ‘distant’ parameter, F (M), is 

dependent on the width of peaks in the spectrum. The derivation of these parameters is discussed 

further in section 4. Table 1 shows the typical parameter settings for the spectra considered in 

this project. 

2.7. Results and Discussion. To assess the efficacy of the denoising algorithm, a series of 

samples were created from solutions of sucrose and glycine. The HSQC spectrum of glycine 

contains a single intense peak at (F1 = 41.30 ppm, F2 = 3.434 ppm) which coincides with a t1-noise streak in the sucrose spectrum.

Parameter                              Value
minimum correlation modulus, R (M)     0.5
maximum number of traces, N (M)        3 × data points in F2 peak width
minimum distance from trace, F (M)     1 × F2 peak width
phase-balanced                         yes

Table 1. Typical parameters used for deriving the mask spectrum in the denoising algorithm.

[Figure 16: intensity (× 10⁴) plotted against F1 (13C) / ppm.]

Figure 16. F1 trace in the noise component of the HSQC spectrum of a mixture of sucrose (250 mM) and glycine (4 mM), at the centre F2 frequency of the glycine peak. Some of the signal of the small glycine peak at (F1 = 41.30 ppm, F2 = 3.434 ppm) can be seen in the trace.

The samples were created using a constant concentration of sucrose at 250 mM but varying 

the glycine concentration to change peak size. The sample considered initially used a glycine 

concentration of 4 mM, resulting in a peak size a little above the amplitude of the t1-noise. Owing to the small size of the peak, when the noise spectrum is separated, a significant portion of the peak is included in the noise spectrum, as can be seen in Figure 16. The maximum intensity of the peak in the noise spectrum is 2.301 × 10⁴ compared to 4.142 × 10⁴ in the full spectrum prior to denoising.

The denoising algorithm was performed on this spectrum using the parameters specified in 

Table 1. Figure 17 shows the spectra for the sample before and after denoising. The reduction in 

the t1-noise can be clearly seen. 

Figure 18(a) shows the interquartile range before and after denoising across a range of the 

spectrum that includes all the prominent peaks. It can be seen that iqr is reduced at most F2 

values across t1-noise ridges, indicating a reduction in noise. This behaviour is confirmed by 

Figure 18(b) which shows the factor by which the iqr is reduced after denoising. Significant 

features seen here, and also when denoising other HSQC spectra, are that the algorithm is

particularly good at removing noise on the ‘flanks’ of t1-noise ridges, even some distance from the 

main ridge, but is often less good near the centre of the ridge, and particularly the ‘trough’ at the 

centre. 

The RMS Signal-to-Noise Ratio (SNR) defined in (2.28) is used as a quantitative measure of 

sensitivity, and thereby the noise, in the following analysis. Since the t1-noise changes with F2, 

an assumption will be made that the noise of the F1 trace through the centre of the peak itself is representative. So if the peak is at (f1, f2),

$$\mathrm{SNR} = \frac{\Phi(f_1, f_2)}{\sigma\{\Phi_{f_2}(f_1)\}} \tag{2.27}$$

where Φf2(f1) is the F1 trace through the peak. If a normal distribution of the noise in the real spectrum continues to be assumed, the relationship between the standard deviation and the interquartile range in (2.1) gives

$$\mathrm{SNR} = \frac{\Phi(f_1, f_2)}{Q_N \, \mathrm{iqr}\{\Phi_{f_2}(f_1)\}} \tag{2.28}$$

where QN ≈ 0.7413 from (2.7).

Figure 17. The HSQC spectrum of a mixture of sucrose (250 mM) and glycine (4 mM) (a) before and (b) after denoising. The F2 range encompasses the majority of prominent peaks.

Figure 19 shows the change in noise around the glycine peak as a result of denoising. In this 

case, the denoising algorithm has not reduced the intensity of the peak. (In fact, its maximum intensity has increased slightly from 4.142 × 10⁴ to 4.344 × 10⁴.)

[Figure 18: two panels plotted against F2 (1H) / ppm. (a) interquartile range (× 10⁴), with legend: Original Spectrum, Denoised Spectrum; (b) reduction in iqr.]

Figure 18. (a) plots the interquartile range (a measure of noise) in the HSQC spectrum of a mixture of sucrose (250 mM) and glycine (4 mM) before and after denoising. The F2 range encompasses all of the prominent peaks. (b) plots the factor by which the interquartile range is reduced after denoising.

The SNR values for the glycine peak, calculated using the formula (2.28) are 5.084 before 

denoising and 14.34 after, an improvement by a factor of 2.82. (Most of the improvement is due 

to the decrease in the t1-noise rather than the slight increase in the peak intensity.) 
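As an illustration of equation (2.28), the following is a minimal sketch of the iqr-based SNR estimate. It uses synthetic normally distributed noise rather than a real F1 trace, and the function names are hypothetical; only the constant QN ≈ 0.7413 is taken from the text.

```python
import random
import statistics

Q_N = 0.7413  # converts an interquartile range to a standard deviation (normal noise)

def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def snr_from_iqr(peak_intensity, f1_trace):
    """Equation (2.28): peak intensity divided by the iqr-based noise estimate."""
    return peak_intensity / (Q_N * iqr(f1_trace))

# Synthetic check: the iqr-based estimate should recover the true sigma.
random.seed(0)
true_sigma = 2.0
trace = [random.gauss(0.0, true_sigma) for _ in range(20000)]
estimated_sigma = Q_N * iqr(trace)
snr = snr_from_iqr(10.0, trace)
```

Because the interquartile range ignores the tails of the distribution, a single narrow peak lying on the trace has little effect on the noise estimate, which is why the trace through the peak itself can be used.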

2.7.1. Alternative Algorithm. An alternative algorithm considered was to apply the masking algorithm 

separately to each level of wavelet decomposition of the noise trace. 

Using the equation (B.64), the pyramid algorithm can be used to deconstruct each noise trace 

as: 

$$\Phi^{(n)}_{f_2}(f_1) = s^{(n)}_{f_2;M,0}\,\phi_{M,0}(f_1) + \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m}-1} d^{(n)}_{f_2;m,n}\,\psi_{m,n}(f_1) \tag{2.29}$$

where φm,n and ψm,n are the dilated and translated scaling and wavelet functions, and $s^{(n)}_{f_2;m,n}$ and $d^{(n)}_{f_2;m,n}$ are the approximation and detail coefficients. Note that the signal $\Phi^{(n)}_{f_2}$ is complex and

so the coefficients take complex values, formed by using the pyramid algorithm to decompose the 

real and imaginary parts of the noise spectrum separately, and then combining the corresponding 

coefficients to give complex coefficients. The scaling and wavelet functions remain real-valued. 
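The construction of complex coefficients can be sketched as follows. This is an illustration only: it uses the Haar wavelet in place of the wavelet used in the project, so that the pyramid algorithm fits in a few lines, and all function names are hypothetical. The signal length is assumed to be a power of two.

```python
import math

def haar_step(signal):
    """One pyramid level: pairwise scaled sums (approximation) and differences (detail)."""
    s = 1.0 / math.sqrt(2.0)
    half = len(signal) // 2
    approx = [s * (signal[2 * i] + signal[2 * i + 1]) for i in range(half)]
    detail = [s * (signal[2 * i] - signal[2 * i + 1]) for i in range(half)]
    return approx, detail

def haar_pyramid(signal):
    """Full decomposition of a length 2**M signal.

    Returns the single level-M approximation coefficient and a list of detail
    levels, finest first, so level m holds 2**(M - m) coefficients.
    """
    approx, details = list(signal), []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx[0], details

def complex_pyramid(signal):
    """Decompose real and imaginary parts separately, then recombine."""
    a_re, d_re = haar_pyramid([z.real for z in signal])
    a_im, d_im = haar_pyramid([z.imag for z in signal])
    details = [[complex(r, i) for r, i in zip(lr, li)] for lr, li in zip(d_re, d_im)]
    return complex(a_re, a_im), details

trace = [1 + 2j, 3 - 1j, -2 + 0.5j, 0.5 + 0.5j, 4 - 3j, -1 + 1j, 2 + 2j, 0 - 1j]
approx, details = complex_pyramid(trace)
```

Because the transform is linear and the filters are real-valued, decomposing the two parts separately gives the same coefficients as applying the transform directly to the complex signal.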

[Figure 19: two panels, (a) and (b), showing intensity (× 10⁴) against F1 (13C) / ppm and F2 (1H) / ppm.]

Figure 19. A section of the HSQC spectrum of a mixture of sucrose (250 mM) and glycine (4 mM) around the glycine peak. (a) shows the section before denoising and (b) after.

After decomposing all the F1 noise traces in this way, a mask is found for each f2 value in turn as in the standard algorithm. However, a mask is derived for each wavelet decomposition level separately. At each level m′, the set of detail coefficients $d^{(n)}_{f_2;m',n}$ is treated as a complex signal (in terms of n), and its complex correlation with the equivalent set of coefficients for all other f2 values is calculated.

[Figure 20: plotted against F2 (1H) / ppm, vertical scale × 10⁴; legend: Standard Denoising Algorithm, Wavelet-Level Denoising Algorithm.]

Figure 20. The interquartile range (a measure of noise) in the HSQC spectrum of a mixture of sucrose (250 mM) and glycine (4 mM) after using the standard and wavelet-level denoising algorithms. The F2 range encompasses all of the prominent peaks.

A masking set of detail coefficients, $d^{(n)}_{f_2;m',n}$, is then derived using steps 3 to 5 in the standard algorithm. This is repeated for each level m′. From these masking detail coefficients at all levels, a masking trace is reconstructed using the equivalent of equation (2.29), and these traces together form the masking spectrum.

The rationale for this approach is that the correlation between traces may change when considering signal components at different scales (or equivalently at different frequencies), and therefore deriving a separate mask at each wavelet decomposition level might result in more accurate masking of the t1-noise.

This alternative derivation of the masking spectrum produces a reduction in t1-noise that is equivalent to, but occasionally slightly worse than, that of the standard algorithm described above. Figure 20 compares the interquartile range in the denoised spectrum for the two algorithms. The wavelet-level algorithm used the Coiflet wavelet of order 2, to 10 levels of decomposition. Although the wavelet-level algorithm is better at reducing noise at low amplitudes, the standard algorithm is slightly better at reducing the large amplitude noise towards the centre of some of the t1-noise ridges. As an indication (not necessarily representative), the SNR for the glycine peak after the wavelet-level algorithm is slightly lower at 12.57 compared to 14.34 for the standard algorithm (again, the majority of this improvement resulting from the reduction in noise rather than a slight increase in peak intensity). When compared on other spectra, it was found that the standard algorithm was more robust, with the wavelet-level algorithm occasionally producing results significantly worse than the standard algorithm at a few individual F2 values.

Given its equivalent (or slightly better) performance in noise reduction, and its shorter running time compared to the wavelet-level denoising algorithm, the standard algorithm is used for denoising in the subsequent sections of this project.

2.8. Comparison to Other t1-Noise Reduction Techniques. 

Reference Deconvolution: Another method of t1-noise reduction is to use reference deconvolution 

as described in [13]. The technique also leverages the correlation between 

t1-noise ridges, but works in the time- rather than frequency-domain. The strong t1-noise 

ridge associated with the compound used to produce the reference frequency in the sample 

is identified, and then a (complex spectrum) trace through the ridge is converted back into 



the time-domain by performing an inverse Fourier transform in the F1 direction. Comparison 

of this experimental time-domain signal with a predicted theoretical form enables the 

identification of a complex correcting function (the equivalent of the masking spectrum) 

which can then be applied to the entire spectrum. By using a series of traces through 

the t1-noise ridge of the reference signal across a small range of F2 values, the technique 

can also account for changes in the t1-noise in the t2 direction, corresponding to changes 

during acquisition of the FID. 

The reference deconvolution technique is similar to the denoising algorithm described 

earlier, essentially using the same correlation properties but applying a masking spectrum 

(or correcting function) in the time rather than frequency domain. However, it does require 

both a strong reference signal to be identified, and for the theoretical form of that signal 

to be calculated (in particular, requiring that the reference signal is not convoluted with 

other signals). The denoising algorithm of section 2.6 has a desirable property that, by 

picking the other traces that each trace is correlated with individually, it can handle the 

gradual change in t1-noise with F2 that is observed in the spectra; it is unclear whether 

reference deconvolution is adaptable in this manner. 

Cadzow Procedure: A further technique is to use the Cadzow procedure to directly denoise 

the FIDs as described by [4]. The technique makes use of properties of a Toeplitz 

matrix derived from the FIDs to remove all signals apart from those resulting from N 

resonance frequencies. However, the technique requires a priori knowledge of the number 

of resonance frequencies (i.e. the number of peaks) in the signal. It therefore appears unsuitable 

for metabolomic profiling where the number and location of peaks in the spectrum 

is not known in advance. 



3. Automated Peak Picking Using a Genetic Algorithm 

Peak picking is the process of identifying the position of peaks in an NMR spectrum, in particular

distinguishing peaks from noise artefacts. Although NMR processing software provides tools to 

assist in peak picking, accurate picking often requires the experience of the experimenter. This 

manual process can be time-consuming, especially for the spectra of metabolomic samples, and 

can be subjective. 

This section describes the use of a genetic algorithm (GA) (appendix C) to automate peak picking.

The aim is to incorporate some of the knowledge used by the experimenter into the algorithm, 

and to leverage the analysis of t1-noise in section 2 to assist in distinguishing small peaks from 

noise. 

3.1. Peak Shape. The GA attempts to fit experimental peaks to theoretical peak shapes. 

In theory, the resonance frequency of a nucleus is very well-defined. In practice, the transverse 

relaxation process (section A.3.2) leads to line broadening[14] owing to the exponential decay in 

the FID and its effect on the subsequent Fourier transform. The theoretical shape is a Lorentzian 

peak of the form[9]: 

$$\Phi(f) = \frac{w_{0.5}^{2}\,\Phi(f_c)}{w_{0.5}^{2} + 4(f - f_c)^{2}} \tag{3.1}$$

where fc is the centre frequency of the peak, Φ(fc) is the intensity at the centre frequency, and w0.5

is the peak width at half-height. In practice, a number of experimental and instrumental factors 

lead to further significant broadening of the peak [9, 14]. 

The shape of the peak is also affected by the choice of the window function applied to the 

FID prior to the Fourier Transform (section A.6.3). The window function can be used to improve 

the sensitivity of the experiment, but also affects the peak shape. Peak broadening decreases the 

resolution—the ability of the experiment to distinguish nearby peaks—and the choice of window 

function is often a compromise between improved SNR and decreased resolution[9]. 

An analysis of peaks in the 2D HSQC spectra used in this project shows that, as a result of the 

pre-Fourier Transform processing, the peak shape is very close to a Gaussian. Figure 21 shows examples of fitting Lorentzian and Gaussian shapes to experimental glycine peaks from samples of two different concentrations. As can be seen, the fit is good for the Gaussian in both the F1 and F2 directions at both concentrations.

Although a Gaussian peak shape is consistent with the HSQC spectra considered here, it is 

not an assumption of the peak picking genetic algorithm: instead, the theoretical peak shape is a 

parameter to the algorithm. 
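The two candidate shapes can be written as interchangeable functions, which is one way the peak shape could be passed to the algorithm as a parameter. This is a sketch only: the function names are hypothetical, and the Gaussian is parameterised by its width at half-height purely so that it is directly comparable with equation (3.1).

```python
import math

def lorentzian(f, f_c, height, w_half):
    """Lorentzian peak of equation (3.1); w_half is the width at half-height."""
    return (w_half ** 2 * height) / (w_half ** 2 + 4.0 * (f - f_c) ** 2)

def gaussian(f, f_c, height, w_half):
    """Gaussian peak with the same width at half-height as the Lorentzian."""
    sigma = w_half / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return height * math.exp(-((f - f_c) ** 2) / (2.0 * sigma ** 2))

def sample_peak(shape, freqs, f_c, height, w_half):
    """Evaluate whichever shape function is supplied on a grid of frequencies."""
    return [shape(f, f_c, height, w_half) for f in freqs]

# Both shapes fall to half the peak height at f_c +/- w_half / 2.
f_c, height, w_half = 3.434, 100.0, 0.02
half_l = lorentzian(f_c + w_half / 2.0, f_c, height, w_half)
half_g = gaussian(f_c + w_half / 2.0, f_c, height, w_half)

# Two widths from the centre the Lorentzian tail is far heavier.
tail_l = lorentzian(f_c + 2.0 * w_half, f_c, height, w_half)
tail_g = gaussian(f_c + 2.0 * w_half, f_c, height, w_half)
```

The difference in the tails is what distinguishes the two fits in practice: a Lorentzian retains appreciable intensity well away from the centre, whereas a Gaussian falls away rapidly.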

3.2. Peak Width. From Figure 21, it can also be seen that the peak width is independent of the 

peak intensity. 

To verify this, the glycine peak width was measured in a series of HSQC spectra of mixtures 

of sucrose and glycine where the glycine concentration, and therefore the peak intensity, varied 

from lowest to highest by a factor of approximately 40. For each dimension, the two peak radii 

at quarter peak-height were measured. Since the spectra have discrete values, the quarter-height 

radius was estimated by linear interpolation. For example, if the peak maximum is Φ(fc), located 

at the discrete value fc, and the peak intensity falls below quarter of the height at the maximum 

between fr1 and fr2, where fr2 > fr1 > fc, then the estimate of the radius at quarter height, r0.25 

is given by: 

$$r_{0.25} = f_{r1} - f_c + \frac{\Phi(f_{r1}) - 0.25\,\Phi(f_c)}{\Phi(f_{r1}) - \Phi(f_{r2})}\,(f_{r2} - f_{r1}) \tag{3.2}$$

The radius at quarter-height was chosen in preference to the more traditional radius (or width) 

at half-height in order to provide more accuracy when calculating from discrete values: the radius 

at quarter-height is larger than the radius at half-height and so is subject to less error from linear 

interpolation and other calculations. For most peaks, it is still measured at a sufficiently high 

intensity to avoid the effect of noise. 
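The interpolation of equation (3.2) can be sketched as below for a one-dimensional trace through the peak. The function name and the test data are hypothetical; the radius on the other side of the peak is obtained here by reversing the arrays, which is an implementation convenience rather than something stated in the text.

```python
def quarter_height_radius(freqs, intensities, centre_index):
    """Radius at quarter height on the increasing-index side of the peak.

    Walks outwards from the peak maximum until the intensity first falls
    below a quarter of the maximum, then interpolates linearly between the
    two bracketing grid points as in equation (3.2).
    """
    f_c = freqs[centre_index]
    threshold = 0.25 * intensities[centre_index]
    for i in range(centre_index, len(freqs) - 1):
        phi_r1, phi_r2 = intensities[i], intensities[i + 1]
        if phi_r1 >= threshold > phi_r2:
            fraction = (phi_r1 - threshold) / (phi_r1 - phi_r2)
            return abs(freqs[i] - f_c + fraction * (freqs[i + 1] - freqs[i]))
    return None  # the trace never falls below quarter height

# Triangular test peak: height 10 at f = 4, falling by 3 per grid step,
# so the exact quarter-height radius is (10 - 2.5) / 3 = 2.5.
freqs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
intensities = [-2.0, 1.0, 4.0, 7.0, 10.0, 7.0, 4.0, 1.0, -2.0]

upper_radius = quarter_height_radius(freqs, intensities, 4)
lower_radius = quarter_height_radius(freqs[::-1], intensities[::-1], 4)
peak_width = upper_radius + lower_radius
```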

Figure 22 shows peak radii at quarter height plotted against peak intensity. In general, the 

slight variation in peak radius on the higher frequency side is the mirror image of that on the lower 

frequency side, and so the peak width (the sum of the two radii) remains constant for the same peak as the intensity changes. (The mirror-image pattern of radius variation can be explained by the ‘true’ central frequency of the peak moving slightly in each spectrum compared to the discrete grid of frequency values: the central frequency from which the radius is measured is the discrete value nearest to the ‘true’ value.)

[Figure 21: four panels, (a) to (d), of intensity against F1 (13C) / ppm or F2 (1H) / ppm; legend: Experimental Peak, Fitted Lorentzian Peak, Fitted Gaussian Peak.]

Figure 21. Experimental peak shapes fitted to theoretical Lorentzian and Gaussian peaks. The peak is the glycine peak in HSQC spectra of mixtures of sucrose (250 mM) and glycine. (a) and (b) show fitting in the F1 and F2 directions respectively for a strong glycine peak from a sample concentration of 81 mM; (c) and (d) show the equivalent for a less intense peak from a sample of concentration 12 mM.

3.3. Peak Fit Metric. To assess the fit of a theoretical peak to the experimental peak, the 

following metric was chosen:

$$\Omega_R = \frac{\sum_{(f_1,f_2)\in R} \left(\Phi(f_1, f_2) - \Theta(f_1, f_2)\right)^{2}}{\sum_{(f_1,f_2)\in R} \Phi(f_1, f_2)^{2}} \tag{3.3}$$

where Φ(f1, f2) and Θ(f1, f2) are the experimental and theoretical spectra respectively. R is the 

region of interest, and in practice this can cover a set of adjacent peaks (see section 3.6). Note 

that the metric is independent of the intensity scale and therefore gives equivalent measurements 

for large and small peaks. 
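A minimal sketch of the metric in equation (3.3) follows, with the region R represented simply as two equally sized grids of intensities. The function name and the example values are hypothetical.

```python
def fit_metric(experimental, theoretical):
    """Equation (3.3): squared residual normalised by the squared experimental signal."""
    residual = sum(
        (e - t) ** 2
        for row_e, row_t in zip(experimental, theoretical)
        for e, t in zip(row_e, row_t)
    )
    signal = sum(e ** 2 for row in experimental for e in row)
    return residual / signal

experimental = [[1.0, 2.0], [3.0, 4.0]]
theoretical = [[1.0, 2.0], [3.0, 5.0]]
omega = fit_metric(experimental, theoretical)  # 1 / 30

# Scaling both spectra by the same factor leaves the metric unchanged.
scaled_exp = [[1000.0 * v for v in row] for row in experimental]
scaled_theo = [[1000.0 * v for v in row] for row in theoretical]
omega_scaled = fit_metric(scaled_exp, scaled_theo)
```

A perfect fit gives a metric of 0, and a theoretical spectrum that is zero everywhere gives 1, so values well below 1 indicate that the theoretical peaks account for most of the experimental signal in the region.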

3.4. A Priori Knowledge Encapsulated in the Genetic Algorithm. Experimenters performing 

manual peak picking use a priori knowledge about peak shape to distinguish between 

peaks and noise artefacts. The GA attempts to capture the following aspects of this knowledge in 

its genome and operators. 

Radial Symmetry: Symmetry of peak shape is a criterion used by many automated peak 

picking techniques in multidimensional NMR[12]. The shape of a Lorentzian (equation (3.1))

is symmetrical about the centre frequency. This is also suggested by Figure 22 when the 

fluctuations in the discretized centre frequency value are accounted for. However, this 

does indicate that the GA must also determine a more accurate centre frequency if radial 

symmetry is to be accounted for. 

[Figure 22: two panels plotted against peak intensity (× 10⁶). (a) F1 radius (13C) / ppm; (b) F2 radius (1H) / ppm; legend: Higher Frequency Radius, Lower Frequency Radius.]

Figure 22. Peak radii at quarter height for the glycine peak in HSQC spectra of mixtures of sucrose (250 mM) and glycine, the glycine concentration ranging from 4 mM to 160 mM. The peak radii for the higher and lower frequency sides of the peak are plotted against peak intensity. (a) shows the radii in the F1 dimension; (b) in the F2 dimension.

Peak Width: As discussed above, experimental and instrumental factors broaden the resonance 

frequency line to form a peak. Thus peaks may be distinguished by a characteristic 

width. However, other processes, such as chemical exchange[14], can widen the peak further. 

Thus the GA can distinguish peaks from artefacts using the criterion of a peak width 

at or above a particular threshold, the value of which will be determined by the nature of 

the NMR experiment and the processing of the FID. 

This criterion is significant when used in conjunction with the denoising algorithm 

defined in section 2.6. As can be seen in Figure 18, the denoising algorithm is particularly 

effective at reducing the amplitude of the noise at the sides (‘flanks’) of t1-noise ridges. 

This means that noise artefacts in the relatively strong-amplitude centre of the ridge will have their (quarter-height) peak widths significantly reduced, increasing the likelihood of rejection by the GA.

Multiplets: As described in section A.5, spin-spin coupling gives rise to multiplets of peaks with intensities in a specific ratio. If the distance between the peaks of the multiplet is less than the resolution of the NMR experiment and instrument, the individual peaks may

not be distinguishable and instead a single broad peak, symmetrical, but different from 

the standard shape, may result. An example of such a peak shape is shown in Figure 23. 

[Figure 23: intensity (× 10⁵) against F1 (13C) / ppm and F2 (1H) / ppm.]

Figure 23. Example of a peak shape caused by a non-resolved multiplet in an HSQC spectrum of sucrose.

The homonuclear spin-spin coupling constant for 1 H in the organic compounds present 

in a metabolome is dependent on the relative orientations of the two nuclei and the number 

of bonds (2 or more) separating them, but can range in magnitude from


[Figure 24: two surface plots of the fit metric. (a) against the offset from the centre F1 (13C) frequency of peak B / ppm and the offset from the centre F2 (1H) frequency of peak A / ppm; (b) against multiplet degree and multiplet separation along F2 (1H) / ppm.]

Figure 24. Surface plots of the fitness of theoretical peak shapes against experimental peaks. A lower fit metric represents a better fit. In each case two of the theoretical peak parameters were varied, keeping all others constant. In (a), the experimental peak region contains two adjacent (and partially convoluted) peaks and the parameters varied are the centre F1 frequency of one peak and the centre F2 frequency of the other. In (b), the experimental region is a single peak consistent with a non-resolvable multiplet and the parameters varied are the multiplet degree and multiplet separation (equivalent to the spin-spin coupling constant).

frequency of the other. As can be seen, the fitness landscape has a distinct global minimum which 

should be quickly located by a GA. 

Figure 24(b) shows the fit to the non-resolvable multiplet shown in Figure 23. The parameters 

are the separation of the multiplet peaks and the degree of the multiplet. Here the fitness landscape 

is more complicated, and while there is a global minimum it is shallower and there are indications of rapidly changing fitness at larger multiplet separations. (Since other parameters, particularly accurate peak centre frequencies and widths, are not varied in this calculation, and may not be at their optimum values, the multiplet degree and separation should not be inferred from this figure.)

Note that the behaviour at higher multiplet separation (largely omitted from Figure 24 for clarity of the global minimum) suggests that a relatively good fit for an odd degree of multiplet (triplet, quintuplet etc.) changes to a bad fit for an even degree (doublet, quadruplet etc.) and

vice versa. This might be expected since odd degree multiplets are symmetrical about the central 

peak in the multiplet, while even degree multiplets are symmetrical about a point midway between 

the two central peaks, resulting in distinctive shapes for each case, especially when the multiplet 

separation is large compared to the resolution in the spectrum. 

The two example fitness landscapes, while being necessarily limited projections of the many 

dimensional search space, suggest that there are few local optima and so, with carefully designed 

operators to take account of behaviours such as that discussed for the multiplet degree, the GA 

will be a suitable technique to locate the global optimum. 

3.6. Identification of Convoluted Peak Regions. The GA is applied not to the entire spectrum at once, but one at a time to ‘regions’ of the spectrum that consist of adjacent, and therefore potentially convoluted, peaks. Each of these regions is ‘isolated’ from the rest of the spectrum and so theoretical peak fitting can take place independently on each region.

The experimental peaks are initially determined from the spectrum by finding local maxima in 

the denoised spectrum. (In practice, additional criteria are applied to improve performance - this 

is discussed in section 4.) 

Next, the watershed region of each peak is identified. The watershed region is essentially the 

contiguous area around the peak in which hill-climbing would end up at the peak maximum and 

so defines the unique ‘area’ of the peak. The watershed region is bounded by ‘valleys’ after which 

the surface rises to meet another peak. For convoluted peaks, this boundary may actually be at 

a significant height (intensity) in the sector of the boundary common to the two peaks. 

Technically, the region located is the watershed of the negative spectrum, i.e. of −Φ(f1, f2), but 

the term ‘watershed’ is used here for brevity. In the context of the negative spectrum, the term 

watershed is a more accurate geophysical analogy: the watershed is the region of the spectrum in 

which ‘rainfall’ would collect in the peak (now a ‘depression’ in the negative spectrum). 

The implementation of the GA uses MATLAB’s default watershed algorithm, which is a variation

of the Vincent and Soille algorithm[16]. When applied to the entire (negative) spectrum, the 

algorithm locates the discrete frequency grid points belonging to the watershed of each peak. 

The boundary ‘valleys’ are common to more than one peak and so the grid points forming the 

boundaries do not belong to any watershed. 

For practical reasons, discussed in sections 3.9 and 4, the watershed area identified is modified 

to consist of those points that are a certain percentage, θw, of the peak height or above. In other 

words, the watershed does not extend all the way to the valley floor. This is implemented by 

removing those points below the height threshold from the ‘full’ watershed, and then filling in any 

‘holes’ in the resulting pattern of grid points. 
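The hill-climbing description of a watershed, together with the height threshold θw, can be sketched as follows. This is a simplified stand-in and not the Vincent and Soille algorithm used in the implementation: it assigns each grid point to the maximum reached by steepest ascent, does not produce separate boundary lines, and omits the hole-filling step. θw is expressed here as a fraction rather than a percentage, and all names are hypothetical.

```python
def climb(spectrum, r, c):
    """Follow the steepest ascent from (r, c) until a local maximum is reached."""
    rows, cols = len(spectrum), len(spectrum[0])
    while True:
        best = (r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    if spectrum[rr][cc] > spectrum[best[0]][best[1]]:
                        best = (rr, cc)
        if best == (r, c):
            return best
        r, c = best

def thresholded_watersheds(spectrum, theta_w):
    """Map each peak maximum to the grid points that climb to it and lie at
    or above theta_w times the peak height."""
    regions = {}
    for r in range(len(spectrum)):
        for c in range(len(spectrum[0])):
            peak = climb(spectrum, r, c)
            if spectrum[r][c] >= theta_w * spectrum[peak[0]][peak[1]]:
                regions.setdefault(peak, []).append((r, c))
    return regions

# A single-row toy spectrum with maxima of height 4 and 9.
spectrum = [[1.0, 4.0, 2.0, 1.0, 3.0, 9.0, 3.0]]
regions = thresholded_watersheds(spectrum, theta_w=0.3)
```

In this toy example the low points at the edge and in the valley between the two maxima fall below the threshold for their respective peaks and so belong to neither watershed.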

To identify convoluted peak regions, the watershed of the spectrum is processed so that all grid 

points that are in a watershed, or within a distance of 1 grid unit (using the ‘cityblock’ distance 

metric) of a watershed, are identified. The distance criterion includes the common watershed 

boundaries within the region. The resulting set of grid points is then used to identify contiguous 

regions (using MATLAB’s bwlabel function). Each region consists of peaks with common boundaries,

and that are therefore potentially convoluted, and each region is isolated in the sense that 

it shares no boundary with other regions. 
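The grouping of watersheds into convoluted peak regions can be sketched in two steps: a dilation by one grid unit of cityblock distance, followed by connected-component labelling. The labelling below is a plain breadth-first search standing in for bwlabel; 8-connectivity is assumed here, and the function names and toy grid are hypothetical.

```python
from collections import deque

def dilate_cityblock(in_watershed):
    """Mark points that are in a watershed or within cityblock distance 1 of one."""
    rows, cols = len(in_watershed), len(in_watershed[0])
    out = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if in_watershed[r][c]:
                for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        out[rr][cc] = True
    return out

def label_regions(mask):
    """Label contiguous True regions (8-connected); 0 marks background."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            yy, xx = y + dy, x + dx
                            if (0 <= yy < rows and 0 <= xx < cols
                                    and mask[yy][xx] and labels[yy][xx] == 0):
                                labels[yy][xx] = count
                                queue.append((yy, xx))
    return labels, count

# Watersheds 1 and 2 share a boundary column; watershed 3 is isolated.
watersheds = [
    [1, 1, 0, 2, 2, 0, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 3, 3],
]
mask = dilate_cityblock([[v > 0 for v in row] for row in watersheds])
labels, n_regions = label_regions(mask)
```

Peaks 1 and 2 end up in the same region because the dilation fills in their common boundary, while peak 3 forms a region of its own and would be fitted independently.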

3.7. Genetic Algorithm Representation, Operators and Objective Function. This section 

describes, in functional terms, the specifics of the genetic algorithm used for fitting a single 

convoluted peak region. The implementation details are described in section 3.8 below. 

3.7.1. Genome. In the GA, the genome represents a number, Np, of peaks in a convoluted peak region. For each peak p in this convoluted peak region, the genome stores the following variables:

• The location of peak centre in terms of F1 and F2 frequencies, fc;1 and fc;2 respectively. 

Note that the location of the peak is a real value, and is not restricted to the discrete 

frequency grid of the spectrum. 

• The peak height (maximum intensity), h. 

• The peak widths at quarter peak height in the F1 and F2 dimension, w0.25;1 and w0.25;2. 

The widths also apply to the individual peaks in a (non-resolvable) multiplet. Again, the 

widths are not restricted to the discrete frequency grid spacing. 

• The degree of multiplet in both dimensions, m1 and m2, and the corresponding separations 

j1 and j2. (Although, as discussed above, multiplets will not be permitted in the 13C dimension, F1, for the spectra considered here.)

3.7.2. Initialisation Operator. The genomes of the initial population in the GA are initialised 

using estimated values derived from the experimental spectrum. The peak centre frequencies, fc;1 

and fc;2, are the coordinates on the discrete frequency grid of the peak maximum. For each peak 

in the region, the height, h, is taken from the intensity at the peak centre. The widths, w0.25;1 

and w0.25;2, are initialised to the interpolated values described in section 3.2 above. (As seen in 

Figure 22, this may include a small error resulting from the discrete nature of the peak centre 

location.) The F2 multiplet degree, m2, is initially set to 1, indicating no multiplet. 

Immediately after setting the values from the experimental spectrum, the genome is mutated 

using the mutation operator below so that there is variance in the initial GA population. 

3.7.3. Mutation Operator. The mutation operator makes random changes to the genome of a single 

individual. 

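For each peak, individual genome variables are mutated using a zero-meaned Normal distribution with the standard deviation controlled by the variable itself, or another genome variable, in conjunction with a multiplier.

\begin{align}
f_{c;1} &= f_{c;1} + \Delta f_1 & &\text{where } \Delta f_1 \sim N(0, k_f w_{0.25;1}) \tag{3.4}\\
f_{c;2} &= f_{c;2} + \Delta f_2 & &\text{where } \Delta f_2 \sim N(0, k_f w_{0.25;2}) \tag{3.5}\\
h &= h + \Delta h & &\text{where } \Delta h \sim N(0, k_h h) \tag{3.6}\\
w_{0.25;1} &= w_{0.25;1} + \Delta w_1 & &\text{where } \Delta w_1 \sim N(0, k_w w_{0.25;1}) \tag{3.7}\\
w_{0.25;2} &= w_{0.25;2} + \Delta w_2 & &\text{where } \Delta w_2 \sim N(0, k_w w_{0.25;2}) \tag{3.8}\\
j_2 &= j_2 + \Delta j_2 & &\text{where } \Delta j_2 \sim N(0, k_j j_2) \tag{3.9}
\end{align}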

The multipliers kf,kh,kw, and kj, are parameters of the GA as a whole. 

The multiplet degree variable, m2, is modified according to the equation:

$$m_2 = m_2 + \operatorname{round}(\Delta m_2) \quad \text{where} \quad \Delta m_2 \sim \begin{cases} U(0, \bar{m}_2 - 1) & m_2 = 1 \\ -(2 + N(0, 0.5)) & m_2 > 1 \text{ and } \chi \le 0.5 \\ 2 + N(0, 0.5) & m_2 > 1 \text{ and } \chi > 0.5 \end{cases} \tag{3.10}$$

where U(a, b) denotes the uniform distribution between a and b, $\bar{m}_2$ is the maximum multiplet degree allowed, χ is a random variable uniformly distributed between 0 and 1, and the round(·)

operator returns the nearest integer. The effect of this mutation is to randomly set the multiplet 

degree if it is currently 1 (indicating no multiplet), otherwise to encourage changes in the multiplet 

degree of ±2 to move sensibly in the fitness landscape for multiplets as described in section 3.5. 

(Note that if m2 is currently 1 and is mutated to a higher value, then the separation j2 is set to 

an initial value determined by another parameter of the GA). 
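The three cases of (3.10) can be sketched in a few lines of Python. This is an illustration only, not the dissertation's C++ code: the function name, the seed, and the final clamp to the range 1 to the maximum degree are assumptions, and the second argument of N(0, 0.5) is taken to be a standard deviation, as it is for (3.4) to (3.9).

```python
import random


def mutate_multiplet_degree(m2, m2_max, rng):
    """Mutate the multiplet degree following the three cases of (3.10)."""
    if m2 == 1:
        # No multiplet at present: draw the change uniformly from [0, m2_max - 1].
        delta = rng.uniform(0.0, m2_max - 1)
    elif rng.random() <= 0.5:
        # chi <= 0.5: step down by roughly two.
        delta = -(2.0 + rng.gauss(0.0, 0.5))
    else:
        # chi > 0.5: step up by roughly two.
        delta = 2.0 + rng.gauss(0.0, 0.5)
    mutated = m2 + round(delta)
    # Assumed range correction: clamp to the allowed degrees 1..m2_max.
    return max(1, min(m2_max, mutated))


rng = random.Random(2006)  # fixed seed so the sketch is repeatable
samples = [mutate_multiplet_degree(3, 7, rng) for _ in range(1000)]
```

Starting from a degree of 3, most of the mutated values land on 1 or 5, which is the intended behaviour of stepping by ±2.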

All the genome variables have a range of allowable values, set by parameters of the GA, from 

a priori knowledge where appropriate. For example, the range of the multiplet separation,

j2, is defined by the range of the homonuclear 1H coupling constant presented in section 3.4. If

mutation results in a value outside of the range, the usual correction is to set the variable to the 

closest range limit. 
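A minimal sketch of the continuous mutations (3.4) to (3.9), together with the correction to the closest range limit, is given here in Python. The dictionary layout, the variable names, and every numerical value are hypothetical; reading each standard deviation from the pre-mutation genome is also an assumption, since the text does not state the order of evaluation.

```python
import random


def mutate_variable(value, scale, k, lower, upper, rng):
    """Add N(0, k * scale) noise to value, then clamp to [lower, upper]."""
    mutated = value + rng.gauss(0.0, k * abs(scale))
    return min(upper, max(lower, mutated))


# Which genome variable scales the standard deviation of each mutation,
# and which multiplier (k_f, k_h, k_w or k_j) applies, as in (3.4)-(3.9).
SCALE_OF = {"f1": "w1", "f2": "w2", "h": "h", "w1": "w1", "w2": "w2", "j2": "j2"}
MULTIPLIER_OF = {"f1": "f", "f2": "f", "h": "h", "w1": "w", "w2": "w", "j2": "j"}


def mutate_peak(peak, k, limits, rng):
    """Return a mutated copy of one peak (a dict of genome variables)."""
    new = dict(peak)
    for name, scale_name in SCALE_OF.items():
        lower, upper = limits[name]
        new[name] = mutate_variable(
            peak[name], peak[scale_name], k[MULTIPLIER_OF[name]], lower, upper, rng
        )
    return new


# Hypothetical example values, for illustration only.
peak = {"f1": 72.0, "f2": 4.5, "h": 1000.0, "w1": 0.5, "w2": 0.02, "j2": 7.0}
k = {"f": 0.1, "h": 0.1, "w": 0.1, "j": 0.1}
limits = {"f1": (71.0, 73.0), "f2": (4.4, 4.6), "h": (0.0, 5000.0),
          "w1": (0.1, 2.0), "w2": (0.01, 0.1), "j2": (0.0, 20.0)}
mutated = mutate_peak(peak, k, limits, random.Random(1))
```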

A further mutation operates on the genome as a whole to create a new peak. A random peak is 

chosen and split into two (the height of each being half that of the original). The centre frequencies 

of the two new peaks are symmetrically arranged around the centre of the old peak, the separations

in each dimension being chosen according to a Normally-distributed random variable.

Each of the possible mutations occurs independently with a chosen probability. This probability

is parameterised according to the type of mutation: for example, the probability of the mutation that produces a new

peak is generally set low to avoid fitting noise artefacts using a large number of small peaks.

3.7.4. Crossover Operator. The crossover operator takes two parents, A and B, and produces one 

or two children, C and D, by combining the genomes of A and B. 

Since the number of peaks in A and B may be different—owing to the mutation that can split 

peaks—the crossover operation is more complicated than normal crossover where the genome size 

is constant. The solution is to choose pairs of peaks, one from A and the other from B, that are 

closest and perform crossover on each pair individually, as follows. 

For each peak a in A in turn, the closest peak b in B is located. The resulting variables for peak c

in the child are derived as: 

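$$f^{(c)}_{c;1} = \tfrac{1}{2}\left(f^{(a)}_{c;1} + f^{(b)}_{c;1}\right) \tag{3.11}$$

$$f^{(c)}_{c;2} = \tfrac{1}{2}\left(f^{(a)}_{c;2} + f^{(b)}_{c;2}\right) \tag{3.12}$$

$$h^{(c)} = \tfrac{1}{2}\left(h^{(a)} + h^{(b)}\right) \tag{3.13}$$

$$w^{(c)}_{0.25;1} = \tfrac{1}{2}\left(w^{(a)}_{0.25;1} + w^{(b)}_{0.25;1}\right) \tag{3.14}$$

$$w^{(c)}_{0.25;2} = \tfrac{1}{2}\left(w^{(a)}_{0.25;2} + w^{(b)}_{0.25;2}\right) \tag{3.15}$$

$$m^{(c)}_{2} = \begin{cases} \left\lceil \tfrac{1}{2}\left(m^{(a)}_{2} + m^{(b)}_{2}\right) \right\rceil & m^{(a)}_{2} > m^{(b)}_{2} \\ \left\lfloor \tfrac{1}{2}\left(m^{(a)}_{2} + m^{(b)}_{2}\right) \right\rfloor & m^{(a)}_{2} \le m^{(b)}_{2} \end{cases} \tag{3.16}$$

$$j^{(c)}_{2} = \tfrac{1}{2}\left(j^{(a)}_{2} + j^{(b)}_{2}\right) \tag{3.17}$$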

This results in child C having the same number of peaks as the parent A. If a second child, D, is 

required (which is the normal behaviour), it is derived by reversing the roles of A and B in the 

above. 
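The pairing and averaging of (3.11) to (3.17) might be sketched as follows in Python. The names are hypothetical, and the choice of squared Euclidean distance between peak centres as the measure of 'closest' is an assumption: the text does not define the distance used.

```python
import math


def closest_peak(a, peaks_b):
    """Peak in peaks_b whose centre is nearest to that of a (assumed metric)."""
    return min(
        peaks_b,
        key=lambda b: (a["f1"] - b["f1"]) ** 2 + (a["f2"] - b["f2"]) ** 2,
    )


def crossover_child(parent_a, parent_b):
    """Child with one peak per peak of parent_a, following (3.11)-(3.17)."""
    child = []
    for a in parent_a:
        b = closest_peak(a, parent_b)
        # Continuous variables are the mean of the two parents' values.
        c = {name: 0.5 * (a[name] + b[name])
             for name in ("f1", "f2", "h", "w1", "w2", "j2")}
        # The integer multiplet degree is rounded towards parent a's value.
        mean_m = 0.5 * (a["m2"] + b["m2"])
        c["m2"] = math.ceil(mean_m) if a["m2"] > b["m2"] else math.floor(mean_m)
        child.append(c)
    return child


def crossover(parent_a, parent_b):
    """Two children: the second reverses the roles of the parents."""
    return crossover_child(parent_a, parent_b), crossover_child(parent_b, parent_a)
```

Because the rounding in (3.16) depends on which parent has the larger degree, the two children of parents with degrees 3 and 2 receive degrees 3 and 2 respectively rather than both rounding the same way.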

3.7.5. Objective Function. The objective function, η, uses the peak fit metric, Ω, defined in (3.3): 

$$\eta = \Omega_R + \frac{1}{N'_p} \sum_{i=1}^{N'_p} \Omega_{W_i} \tag{3.18}$$

where R is the entire convoluted peak region, $N'_p$ the number of peaks in the experimental region

(i.e. the initial number before mutation), and Wi the corresponding experimental watersheds 

described above. 

The purpose of this definition is to fit each peak as closely as possible, but also to fit the entire 

region. The latter condition prevents additional peaks being created (unnecessarily) outside any 

individual peak watershed. 

The GA is set to minimise the value of η for the region, corresponding to a good fit. 
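As an illustration of (3.18), the Python sketch below evaluates η from a supplied fit metric. The metric used in the example is a hypothetical stand-in (a normalised sum of squared residuals) and is not the definition of Ω in (3.3), which is not reproduced here; the point sets and intensities are likewise invented.

```python
def objective(fit_metric, region, watersheds):
    """eta = Omega_R plus the mean of Omega_Wi over the initial watersheds."""
    per_peak = [fit_metric(w) for w in watersheds]
    return fit_metric(region) + sum(per_peak) / len(per_peak)


def make_placeholder_metric(experimental, model):
    """Hypothetical stand-in for (3.3): normalised squared residual."""
    def metric(points):
        residual = sum((experimental[p] - model[p]) ** 2 for p in points)
        norm = sum(experimental[p] ** 2 for p in points)
        return residual / norm
    return metric


# Two-point toy region: each point is the watershed of one initial peak.
experimental = {(0, 0): 2.0, (0, 1): 4.0}
model = {(0, 0): 2.0, (0, 1): 3.0}
omega = make_placeholder_metric(experimental, model)
eta = objective(omega, region=[(0, 0), (0, 1)], watersheds=[[(0, 0)], [(0, 1)]])
```

Dividing the watershed sum by the initial number of peaks keeps the two terms on a comparable scale, so the regional term is not swamped in regions with many peaks.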

3.7.6. Rationalisation Function. After any change to a genome’s variables (mutation) or the creation 

of a new genome (initialisation and crossover), a rationalisation function is called. This 

performs three operations: 

• If two or more peaks in a genome are very close together, they are combined to form a 

single peak. 

• If the multiplet spacing has become very small compared to the peak width, the multiplet 

degree is reset to 1 (indicating no multiplet). 

• If any new peaks are substantially outside the original convoluted peak region, they are 

removed. This is largely for technical reasons as it ensures that the sizes of the matrices involved

are kept within sensible limits.
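The three rationalisation operations could be sketched as below. The thresholds, the use of a single distance in grid units for both dimensions, and the height-weighted merge rule are all assumptions made for the illustration; the text does not specify how merged peaks are combined.

```python
def rationalise(peaks, region_box, merge_dist, min_j_over_w, margin):
    """Apply the three rationalisation operations to a list of peak dicts."""
    f1_lo, f1_hi, f2_lo, f2_hi = region_box
    kept = []
    for peak in peaks:
        p = dict(peak)
        # Multiplet spacing negligible relative to the width: no multiplet.
        if p["m2"] > 1 and p["j2"] < min_j_over_w * p["w2"]:
            p["m2"] = 1
        # Drop peaks substantially outside the original region.
        inside = (f1_lo - margin <= p["f1"] <= f1_hi + margin
                  and f2_lo - margin <= p["f2"] <= f2_hi + margin)
        if not inside:
            continue
        # Merge with an already kept peak that is very close (assumed rule:
        # heights add, centre is the height-weighted mean; heights positive).
        for q in kept:
            if (abs(p["f1"] - q["f1"]) <= merge_dist
                    and abs(p["f2"] - q["f2"]) <= merge_dist):
                total = q["h"] + p["h"]
                q["f1"] = (q["f1"] * q["h"] + p["f1"] * p["h"]) / total
                q["f2"] = (q["f2"] * q["h"] + p["f2"] * p["h"]) / total
                q["h"] = total
                break
        else:
            kept.append(p)
    return kept
```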

3.7.7. Termination Condition. The GA terminates on any of the following conditions: 

• When the best individual in the population satisfies both of the following: (a) its regional fit, ΩR in (3.18),

is below a specified threshold, and (b) the fit metric of every peak within its watershed,

ΩWi, is below a second threshold.

• After a set number of generations is reached.

• Convergence of the best individual, i.e. if over a set number of generations, η for the best

individual in each generation does not change by more than a defined proportion.
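The three conditions can be combined into a single check, sketched here in Python. The function and parameter names are hypothetical, and measuring convergence as the relative change in η between the first and last generation of the window is one possible reading of 'does not change by more than a defined proportion'.

```python
def termination_reason(eta_history, omega_region, omega_peaks, *,
                       region_tol, peak_tol, max_generations,
                       window, max_rel_change):
    """Return why the GA should stop, or None if it should continue.

    eta_history holds eta of the best individual, one entry per generation.
    """
    # Condition 1: regional fit and every per-peak fit below their thresholds.
    if omega_region < region_tol and all(o < peak_tol for o in omega_peaks):
        return "fit"
    # Condition 2: generation limit reached.
    if len(eta_history) >= max_generations:
        return "generations"
    # Condition 3: eta has changed by no more than a set proportion
    # over the last `window` generations.
    if len(eta_history) > window:
        old, new = eta_history[-window - 1], eta_history[-1]
        if abs(old - new) <= max_rel_change * abs(old):
            return "converged"
    return None
```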

3.8. Technical Implementation. 

3.8.1. matlab and GAlib Combination. The genetic algorithm itself is implemented using a C++

library called GAlib that provides a framework for creating and customising GAs[25]. The manipulation

required in (a) setting up the genome by identifying peaks and deriving watersheds for each

convoluted peak region, and (b) calculating the fit metric of theoretical peaks to experimental peaks,

is most appropriately performed in matlab as both stages use matrices representing experimental 

and theoretical spectra. The two tools—GAlib and matlab—are combined using MEX files. 

Having identified a specific convoluted peak region and established the experimental values of 

the genome variables in matlab, a MEX file is called that uses GAlib to instantiate and run the 

algorithm itself. Each time the algorithm evaluates the fit metric for a peak region, the C++ code

in the MEX file calls back to matlab to run the m-file that implements the metric. 

3.8.2. Custom GAlib Genome Class. The peak shape is implemented as a C++ class that subclasses

the genome object in GAlib. The class defines the genome variables. The class methods and

related functions implement the initialisation, mutation, and crossover operators described above,

and call back to matlab to evaluate the objective function.

3.8.3. GAlib Parameters. To enable setting of standard GAlib parameters, such as the number of

generations, size of population, and crossover rate, a single string parameter is passed from matlab

in the format of the command line parameters that would be passed to GAlib running

as a standalone program. The MEX file tokenises the string into an array of strings, one for

each parameter, and passes the array to GAlib as if it were the command line argument array of the main()

function.
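The tokenising step can be illustrated in Python, although the MEX file itself is written in C++. The function name and the program-name placeholder are hypothetical, and the parameter names in the example string are illustrative only; they should be checked against the GAlib documentation before use.

```python
import shlex


def build_argv(param_string, program_name="ga"):
    """Split a parameter string into an argv-style list of tokens.

    argv[0] conventionally holds the program name, so a placeholder
    is prepended before the tokens taken from the string.
    """
    return [program_name] + shlex.split(param_string)


# Illustrative parameter string: name/value pairs separated by spaces.
argv = build_argv("ngen 500 popsize 40")
argc = len(argv)
```

Passing the parameters as one string keeps the MEX interface small: new GAlib options can be set from matlab without changing the C++ signature.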

3.8.4. Efficient Calculation of Fit Metric. The fit metric is called any time a genome is created 

or any of its variables are changed, and so may be evaluated many times during a run of the GA. 

To ensure that the calculation is as efficient as possible, the following implementation is used: 

• The 3D shape of the theoretical peak is calculated once before calling the GA (actually 

once per spectrum before the GA is run individually on each convoluted peak region) and 

stored as a matrix. This avoids time-consuming calculations if the peak is an analytical 

function, such as the Gaussian identified in section 3.1. 6 

The peak is calculated on a discrete grid that is substantially finer than that of the experimental

spectrum itself. When a value on the theoretical peak is required, the closest point

on the fine grid is located and its value returned.

• Constant values and matrices used in the fitness calculation—such as the theoretical peak 

shape matrix, the region and individual peak watersheds, and the value of the denominator 

in equation (3.3)—are passed to the GA and passed through to the fit metric calculation 

implemented in matlab. 
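The precomputed fine-grid lookup might look like the following Python sketch, reduced to one dimension for brevity (the dissertation stores a matrix covering both spectral dimensions). The function names, grid density, and unit-width Gaussian are assumptions made for the illustration.

```python
import math


def build_peak_table(half_width, points_per_unit, sigma=1.0):
    """Tabulate a unit-height Gaussian on a fine grid centred on zero."""
    n = int(round(half_width * points_per_unit))
    return [math.exp(-0.5 * ((i / points_per_unit) / sigma) ** 2)
            for i in range(-n, n + 1)]


def lookup(table, x, points_per_unit):
    """Value of the tabulated peak at offset x, using the nearest grid point."""
    centre = (len(table) - 1) // 2
    index = centre + int(round(x * points_per_unit))
    if 0 <= index < len(table):
        return table[index]
    return 0.0  # outside the tabulated extent the peak is treated as zero


# Built once per spectrum, then reused for every fit metric evaluation.
TABLE = build_peak_table(half_width=5.0, points_per_unit=100)
value = lookup(TABLE, 1.004, 100)  # snaps to the grid point at 1.00
```

The trade-off is a small quantisation error, bounded by half the fine-grid spacing, in exchange for replacing each exponential evaluation with an index calculation.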

3.9. Results and Discussion. Figure 25 shows the results of peak picking by the GA on a

denoised HSQC spectrum of sucrose. For practical reasons, discussed in section 4, the list of peaks 

is first limited by thresholds on the SNR, but in this example the thresholds are relatively low (4 

times the thermal noise and 3 times the t1-noise) so as to include some artefacts from the t1-noise 

ridges and elsewhere. The threshold values for the fit metric were 0.001 for both the region and 

individual peaks. The GA used a population of size 40 and a maximum of 500 generations. 

6 The method also allows the peak shape to be easily changed to a different analytical function or a peak shape

derived experimentally, and/or different peak shapes to be used in the two spectral dimensions.

Figure 25. Example of peak picking by genetic algorithm on a section of a

denoised HSQC spectrum of sucrose. The watersheds of peaks that are fitted 

to the theoretical peak shape by the GA are shown in mid-grey; peaks that are 

excluded by the GA are shown in black. 

The GA has correctly fitted the major peaks. However, even with the small threshold for the fit 

metric (requiring very good fits), the GA has also fitted many other peaks including some in the 

t1-noise streak. For many of the small peaks that the GA fits, it is difficult to distinguish, by eye, a 

difference in shape from the theoretical shape: an example of such a peak is shown in Figure 26(a). 

While some of these small peaks may be ‘genuine’ peaks resulting from low concentration of other 

compounds (contaminants) in the sample, it is suspected that many are simply noise artefacts, 

and that the GA is unable to distinguish between the shape of (some) noise artefacts and ‘genuine’ 

peaks using the criteria listed in section 3.4. 

Nevertheless, the GA has identified some peaks that cannot be fitted with the theoretical peak 

shape. An example of a region containing two such peaks is shown in Figure 26(b), and visually 

it looks distinct from the theoretical peak shape. 

Figure 27(a) shows the results of the GA on a spectrum of a metabolic sample using the same 

parameters as for Figure 25. Here the GA incorrectly determines that some of the larger ‘genuine’ 

peaks are not fitted to the theoretical peak shape. The reason for this is the significantly larger 

number of peaks in the spectrum resulting in regions with many convoluted peaks. This leads to 

a very high dimensional search space in which the GA is unable to fit every peak in the region 

within a practical number of generations. 

This effect can be minimised by modifying the parameters. Firstly, setting a slightly higher 

SNR threshold for peak identification reduces the overall number of peaks considered by the 

GA. Secondly, by increasing the boundary of the watershed as a ratio of the peak height (θw in 

section 3.6), peaks need to be more convoluted before being considered part of the same region, 

reducing the number of peaks per region. Finally, setting a higher threshold on the fit metric 

relaxes the closeness of the fit required. 

[Figure 26, panels (a) and (b). Axes: F2 (1H) / ppm, F1 (13C) / ppm, Intensity.]

Figure 26. (a) is an example of a small (SNR ≈ 3) peak in a denoised HSQC 

spectrum of sucrose that the GA fitted to the theoretical shape; (b) shows a region 

of two small convoluted peaks (SNR ≈ 3 and 4.5) in the same spectrum that was 

not successfully fitted by the GA. 

Figure 27(b) shows the effect when the SNR threshold is increased (from 3 to 4), the watershed 

boundary is changed from 10% to 20% of peak height, and the fit metric threshold is changed from 

0.001 to 0.01. In this case, many more of the larger ‘genuine’ peaks are fitted within the threshold, 

but the relaxation of the fit metric threshold means that fewer noise artefacts are identified as 

such. 

An alternative solution might be to modify the way in which the algorithm fits each region. 

Instead of attempting to fit all peaks in the region simultaneously, the algorithm could fit the 

largest peaks first and then move on to the smaller peaks. This would reduce the dimensionality 

of the search space at each stage, potentially resulting in better fits in a reasonable number of 

generations for highly convoluted regions. 7 

The results suggest that the genetic algorithm is good for peak picking in relatively simple 

spectra such as sucrose, although it appears too optimistic in that it identifies too many small 

peaks as being ‘genuine’ rather than noise artefacts. 8 Given the similarity in shape between

‘genuine’ peaks and some noise artefacts, peak picking based purely on the peak shape might not

be feasible. (It may, however, be useful in identifying artefacts introduced during denoising if,

for example, wavelet noise separation were used instead of the direct signal thresholding used for

these spectra.)

7 This alternative method was not tested owing to time constraints.

8 A full analysis of each of the small peaks fitted by the GA would be required to be sure whether this was the case.

Figure 27. Example of peak picking by genetic algorithm on a section of a

denoised HSQC spectrum of a metabolomic sample derived from peas. The watersheds

of peaks that are fitted to the theoretical peak shape by the GA are shown

in mid-grey; peaks that are excluded by the GA are shown in black. (a) shows

the fit using a fit metric threshold of 0.001; (b) shows the fit using a threshold of

0.01, a higher peak SNR threshold and a modified watershed boundary fraction.

It can be seen that more of the large peaks are correctly fitted (grey) in (b) than

(a), but fewer noise artefacts (black) are identified.

In more complicated metabolomic samples, the GA parameters need to be relaxed in order to 

fit highly convoluted regions within a reasonable number of generations, with the result that very 

few noise artefacts are identified as such. 

It should also be noted that the GA encapsulates only some of the a priori knowledge about 

peak shapes. For example, it considers only the simplest multiplet structures rather than the more

complicated structures resulting from two or more different spin-spin coupling constants.

However, by combining the GA peak picking with a more traditional SNR-based peak picking 

method (although modified to be adaptive to t1-noise ridges), some of the shortcomings of the 

GA with regard to metabolomic samples could be ameliorated while retaining the ability to locate 

noise artefacts. This approach is described in the next section. 

4. Combined Denoising and Peak Picking Process 

This section describes a process that combines the denoising algorithm with a hybrid genetic 

algorithm / SNR threshold peak picking method. It also describes how the processing derives some 

parameters of the denoising algorithm and peak picking genetic algorithm (GA) from the spectrum 

itself in order to minimise the number of parameters that must be specified to the process. The 

output of the process is a ‘clean’ 2D spectrum free of noise that is suitable for use in the adaptive 

binning analysis described later. 

4.1. Implementation Overview. The processing consists of an ‘umbrella’ matlab script that drives

the denoising algorithm and GA peak picking code described in earlier sections. The script calls

other functions implemented as matlab m-files as well as MEX files (written in C++). The latter

(a) load the NMR spectrum from Bruker Topspin processed data files, (b) implement the genetic 

algorithm, and (c) optionally save the processed NMR spectrum back to Bruker Topspin data 

files. The code structure is detailed in appendix E. 

A structure variable is used to control which of the processing steps are performed. This enables 

flexibility in the type of processing, for example skipping the t1-noise denoising step in order to 

compare results with and without denoising. The control structure is also set by the processing 

itself so that if a step should fail, the process restarts at the failed step without repeating earlier 

steps unnecessarily. 

A second structure contains the parameters used by the processing steps, such as the denoising 

threshold multipliers and GA operator parameters. 

Both the control and parameters structures are initially set to default values which are then

amended by custom functions that are named in variables passed to the processing script. This 

enables reusable ‘parameter sets’, defined as m-file functions, suitable for a range of spectra. 
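The arrangement of control and parameter structures described above can be sketched in Python, with dictionaries standing in for the matlab structures and plain functions standing in for the m-file parameter sets. All names are hypothetical; the default thresholds are the typical values quoted for step 5, and the restart behaviour is modelled by switching a step off once it has completed.

```python
import copy

# Hypothetical defaults; the real structures hold many more fields.
DEFAULT_CONTROL = {"load": True, "denoise": True, "pick": True}
DEFAULT_PARAMS = {"tau_thermal": 4, "tau_t1": 3}


def configure(amenders):
    """Start from the defaults, then let each parameter set amend them."""
    control = copy.deepcopy(DEFAULT_CONTROL)
    params = copy.deepcopy(DEFAULT_PARAMS)
    for amend in amenders:
        amend(control, params)
    return control, params


def reference_sample(control, params):
    """Reusable parameter set: stricter SNR thresholds."""
    params["tau_thermal"] = 5
    params["tau_t1"] = 5


def skip_denoising(control, params):
    """Reusable parameter set: omit the t1-noise denoising step."""
    control["denoise"] = False


def run(control, steps):
    """Run the enabled steps in order, switching each off once it succeeds.

    If a step raises, its flag stays set, so calling run() again with the
    same control structure resumes at the failed step.
    """
    for name, step in steps:
        if control.get(name):
            step()
            control[name] = False


control, params = configure([reference_sample, skip_denoising])
```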

The output of the processing script includes the matrices representing the original, denoised 

and ‘clean’ spectra and a data structure listing each peak. Each record in the structure contains 

details including the peak location, height (intensity), radii at quarter-height, SNR, the best fit 

metric obtained by the GA, and whether the peak was picked for inclusion in the ‘clean’ spectrum. 

The resulting spectra can optionally be saved to file in the Bruker Topspin data format. 

4.2. Processing Steps. 

1 - Load Spectrum: The data files containing the (F2 real; F1 real) and (F2 imaginary; 

F1 real) parts of the spectrum are loaded from the Bruker Topspin processed data files. 

(At this point, a region of the spectrum, e.g. a subset of the F2 range, can be extracted 

and used for subsequent processing if the entire spectrum is not required. This can limit 

the success of denoising since there are fewer t1-noise ridges to correlate, but is useful for 

analysing a localised feature quickly since a smaller set of data improves performance.) 

2 - Derive Minimum Peak Width: This optional step identifies peaks in the ‘noisy’ spectrum 

(the original spectrum before any denoising) using the technique used later, in step 

5, on the denoised spectrum. However, the SNR threshold used to identify peaks is twice 

that used for the denoised spectrum so as to avoid identifying any noise artefacts. (Note 

this is the identification of local maxima above an SNR threshold, but not the picking 

of peaks, i.e. the distinguishing of small peaks from noise artefacts using the GA.) The 

radii of the peaks are measured and then used to derive the minimum peak widths in the 

F1 and F2 dimensions. These widths are used to set parameters for both the denoising 

algorithm and the GA. 

Alternatively, the minimum peak widths can be set directly from parameters, thus 

avoiding this sometimes time-consuming step. Setting the minimum peak widths directly 

from parameters is appropriate for a set of similar NMR experiments (with the same 

processing parameters), if the widths have already been derived for one representative

experiment. 

3 - Separate Noise: This step implements the direct signal thresholding separation of t1-noise,

described in section 2.3.4, on the complex spectrum formed by combining the real

and imaginary parts. Optionally, wavelet denoising may be used as an alternative. The

results are ‘peak’ and ‘noise’ spectra, with the noise spectrum containing the majority of 

the t1-noise and some of the intensity of small peaks convoluted with the noise. 

4 - Apply Denoising Algorithm: The denoising algorithm described in section 2.6 is applied 

to the complex noise spectrum. The peak widths derived in step 2 are used to set 

parameters of the algorithm as shown in Table 1. The result of this step is the ‘denoised’ 

spectrum defined by equation (2.26) consisting of ‘genuine’ peaks with much of the t1-noise 

removed. 

5 - Identify Peaks in Denoised Spectrum: This step is the equivalent of step 2, but 

applied now to the denoised spectrum. Peaks are identified as local maxima in the real 

component of the spectrum, considering all 8 neighbouring data points. 

A peak is not recorded if its intensity is below thresholds defined in terms of the SNR. 

The first threshold, τthermal, considers the SNR measured with respect to the standard 

deviation of the thermal noise, essentially the background noise that—unlike the t1-noise— 

occurs across the entire spectrum. An estimate of the thermal noise is made by taking 

the minimum interquartile range (iqr) of F1 traces considered across the entire F2 range, 

thereby picking a trace that does not include t1-noise. The thermal noise of a peak is then

calculated using this iqr in an amended form of equation (2.28). 

The second threshold, τt1, considers the actual SNR of the peak, calculated by equation 

(2.28). Since this SNR is calculated using the iqr of the F1 trace containing the peak 

maximum, the SNR is calculated with respect to the t1-noise. 

The thresholds set for these two SNR values are designed to discard peaks whose SNR 

is too low for the peak to be realistically distinguished from noise artefacts. Typical values 

for each threshold are τthermal = 4 and τt1 = 3, equivalent to peak intensities that fall

within the 99.99% and 99.73% confidence limits, respectively, of the noise distributions, if 

the noise is assumed to be normally distributed in the real spectrum (section 2.3.1). 

The thresholds are implemented partly for performance reasons: without the thresholds, 

the number of peaks considered is extremely large compared to the actual number of 

‘genuine’ peaks and this significantly slows the processing of later steps. It also forms part 

of the hybrid peak picking method, combining the genetic algorithm peak fitting with 

SNR thresholds: it sets a lower limit for SNR. 

In addition, peaks that are within a certain distance of the edges of the spectrum 

are also omitted to exclude artefacts that can result from processing of the NMR spectrum. 

Examples of these artefacts can be seen at the top and bottom of t1-noise ridges in 

Figure 17(a). Since they are not correlated across t1-noise ridges, they can remain after 

denoising as can be seen in part (b) of the same figure. (If wavelet denoising is used to separate 

the t1-noise, this can also introduce edge artefacts owing to artificial discontinuities 

by processing equivalent to ‘wraparound’: see section B.5.4.) 

For the peaks above the thresholds, the location, SNR, and peak radii are measured 

and recorded. In addition, both the full watershed region of the peak, and the watershed 

bounded at a fraction of the peak height, are recorded. The derivation of the latter 

watershed is described in section 3.6, and uses a parameter, θw, to determine the fraction 

of the peak height at which the watershed boundary occurs. 
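Two parts of step 5, the 8-neighbour local maximum test and the minimum-iqr estimate of the thermal noise, can be sketched as follows. The spectrum is assumed to be a list of rows indexed by F1, so that an F1 trace is a column; the strict inequality and the skipping of edge points are assumptions. The conversion of an iqr into an SNR is defined by equation (2.28) and is not reproduced here.

```python
import statistics


def local_maxima(spectrum):
    """Interior points strictly greater than all 8 neighbouring points."""
    rows, cols = len(spectrum), len(spectrum[0])
    peaks = []
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            centre = spectrum[i][j]
            neighbours = [spectrum[i + di][j + dj]
                          for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0)]
            if all(centre > n for n in neighbours):
                peaks.append((i, j))
    return peaks


def iqr(values):
    """Interquartile range of a sequence of intensities."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def min_trace_iqr(spectrum):
    """Smallest iqr over all F1 traces (columns), taken as thermal noise only."""
    cols = len(spectrum[0])
    return min(iqr([row[j] for row in spectrum]) for j in range(cols))
```

Taking the minimum over all traces relies on at least one F1 trace being free of t1-noise, which is the assumption stated in the text.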

6 - Identify Convoluted Peak Regions: This step identifies regions of peaks that are 

adjacent to one another, and therefore potentially convoluted. The process is described in 

section 3.6. The output of this step is a set of regions that are isolated from one another 

in the spectrum, within which the peaks are convoluted. 

7 - Fit Peaks Using GA: In this step, the genetic algorithm described in section 3.7 is applied 

to each region identified in the preceding step, in turn. The peak widths determined 

in step 2 are again used to set parameters to the GA, this time setting a lower limit for 

the mutation of widths in the genome. The best fit metric determined by the algorithm 

is stored for each peak. 

8 - Pick Peaks: This step implements a hybrid of the GA and SNR-based peak picking 

methods to overcome some of the practical limitations of the GA peak picking process for 

metabolic samples. 

As discussed in section 3.9, a GA using a practical number of generations is sometimes 

unable to fit even the intense peaks in regions consisting of a large number of convoluted 

peaks. Since the largest peaks are clearly ‘genuine’ rather than noise, a pragmatic solution 

is to automatically consider the largest peaks as ‘genuine’ and only consider the fit of the 

smaller peaks when peak picking. 

Thus a further SNR threshold, τinclude, is used for this peak picking. Peaks whose 

SNR—measured against the trace containing the peaks, and therefore accounting for 

changes in t1-noise amplitude—is above the threshold are picked. Peaks below this threshold 

may also be picked if the fit metric determined by the GA is below a threshold, indicating 

a close fit. (Note that the very smallest peaks have already been excluded by the 

two thresholds in step 5.) 

A further lower limit threshold, τexclude, to the SNR may also be applied at this point. 

This is useful when deriving spectra for adaptive binning from reference samples consisting 

of a single compound at high concentration. Since all of the peaks from the sample compound

(rather than contaminants) will have high intensity, this threshold removes all other smaller

peaks. (Although the lower limit thresholds used in step 5 could be used for this purpose,

they could remove medium-sized peaks convoluted with the larger sample peaks, making 

it harder for the GA to fit the sample peaks accurately.) 

Optionally, peaks at the F2 and/or F1 frequency of the solvent used for the NMR sample 

may be excluded on the basis they derive from the solvent rather than the sample itself. 
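The hybrid decision in step 8 reduces to a short rule, sketched here in Python. The function name is hypothetical, the optional solvent exclusion is omitted, and treating each comparison as a strict inequality is an assumption. The example values are those of the ‘Metabolic Sample’ column of Table 2.

```python
def is_picked(snr_t1, best_fit, tau_include, max_fit, tau_exclude=0.0):
    """Decide whether a peak is picked for the 'clean' spectrum.

    snr_t1   -- SNR measured against the trace containing the peak
    best_fit -- best fit metric found by the GA for this peak
    """
    if snr_t1 < tau_exclude:
        return False   # below the optional lower limit: always excluded
    if snr_t1 > tau_include:
        return True    # intense peak: accepted without regard to the fit
    return best_fit < max_fit  # smaller peak: accepted only on a close fit


# Thresholds from the 'Metabolic Sample' column of Table 2.
metabolic = dict(tau_include=6, max_fit=0.02, tau_exclude=3)
```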

9 - Calculate Peak Volume: For the picked peaks, the volume of the peak—the measure 

of its intensity that takes into account line broadening—is determined by integrating the 

intensity over the area of the peak. Since the spectrum is a discrete signal, this is calculated 

by summing the intensities over the area of the peak’s full watershed region determined 

in step 5. Although this measurement is not required directly for the adaptive binning 

technique, it is often the key datum used when analysing spectra individually. 

10 - Derive ‘Clean’ Spectrum: This step prepares the denoised spectrum for use in the 

adaptive binning analysis by excluding unpicked peaks and any remaining t1-noise. For 

each peak picked, the full watershed area is expanded to include points adjacent to the 

boundary: these are the points that potentially belong to the watershed of more than

one peak, equivalent to the bottom of valleys in the surface of the spectrum. This is done

by including points that are a ‘cityblock’ distance of 1 grid unit from the watershed (see 

section 3.6 where the same technique is used when deriving regions of convoluted peaks). 

Then any point in the spectrum that is not in the ‘expanded’ watershed of a picked peak 

is set to zero intensity. 
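Steps 9 and 10 can be illustrated together in Python. Watersheds are represented here as sets of (row, column) grid indices, which is an assumption about the data layout; the function names are hypothetical.

```python
def peak_volume(spectrum, watershed):
    """Discrete integral: sum of intensities over the peak's watershed."""
    return sum(spectrum[i][j] for i, j in watershed)


def expand_cityblock(watershed, shape):
    """Add every grid point at a cityblock distance of 1 from the watershed."""
    rows, cols = shape
    expanded = set(watershed)
    for i, j in watershed:
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:
                expanded.add((ni, nj))
    return expanded


def clean_spectrum(spectrum, picked_watersheds):
    """Zero every point outside the expanded watersheds of the picked peaks."""
    shape = (len(spectrum), len(spectrum[0]))
    keep = set()
    for watershed in picked_watersheds:
        keep |= expand_cityblock(watershed, shape)
    return [[value if (i, j) in keep else 0.0 for j, value in enumerate(row)]
            for i, row in enumerate(spectrum)]
```

A cityblock distance of 1 adds only the four edge-adjacent neighbours of each watershed point, so diagonal neighbours are not included by the expansion.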

11 - Save Spectrum: Optionally, the real and imaginary parts of the resulting spectra may 

be saved to files in the format of Bruker Topspin processed data files. (This is usually 

done for denoised, rather than ‘clean’, spectra in order to evaluate the effectiveness of the 

denoising algorithm using the Bruker Topspin software.) 

4.3. Results and Discussion. In preparation for adaptive binning, the parameters used for 

processing spectra of reference samples (consisting of a high concentration of a single compound) are

chosen to pick only the intense peaks relating to the reference compound itself. For metabolic 

sample spectra, the emphasis is different: parameters are chosen to include small intensity peaks 

from metabolites at low concentration, while minimising t1-noise and other noise artefacts. 

In this context, the use of the GA for peak fitting may be inappropriate since it is unnecessary 

for reference samples—peaks could be identified by large SNR only—and for metabolic samples it 

does not consistently fit peaks in highly convoluted regions. However, the GA is used in the two 

examples below in order to demonstrate the entire process, albeit with a relatively small input to 

the peak picking step. 

Table 2 gives example parameters for deriving the ‘clean’ spectra for reference and metabolic 

samples, incorporating the different emphasis in processing discussed above. 

Figure 28(a) shows the resulting clean spectrum for a reference sample of sucrose; (b) shows the

signals removed in deriving the clean spectrum (as a result of both the denoising algorithm and

picking peaks). The parameters used were those in the ‘Reference Sample’ column of Table 2.

Figure 29 shows the results from a metabolic sample derived from pea leaves, using the parameters

in the ‘Metabolic Sample’ column of the table.

Parameter                                               Reference Sample    Metabolic Sample

Denoising Algorithm
  minimum correlation modulus, R(M)                     0.5 (both)
  maximum number of traces, N(M)                        3 × data points in minimum F2 peak width (both)
  minimum distance from trace, F(M)                     1 × minimum F2 peak width (both)
  phase-balanced                                        yes (both)

Peak Identification
  minimum SNR compared to thermal noise, τthermal       5                   4
  minimum SNR compared to t1-noise, τt1                 5                   3

Convoluted Peak Region Identification
  watershed boundary as fraction of peak height, θw     0.1                 0.2

Genetic Algorithm
  number of generations                                 500                 500
  population size                                       40                  40

Peak Picking
  include SNR above, τinclude                           40                  6
  maximum fit metric, Ω                                 0.001               0.02
  exclude t1-noise SNR below, τexclude                  20                  3

Table 2. Examples of parameters for deriving ‘clean’ spectra from reference and

metabolic samples for use in adaptive binning.

The results indicate that the process described in this section is effective in producing ‘clean’ 

spectra for both metabolic and reference samples suitable for further analysis by adaptive binning. 

In particular, the ‘clean’ spectrum in Figure 28(a) isolates the intense peaks in the reference 

sample, while the remainder noise signal in Figure 29(b) shows that the process removes noise but 

retains peaks in the metabolic sample: few obviously ‘genuine’ peaks are visible in the remainder 

spectrum. 

A key part of the process is the use of the SNR measured against the t1-noise enabling the 

peak picking to adapt to the local noise amplitude: in regions with little t1-noise, small intensity 

peaks are picked; in regions of t1-noise ridges, the threshold is larger (but smaller peaks are 

still picked if they are closely fitted to the theoretical peak shape by the GA). This is different 

from some traditional peak picking techniques that simply take an absolute intensity threshold 

(or, equivalently, an SNR value measured against the same noise signal for all the peaks) which 

unnecessarily exclude peaks in relatively noise-free regions. 

The GA technique for fitting metrics, combined with SNR thresholds, is likely to be an effective 

peak picking method for the standard analysis (e.g. comparing peak intensities) of individual 

relatively simple spectra. However, as discussed above, it may not be suitable for the processing 

of samples in preparation for adaptive binning where particular considerations of reference and 

metabolic samples apply. 

Figure 28. Spectra resulting from the denoising and peak picking process applied 

to an HSQC spectrum of sucrose. (Only part of the F2 range is shown.) (a) 

shows the ‘clean’ spectrum consisting of picked peaks only; (b) is the remaining 

noise signal. 

Figure 29. Spectra resulting from the denoising and peak picking process applied 

to an HSQC spectrum of pea leaf metabolites. (Only part of the F2 range 

is shown.) (a) shows the ‘clean’ spectrum consisting of picked peaks only; (b) is 

the remaining noise signal. 

5. Two-Dimensional Adaptive Binning 

This section describes the application of ‘adaptive binning’ to analyse the composition of a 

metabolic sample using two-dimensional NMR spectra, making use of the ‘clean’ (denoised and 

peak-picked) spectra resulting from the process described in the previous section. 

5.1. Overview of One-Dimensional Adaptive Binning. Direct comparison of spectra is not 

always possible since the resonant frequency of individual peaks can shift owing to factors including 

differences in the pH and temperature of the sample. Binning groups the data into equal-sized ‘bins’

or ‘buckets’ so that a specific peak falls within the same bin at all observed shifts. However,

binning takes no account of the actual distribution of the peaks in spectra and this can limit the 

ability to resolve subtle differences in spectra. Adaptive Binning, described in [6] in the context 

of one-dimensional 1 H NMR spectra of metabolic samples, overcomes some of the limitations of 

binning. 

The starting point is a set of spectra derived from equivalent metabolic samples. The spectra 

are combined by taking the maximum intensity at each frequency across the entire set of spectra. 

The resulting ‘combined’ spectrum has maxima at frequencies matching a peak in one or more of 

the original spectra. 

However, peaks from the same compound that have shifted more than the resolution of the NMR 

experiment, owing to pH or temperature differences, will appear as separate, but closely grouped 

peaks in the combined spectrum. The combined spectrum is smoothed, using non-decimating 

wavelet smoothing, to merge these peaks into a single peak. The level of wavelet decomposition 

to effect suitable smoothing is dependent on the width of the peaks and the degree of shift in the 

samples. 

Boundaries of bins are identified in the smoothed spectrum by locating the local intensity 

minima. The process is termed ‘adaptive’ in that the resulting bin size changes according to the 

local shape of the spectrum instead of being a fixed size. 

Once the bins have been identified, the intensity of each sample spectrum is integrated within 

each bin to give a series of data points for each spectrum. The data points may then be evaluated 

using multivariate data analysis methods, such as principal component analysis[6], or natural 

computational techniques such as genetic programming[7]. 
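
The one-dimensional procedure outlined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of [6]: NumPy is assumed to be available, the function names are hypothetical, and a simple moving average is swapped in for the non-decimating wavelet smoothing.

```python
import numpy as np

def adaptive_bins_1d(spectra, window=5):
    """Return bin edges (as indices) for a set of 1D spectra stored as rows."""
    combined = spectra.max(axis=0)          # maximum intensity at each frequency
    kernel = np.ones(window) / window       # moving average: a stand-in for wavelet smoothing
    smoothed = np.convolve(combined, kernel, mode="same")
    # Interior local minima of the smoothed spectrum are taken as bin boundaries.
    idx = np.arange(1, smoothed.size - 1)
    is_min = (smoothed[idx] < smoothed[idx - 1]) & (smoothed[idx] <= smoothed[idx + 1])
    return np.concatenate(([0], idx[is_min], [smoothed.size]))

def integrate_bins_1d(spectrum, edges):
    """Sum the intensity of one spectrum inside each bin."""
    return np.array([spectrum[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])

# Two synthetic spectra whose peaks are slightly shifted relative to each other.
x = np.arange(200)
gauss = lambda centre: np.exp(-((x - centre) ** 2) / 18.0)
spectra = np.vstack([gauss(50) + gauss(140), gauss(54) + gauss(142)])
edges = adaptive_bins_1d(spectra)
volumes = np.array([integrate_bins_1d(s, edges) for s in spectra])
```

In this synthetic example each pair of shifted peaks is merged by the smoothing, so a single boundary is found between the two groups and each spectrum is reduced to two data points.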

5.2. Objective for Two-Dimensional Adaptive Binning Research. The purpose of the 

research on adaptive binning described here was to evaluate methods of implementing the technique 

in 2D NMR spectra. Given the time taken to acquire detailed 2D NMR spectra, creating a large 

set of 2D NMR from equivalent metabolic samples was not feasible. Instead, a modified objective 

was designed to test the method: to compare a metabolic sample against spectra of a reference 

compound in order to assess the presence of the reference compound in the metabolome. 

5.3. Two-Dimensional Adaptive Binning Method. 

5.3.1. Sample and Reference Spectra. The data used were a 2D HSQC spectrum from a metabolomic 

sample derived from pea leaves, and a set of reference spectra of sucrose at different pHs. For 

ease of comparison, all the spectra were measured using equivalent acquisition and processing 

parameters. 

The spectra were firstly ‘cleaned’ to remove t1-noise and other noise artefacts as described 

in section 4. The processing parameters of the metabolomic sample were chosen to retain small 

intensity peaks, while for the reference sample, the processing removed all but the high intensity 

peaks relating to the reference compound itself. The parameters and processing are those described 

in section 4.3 and Table 2, except that the SNR thresholds for the reference spectra were multiplied 

by a factor of 10 since these spectra were acquired using gradient-selected HSQC which introduces 

less noise. 

5.3.2. ‘Combined’ Spectrum. Denoting the 2D metabolomic spectrum as Φ 0 , and the N reference 

spectra as Φ i , where i = 1, ..., N, the ‘combined’ spectrum is derived by taking the maximum 



intensity at each discrete frequency point in the two dimensions, i.e.: 

$$\Phi(f_1, f_2) = \max\{\Phi_0(f_1, f_2),\ \Phi_1(f_1, f_2),\ \Phi_2(f_1, f_2),\ \ldots,\ \Phi_N(f_1, f_2)\} \tag{5.1}$$

where Φ(f1, f2) is the combined spectrum. 
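
As a minimal sketch of the point-wise maximum, assuming the spectra are held as NumPy arrays sampled on the same (f1, f2) grid (the function name is hypothetical):

```python
import numpy as np

def combine_spectra(metabolic, references):
    """Point-wise maximum over the metabolic spectrum and all reference spectra."""
    return np.maximum.reduce([metabolic, *references])

# Toy 2x3 'spectra' standing in for the metabolic and two reference spectra.
phi0 = np.array([[1.0, 0.0, 2.0], [0.0, 5.0, 0.0]])
phi1 = np.array([[0.0, 3.0, 0.0], [0.0, 4.0, 0.0]])
phi2 = np.array([[0.0, 0.0, 1.0], [6.0, 0.0, 0.0]])
combined = combine_spectra(phi0, [phi1, phi2])
```

The spectra must share acquisition and processing parameters for this element-wise comparison to be meaningful, as noted in section 5.3.1.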

Figure 30(a) shows a small region of a combined spectrum. The combined spectrum was derived 

using two reference spectra at different pHs. In each of the two reference spectra shown in (b) and 

(c), two major peaks can be seen in this region, with peaks in one spectrum visibly shifted with 

respect to the other in the F2 ( 1 H) direction. In the combined spectrum, the peaks give rise to 

two sets of adjacent intense peaks. 

5.3.3. Smoothed Spectrum. The smoothed spectrum is derived using wavelet smoothing, applied 

to the 2D spectrum and using the non-decimating transform. (See sections B.7.3, B.8 and B.9.) 

The wavelet chosen for smoothing was the Haar wavelet (section B.6.2) since this wavelet is 

potentially faster to process than other wavelets given its simple form. Performance was an important 

factor since the non-decimating transform to level m requires $2^{2m}$ separate applications 

of the 2D pyramid algorithm (section B.9). Since the non-decimating transform tends to average 

out artefacts resulting from poor approximation of the signal by the wavelet shape[20], the 

discontinuous shape of the Haar wavelet is not as important. 

It was found, by experiment, that using wavelet decomposition to level 3 gave the best results in 

merging shifted peaks into a single peak while still retaining separate regions for unrelated peaks. 

The decomposition level will depend on the peak width (related to the resolution of the NMR 

experiment), and the degree of shift in each reference sample. 

Figure 31 shows the smoothed spectrum for the same region that was shown in Figure 30. It can 

be seen that the smoothing has merged the two sets of shifted peaks into two distinct peaks. 
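
The following sketch illustrates why the non-decimating Haar transform to level m costs 2^(2m) passes. It assumes that smoothing means discarding every detail coefficient down to level m, and that the spectrum is treated as periodic; both are assumptions made for illustration and may differ from the processing described in Appendix B. Under those assumptions the decimated Haar approximation is a mean over 2^m × 2^m blocks, and the non-decimating result is the average of that approximation over all 2^m × 2^m circular shifts. The function name is hypothetical and NumPy is assumed.

```python
import numpy as np

def haar_nondecimated_smooth(spectrum, level):
    """Average the level-`level` Haar approximation over all circular shifts."""
    block = 2 ** level
    rows, cols = spectrum.shape
    if rows % block or cols % block:
        raise ValueError("each dimension must be a multiple of 2**level")
    smoothed = np.zeros(spectrum.shape, dtype=float)
    for dr in range(block):                 # 2**level shifts in F1 ...
        for dc in range(block):             # ... times 2**level shifts in F2
            shifted = np.roll(spectrum, shift=(-dr, -dc), axis=(0, 1))
            # Decimated Haar approximation: the mean of each block x block tile.
            means = shifted.reshape(rows // block, block,
                                    cols // block, block).mean(axis=(1, 3))
            approx = np.repeat(np.repeat(means, block, axis=0), block, axis=1)
            smoothed += np.roll(approx, shift=(dr, dc), axis=(0, 1))
    return smoothed / block ** 2

# A single unit impulse is spread into a small pyramid centred on the impulse.
impulse = np.zeros((8, 8))
impulse[3, 4] = 1.0
result = haar_nondecimated_smooth(impulse, level=1)
```

The double loop makes the 2^(2m) cost explicit: level 3 already requires 64 passes over the spectrum, which is why the simplicity of the Haar wavelet matters for performance.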

5.3.4. Bin Derivation. In the 1D context, bins are identified from the smoothed spectrum by 

locating the minima either side of a peak: the minima are the ends of the bin. In the 2D case, 

an equivalent process is to derive the watershed of the peaks in the smoothed spectrum, and the 

bin is then the area covered by the watershed. (The derivation of watersheds is discussed in more 

detail in section 3.6.) 

If the adaptive binning process were being used for locating distinguishing markers in a set of 

2D metabolomic spectra, all the watersheds (for peaks above a given intensity threshold) would 

be considered as bins. In this example of identifying a reference compound in a metabolic sample, 

only the bins associated with the reference compound are relevant. The condition used is to only 

consider a bin/watershed if a peak from one or more of the original ‘clean’ reference spectra is 

located in the watershed. 

Figure 32 shows the watershed bins derived from the pea leaf metabolic spectrum with two 

sucrose reference spectra. The peaks shown in Figures 30 and 31 result in bins numbered 8 and 

10. 
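
A simple way to picture the watershed bins is steepest-ascent labelling: every point above a threshold is followed uphill until it reaches a local maximum, and all points that reach the same maximum form one bin. The sketch below uses that approach purely for illustration; it is an assumption and not necessarily the derivation described in section 3.6. The function names are hypothetical and NumPy is assumed. Plateaus are not handled specially.

```python
import numpy as np

def ascent_watershed(surface, threshold=0.0):
    """Label each point above `threshold` with the local maximum it climbs to."""
    rows, cols = surface.shape
    labels = -np.ones((rows, cols), dtype=int)   # -1 means 'not in any bin'
    peaks = []

    def uphill(r, c):
        best = (r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and surface[rr, cc] > surface[best]:
                    best = (rr, cc)
        return best

    for r in range(rows):
        for c in range(cols):
            if surface[r, c] <= threshold:
                continue
            path = [(r, c)]
            while labels[path[-1]] < 0:
                nxt = uphill(*path[-1])
                if nxt == path[-1]:              # reached a local maximum
                    peaks.append(nxt)
                    labels[nxt] = len(peaks) - 1
                    break
                path.append(nxt)
            for point in path:                   # everything on the path joins that bin
                labels[point] = labels[path[-1]]
    return labels, peaks

def bins_with_reference_peaks(labels, reference_peaks):
    """Keep only the bins that contain a peak from a 'clean' reference spectrum."""
    return sorted({int(labels[p]) for p in reference_peaks if labels[p] >= 0})

surface = np.array([[1.0, 2.0, 1.0, 0.5, 1.0, 3.0, 1.0]])
labels, peaks = ascent_watershed(surface)
kept = bins_with_reference_peaks(labels, [(0, 5)])
```

In the toy example two bins are found, and only the second is retained because it is the only one containing a (hypothetical) reference peak position.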

5.3.5. Intensity Comparison. To compare samples, the intensity of the metabolic spectrum is 

integrated over each of the watershed bin regions. The same is performed for one (or more) of 

the reference spectra. The ratio of the integrated intensities (equivalent to the ‘volume’ of peaks 

in the bin) for each bin is a measure of the concentration of the compound in the metabolic sample 

compared to the reference sample (assuming equivalent experiments in each case). It is an upper 

limit on the concentration since nearby peaks from other compounds in the metabolic spectrum 

may have been included in the bin. 
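
The integration and ratio can be sketched as follows, assuming each bin is represented by an integer label map of the same shape as the spectra, with -1 marking points outside every bin. The names are hypothetical and NumPy is assumed; a bin with zero reference intensity would need separate handling.

```python
import numpy as np

def bin_volumes(spectrum, labels, n_bins):
    """Integrated intensity ('volume') of a spectrum inside each labelled bin."""
    inside = labels >= 0
    return np.bincount(labels[inside], weights=spectrum[inside], minlength=n_bins)

def bin_ratios(sample, reference, labels, n_bins):
    """Ratio of sample to reference volume, bin by bin."""
    return bin_volumes(sample, labels, n_bins) / bin_volumes(reference, labels, n_bins)

labels = np.array([[0, 0, 1], [-1, 1, 1]])
sample = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
reference = 10.0 * sample
volumes = bin_volumes(sample, labels, 2)
ratios = bin_ratios(sample, reference, labels, 2)
```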

5.4. Results and Discussion. Table 3 shows the integrated intensities over each of the bins for 

the pea spectrum and the two reference sucrose spectra. A direct comparison of concentration 

of sucrose in the metabolic and reference samples cannot be made in this case since different 

experimental techniques were used (phase-cycled and gradient-selected respectively). The ratios 

of integrated intensities (the fourth column in the table) range from 1.7% to 6.6%; ideally a more 

consistent ratio would be expected, although other nearby peaks, not related to sucrose, in the 

metabolic spectrum may have increased some ratios. Nevertheless, a qualitative interpretation is 



[Figure 30: three surface plots, panels (a), (b) and (c), each with axes F2 (1H) / ppm, F1 (13C) / ppm and Intensity (×10^7).] 

Figure 30. (a) shows a small region of the combined spectrum from a metabolic 

sample (derived from pea leaves) and two sucrose reference samples. The reference 

samples, shown in (b) and (c), were at neutral and acidic pH respectively. (The 

orientation of the axes is non-standard in order to clarify the view of the peaks.) 



[Figure 31: surface plot with axes F2 (1H) / ppm, F1 (13C) / ppm and Intensity (×10^7).] 

Figure 31. A small region of the smoothed spectrum derived from spectra of a 

metabolic sample and two sucrose reference samples. 

Figure 32. The watershed ‘bins’ identified in the smoothed spectrum from a pea 

leaf metabolic sample and two sucrose reference samples. Bins have been assigned 

numbers to assist identification. (Only the part of the spectrum containing the 

bins is shown.) 

that the presence of significant intensity in each of the ten bins suggests the presence of sucrose 

in the metabolic sample. 

The last column of Table 3 is the ratio of integrated intensities for the two reference samples. 

It is used here as a check on the adaptive binning technique: the ratios would be expected to be 

consistent if the technique correctly identified bins that accounted for the shift in peaks owing 

to sample pH and temperature. While the ratios are significantly more consistent than between 

the metabolite and reference spectra, there is still some variation, with the ratios ranging from 

75.3% to 90.5%. This suggests further investigation of the parameters to the adaptive binning 

process—such as the level of wavelet decomposition for smoothing or whether to truncate watersheds 

at a certain proportion of the peak height (described in section 3.6)—is required to produce 

more consistent ratios between the reference spectra. 

It should also be noted that this test of the process used only two reference samples. Normally, 

a larger set of reference spectra with different peak shifts would be used. Since the peaks in 

additional spectra would be located between the shifted peaks in the neutral and acidic sucrose 



       Intensity Integrated over Bin / 10^8 units      Ratio of Pea     Ratio of Sucrose
       Sucrose      Sucrose      Pea                   to Sucrose       (Acidic) to
Bin    (Neutral)    (Acidic)     Metabolites           (Neutral)        Sucrose (Neutral)
  1    40.14        31.82        1.432                 3.57%            79.3%
  2    33.03        26.58        2.186                 6.62%            80.5%
  3    35.77        26.92        2.019                 5.65%            75.3%
  4    29.25        26.29        0.534                 1.83%            89.9%
  5    26.60        22.15        0.711                 2.67%            83.3%
  6    48.91        41.06        1.232                 2.52%            83.9%
  7    23.43        21.02        0.401                 1.71%            89.7%
  8    22.15        18.50        1.242                 5.61%            83.5%
  9    24.01        21.07        0.773                 3.22%            87.8%
 10    21.69        19.63        0.846                 3.90%            90.5%

Table 3. The integrated intensity over each bin for the metabolic spectrum and 

two reference spectra of sucrose, and the intensity ratios between spectra. The 

bin numbers are those assigned in Figure 32. 
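
The ratio columns of Table 3 can be recomputed directly from the integrated intensities, which also gives a rough measure of how consistent each set of ratios is. The sketch below uses only the values printed in the table; the coefficient of variation is an illustrative choice of consistency measure and is not one used in the text.

```python
import math

neutral = [40.14, 33.03, 35.77, 29.25, 26.60, 48.91, 23.43, 22.15, 24.01, 21.69]
acidic = [31.82, 26.58, 26.92, 26.29, 22.15, 41.06, 21.02, 18.50, 21.07, 19.63]
pea = [1.432, 2.186, 2.019, 0.534, 0.711, 1.232, 0.401, 1.242, 0.773, 0.846]

# Ratios as percentages, bin by bin.
pea_ratio = [100 * p / n for p, n in zip(pea, neutral)]
ref_ratio = [100 * a / n for a, n in zip(acidic, neutral)]

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form)."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance) / mean

pea_spread = coefficient_of_variation(pea_ratio)
ref_spread = coefficient_of_variation(ref_ratio)
```

Because the inputs are the rounded table entries, a recomputed ratio can differ from the printed one in the last digit.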

       ‘Unprocessed’ Pea Intensity            Ratio to
Bin    Integrated over Bin / 10^8 units       Sucrose (Neutral)
  1    1.967                                  4.90%
  2    2.212                                  6.70%
  3    2.562                                  7.16%
  4    0.513                                  1.75%
  5    0.785                                  2.95%
  6    1.895                                  3.87%
  7    0.728                                  3.11%
  8    1.332                                  6.01%
  9    0.852                                  3.55%
 10    0.911                                  4.20%

Table 4. The integrated intensity over each bin for the unprocessed (without 

denoising and peak picking) metabolic spectrum, and the ratio to the (processed) 

neutral sucrose spectrum. 

spectra used above, a smaller level of smoothing might be possible when identifying bins that 

group together the shifted peaks. In general, smoothing to a smaller decomposition level is likely 

to result in more accurately defined bins (as increased smoothing tends to broaden the shape 

of smoothed peaks), and this in turn might result in more accurate results when intensities are 

compared between bins. 

Finally, Table 4 shows similar calculations using the spectrum of the metabolic pea sample 

without any denoising and peak picking. The integrated intensities show values that are generally 

larger when compared to the equivalent bins for the processed ‘clean’ pea spectrum given in 

Table 3. Although the decrease in intensities in the processed spectrum is suggestive of the effect 

that motivated the use of phase-balanced masking criteria (section 2.6.2), it may also be caused by 

the general decrease in t1-noise included in the bins: further investigation is required to establish 

the reason. The range of ratios is slightly larger for the unprocessed spectrum at 1.7% to 7.2%, 

although no inference can be made given this limited data set. However, the broadly similar 

results do indicate that, for this case at least, the processing of the spectrum to remove t1-noise 

and pick peaks has neither introduced significant artefacts nor completely removed small peaks in the 

metabolic spectrum. 



6. Conclusion 

6.1. Evaluation of Project Objectives. Considering each of the objectives defined in the introduction: 

Reduction of t1-Noise: The denoising algorithm described in section 2.6 is shown to be 

effective at reducing t1-noise in phase-cycled HSQC spectra, while retaining small ‘genuine’ 

peaks convoluted with t1-noise ridges. In the sucrose/glycine spectra analysed in 

section 2.7, the algorithm reduced the t1-noise by a factor ranging between 2 and 4 depending 

on the F2 location across the ridge, and thereby improved the signal-to-noise ratio 

of small peaks convoluted with the noise by a similar factor. The algorithm showed similar 

improvements in metabolic spectra, with the proviso that, under particular circumstances, 

it can significantly reduce the intensity of small genuine peaks within t1-noise ridges. Although 

a quantitative comparison was not made, the algorithm appears to have advantages 

over existing t1-noise reduction methods, such as reference deconvolution (section 2.8). 

Automated Peak Picking: Section 3 describes a peak-fitting genetic algorithm (GA) that 

incorporated some of the knowledge used by experimenters during manual peak picking. 

Section 4 evaluated the use of the GA in conjunction with both the denoising algorithm 

and a SNR-based peak picking method that took account of the variation in t1-noise. 

The GA was able to distinguish between genuine peaks and noise artefacts, particularly 

well in the case of simple spectra (section 3.9). For metabolic spectra, regions consisting of a 

large number of highly convoluted peaks were not consistently resolved by the GA within 

a practicable number of generations. This was improved to some extent by modifying 

parameters to reduce the number of peaks in each convoluted region, and by using SNR-based 

thresholding to identify the largest peaks. The hybrid process was suitable for 

producing spectra free of noise artefacts for use in adaptive binning, for both reference 

and metabolic spectra. 

Two-Dimensional Adaptive Binning: The test of adaptive binning was to compare a 

metabolic spectrum with reference spectra of a single compound. Section 5 proposed 

(a) the use of 2D non-decimating wavelet smoothing, and, (b) the use of watershed regions 

to define bins, as two-dimensional equivalents of the corresponding methods in one 

dimension. The technique proved effective for identifying the presence of sucrose in a 

metabolic pea sample, although the results suggest that the parameter choices may not 

be optimal. Evaluation with a larger number of reference spectra, or directly comparing a 

number of metabolic spectra, is necessary to improve the adaptive binning technique in 

two dimensions. 

6.2. Further Investigation. A number of possibilities for further investigation were identified: 

• Section 2.7.1 describes an alternative derivation of the noise ‘masking’ spectrum by considering 

the correlation independently at each level of the wavelet decomposition of F2 

traces. The resulting reduction in t1-noise was similar to the standard algorithm, and for 

some ridges slightly better. However, it tended to be less consistent, occasionally resulting 

in significantly less noise reduction at certain points in the spectrum. It is possible that 

further investigation of optimum parameters for the wavelet-level mask derivation might 

result in a method that reduces noise more than the standard algorithm and is similarly 

consistent. 

• It may also be possible to separate the noise from small genuine peaks using a different 

method. If each F2 trace in the noise spectrum were both normalised in amplitude and 

phase adjusted (using the argument of its complex correlation to a chosen reference trace), 

the resulting ‘phase-corrected’ surface might show only large scale variation in the noise 

across F2 directions while small ‘genuine’ peaks in the noise spectrum would be small scale 

features. These could be separated using wavelet analysis (applied to the F2 direction), 

discarding the large scale noise while retaining small scale peaks. After reversing the 

normalisation and phase adjustment, the resulting small peak spectrum would be added 

back to the peak spectrum originally separated from the noise. 



• A limitation of the GA was that regions consisting of a number of convoluted peaks could 

not be consistently fitted to the required degree of accuracy within a reasonable number of 

generations. Section 3.9 suggests an alternative technique: instead of fitting all peaks in 

a region simultaneously, the largest peaks are fitted first moving progressively to smaller 

peaks. By fitting only one peak at a time (while still considering the entire region), this 

alternative method may be able to accurately fit the region in a reasonable time. 

• Section 2.5.6 suggests how the change in the phase angle of the noise with F2 might be 

used to implement higher order phase correction of the entire spectrum. 



Appendix A. Pulse Fourier Transform NMR 

Nuclear Magnetic Resonance (NMR) is a spectroscopic technique that provides information 

on the structure of chemical compounds using the magnetic properties of atomic nuclei. This 

appendix describes the theory and method of Pulse Fourier Transform (FT) NMR which is the 

technique used in modern NMR spectrometers. 

A.1. Nuclear Magnetic Moment. 

A.1.1. Magnetic Nuclei. Atomic nuclei possess a vector property known as spin angular momentum, 

denoted here by L. Nuclei where the spin is non-zero are termed magnetic nuclei since the 

non-zero spin angular momentum gives rise to a magnetic moment, µ. The ratio between 

these two vector properties is the gyromagnetic ratio, γ: 

$$\boldsymbol{\mu} = \gamma \mathbf{L} \tag{A.1}$$

γ is the same for all nuclei of an isotope, but differs between isotopes. 

A.1.2. Spin Quantum Number and Magnetic Quantum Number. The spin, L, of an individual 

nucleus is quantised according to another property of the nucleus, the spin quantum number, I. 

The nuclei considered in this project, 1H and 13C, have I = 1/2 and are referred to as spin-1/2 nuclei. 

The quantisation restricts the magnitude of the spin vector to the value: 

$$|\mathbf{L}| = \hbar\sqrt{I(I + 1)} \tag{A.2}$$

(where ħ = h/2π and h is Planck’s constant), while the component of L along an arbitrary axis, 

say the z axis, may take only values: 

$$L_z = m\hbar \tag{A.3}$$

where m, the magnetic quantum number, takes the values: 

m = −I, −I + 1, −I + 2, . . . , I − 2, I − 1, I (A.4) 

A.1.3. Quantisation in a Magnetic Field. If the nucleus is in an external magnetic field, B0, there 

is an energy associated with the magnetic moment given by: 

$$E = -\boldsymbol{\mu} \cdot \mathbf{B}_0 \tag{A.5}$$

In addition, the quantisation axis for the spin, and thus for the magnetic moment, is the direction 

of B0. Taking the direction of B0 to be the positive z axis, then from (A.1) and (A.3): 

$$\mu_z = \gamma L_z = m\gamma\hbar \tag{A.6}$$

Using this in (A.5) gives, 

$$E = -\mu_z B_0 = -m\gamma\hbar B_0 \tag{A.7}$$

A.1.4. Resultant Energy Levels. Since the magnetic quantum number, m, takes the values detailed 

in (A.4) which differ by 1, the interaction of the magnetic moment and external magnetic field 

gives rise to a set of energy levels that differ by ∆E = γħB0. Assuming thermal equilibrium, 

nuclei will populate these energy levels according to the Boltzmann distribution. For example, for 

spin-1/2 nuclei, there will be more nuclei at the lower energy level (m = 1/2, assuming a positive value 

of γ) than the higher level (m = −1/2).⁹ 

9 Even with the strong magnetic fields used in modern NMR spectrometers, the difference in energy levels is 

small compared to the thermal energy at room temperature, so the excess of nuclei at the lowest energy level is only a small proportion 

of the total number of nuclei[14]. 



A.1.5. Resonance Frequency. The difference between energy levels relates directly to the resonance 

frequency ν of the nucleus using the relation: 

$$\Delta E = h\nu \tag{A.8}$$

giving: 

$$\nu = \frac{\Delta E}{h} = \frac{\gamma\hbar B_0}{h} = \frac{\gamma B_0}{2\pi} \tag{A.9}$$
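
As a worked illustration of (A.9) and of the Boltzmann populations in section A.1.4, the sketch below evaluates the resonance frequency and the fractional population excess for 1H and 13C. The field strength of 11.74 T and the temperature of 298 K are assumptions chosen for illustration (they are not taken from the experiments in this dissertation); the gyromagnetic ratios and physical constants are standard published values.

```python
import math

PLANCK = 6.62607015e-34        # J s
BOLTZMANN = 1.380649e-23       # J / K
GAMMA = {"1H": 267.522e6, "13C": 67.2828e6}   # rad s^-1 T^-1

def resonance_frequency(gamma, b0):
    """Resonance frequency in Hz: nu = gamma * B0 / (2 pi)."""
    return gamma * b0 / (2 * math.pi)

def population_excess(gamma, b0, temperature):
    """Fractional excess of spin-1/2 nuclei in the lower energy level."""
    delta_e = gamma * (PLANCK / (2 * math.pi)) * b0     # energy gap between the two levels
    ratio = math.exp(-delta_e / (BOLTZMANN * temperature))  # N_upper / N_lower
    return (1 - ratio) / (1 + ratio)

B0 = 11.74          # tesla (assumed field strength)
nu_h = resonance_frequency(GAMMA["1H"], B0)
nu_c = resonance_frequency(GAMMA["13C"], B0)
excess_h = population_excess(GAMMA["1H"], B0, 298.0)
```

With these assumed values the 1H frequency is close to 500 MHz and the population excess is of the order of a few parts in 100,000, which illustrates how small the net magnetisation is.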

A.2. Pulse NMR. The pulse technique used in modern NMR spectrometers can be described by 

considering the bulk magnetisation of the nuclei in the sample. For this description, it is possible to 

use the classical physics of magnetic moments rather than the quantum mechanics used above[9]. 

A.2.1. Net Magnetic Moment. The sample, in solution, is placed in a strong, static, homogeneous 

magnetic field of the form described above. The slightly higher number of nuclei at lower energy 

levels described above gives rise to a net magnetisation in the z direction; if γ is positive then this 

is in the same direction as the external magnetic field, along the positive z axis. Owing 

to the quantisation of the magnitude of the spin angular momentum given by (A.2), and in turn the 

nuclear magnetic moment, each nucleus has some component of its magnetic moment in the x-y 

plane in addition to the component quantised along the z axis. However, in the external magnetic 

field acting in the z direction, there is no preferred direction for these components in the x-y plane 

and so there is no net magnetic moment in this plane. Thus (and again assuming a positive γ), the 

net magnetic moment of the nuclei in the sample can be considered to be a vector in the positive 

z direction, in the same direction as the external magnetic field B0. 

A.2.2. Effect of Radio Frequency Pulse. A magnetic field, B1, is applied perpendicular to the 

static B0 field using a short radio frequency (RF) pulse in a coil that surrounds the sample. The 

magnetic field created by the pulse is very much weaker than B0, and oscillates at the 

resonance frequency of the nuclei being observed. 10 

It is convenient at this point to consider a coordinate system rotating about the z axis at the 

same frequency as the pulse. The axes in this rotating frame will be denoted x ′ , y ′ and z ′ . In this 

new coordinate system, the RF magnetic field, B1 is static, and, without loss of generality, it is 

taken that the direction lies along the positive x ′ axis. 

The interaction of the net magnetic moment and the overall magnetic field creates a torque. 

The effect on the (spin) angular momentum is given by: 

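$$\frac{d\boldsymbol{\Lambda}}{dt} = \mathbf{M} \times \mathbf{B} \tag{A.10}$$

where Λ denotes the net angular momentum, M the net magnetic moment, and B the overall 

magnetic field. 

Since M = γΛ by extension of (A.1), 

$$\frac{d\mathbf{M}}{dt} = \gamma \mathbf{M} \times \mathbf{B} \tag{A.11}$$

When considered in a frame rotating at an angular velocity of ω, this becomes[9]: 

$$\frac{\partial \mathbf{M}}{\partial t} = \mathbf{M} \times (\gamma \mathbf{B} - \boldsymbol{\omega}) \tag{A.12}$$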

10 As described below, the nuclei being observed will have a range of resonance frequencies, but if the 

difference from the frequency of the RF field is small compared to γB1, the effect of the pulse, as described below, 

remains a good approximation[9]. 


Figure 33. The orientation of the bulk magnetic moment, M, and the effective 

magnetic field B (a) at the start of the RF pulse; (b) at the end of the pulse; 

(c) precessing after the pulse; and (d) during relaxation. Note that (a) and (b) 

show the frame rotating with the RF pulse, while (c) and (d) show the stationary 

laboratory frame. 

In the case of the frame rotating at the resonance frequency, ω = 2πνiz = γB0 from (A.9), 

where iz is a unit vector in the positive z direction. During the pulse, the overall magnetic field 

is B0 + B1, giving: 

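$$\frac{\partial \mathbf{M}}{\partial t} = \mathbf{M} \times \{\gamma(\mathbf{B}_0 + \mathbf{B}_1) - \gamma \mathbf{B}_0\} = \gamma \mathbf{M} \times \mathbf{B}_1 \tag{A.13}$$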

Thus, when considered in the rotating frame, the net magnetic moment is subject to a torque 

that rotates it around the direction of the RF magnetic field. Since the RF magnetic field is 

static along the x′ axis in the rotating frame, the net magnetic moment rotates around x′ (in a 

clockwise direction assuming positive γ) in the y ′ -z ′ plane: see Figure 33(a). The duration of the 

RF magnetic field pulse is timed so that at the end of the pulse, the net magnetic moment lies 

along the positive y ′ -axis, i.e. a clockwise rotation of π/2 (Figure 33(b)). 
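
The rotation described by (A.13) can be checked numerically. The sketch below integrates ∂M/∂t = γM × B1 in the rotating frame with a fourth-order Runge-Kutta scheme, starting from a unit moment along z′ and with B1 along x′. The B1 amplitude is a hypothetical value chosen for illustration, the function name is an assumption, and NumPy is assumed to be available.

```python
import numpy as np

def apply_pulse(gamma, b1, duration, steps=1000):
    """Integrate dM/dt = gamma * M x B1 in the rotating frame (B1 along x')."""
    m = np.array([0.0, 0.0, 1.0])            # equilibrium moment along z'
    b1_vec = np.array([b1, 0.0, 0.0])
    dt = duration / steps
    rate = lambda v: gamma * np.cross(v, b1_vec)
    for _ in range(steps):                    # classical RK4 step
        k1 = rate(m)
        k2 = rate(m + 0.5 * dt * k1)
        k3 = rate(m + 0.5 * dt * k2)
        k4 = rate(m + dt * k3)
        m = m + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return m

gamma_h = 267.522e6                            # 1H gyromagnetic ratio, rad s^-1 T^-1
b1 = 5.87e-4                                   # tesla (hypothetical pulse amplitude)
t90 = (np.pi / 2) / (gamma_h * b1)             # duration giving a rotation of pi/2
m_after = apply_pulse(gamma_h, b1, t90)
```

For these assumed values the pulse lasts about 10 microseconds and leaves the moment along the positive y′ axis, as described in the text.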

A.2.3. Precession After RF Pulse. After the pulse, the overall magnetic field is just B0, so returning 

to equation (A.12), 

$$\frac{\partial \mathbf{M}}{\partial t} = \mathbf{M} \times (\gamma \mathbf{B}_0 - \gamma \mathbf{B}_0) = \mathbf{0} \tag{A.14}$$

Thus the net magnetic moment is stationary in the rotating frame. In the static frame, this means 

the net magnetic moment must be rotating, or precessing, around B0 at the same angular velocity 



as the rotating frame (Figure 33(c)). This angular velocity is ω = γB0, and therefore the moment 

precesses at a frequency of: 

$$\nu = \frac{\omega}{2\pi} = \frac{\gamma B_0}{2\pi} \tag{A.15}$$

This frequency is identical to the resonance frequency given by the quantum mechanical description 

of a transition between energy levels of an individual nucleus (A.9). The rotating magnetic 

moment gives rise to an oscillating voltage in a receiver coil around the sample, and it is this 

voltage that is detected and processed by the NMR spectrometer. 

A.3. Relaxation. After the RF pulse, the net magnetic moment gradually returns to 

the equilibrium position of alignment along the positive z axis, a process known as relaxation 

(Figure 33(d)). 

A.3.1. Longitudinal Relaxation. There are two separate processes that give rise to this relaxation. 

Firstly, the component of the net magnetic moment along the z axis, Mz following the notation 

used above, returns to its equilibrium value. This is called longitudinal or spin-lattice relaxation. 

In a classical description, this may be viewed as an induced magnetic field occurring in the sample 

as a result of the external magnetic field, B0, where the sample is unmagnetised in the z direction 

immediately after the pulse [9]. In the quantum mechanical description, it may be viewed as 

the population at each of the energy levels returning to the Boltzmann distribution at thermal 

equilibrium. It is assumed [9] that the return to the equilibrium state occurs exponentially, such 

that: 

$$M_z(t) = M_z^{\mathrm{eq}}\left(1 - e^{-t/T_1}\right) \tag{A.16}$$

where $M_z^{\mathrm{eq}}$ is the equilibrium value of Mz. The value of T1, the spin-lattice relaxation time, is 

dependent on the nature of the sample as well as other factors. 

A.3.2. Transverse Relaxation. Correspondingly, the component of the net magnetic moment in 

the x-y plane decreases over time, at least as fast as the magnetic moment in the z direction 

returns. A second process contributes to the reduction in the x-y component. It arises from 

inhomogeneity in the static magnetic field, B0: small differences across the sample means that 

the magnetic moments of individual nuclei precess over a small range of angular velocities, rather 

than the single velocity related to the resonance frequency. Over time, the difference in velocities 

means that the magnetic moments ‘fan out’ and begin to cancel each other, leading to a reduction 

in the net magnetic moment in the x-y plane. This process is called transverse relaxation. 

The total relaxation—the combination of both transverse and longitudinal processes—in the x-y 

plane is also assumed to decay to zero exponentially with a parameter denoted as T2 [9]. 
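
The two exponential models can be written down directly. In the sketch below the function names and the example relaxation times are hypothetical; the longitudinal form follows (A.16), with the moment unmagnetised along z immediately after the pulse, and the transverse form is a plain exponential decay with time constant T2.

```python
import math

def longitudinal(t, t1, m_eq=1.0):
    """z component recovering towards its equilibrium value with time constant T1."""
    return m_eq * (1.0 - math.exp(-t / t1))

def transverse(t, t2, m0=1.0):
    """x-y component decaying to zero with time constant T2."""
    return m0 * math.exp(-t / t2)

t1, t2 = 1.0, 0.2                      # seconds (illustrative values only)
mz_at_t1 = longitudinal(t1, t1)        # fraction recovered after one T1
mxy_at_t2 = transverse(t2, t2)         # fraction remaining after one T2
```

After one time constant the longitudinal component has recovered to about 63% of its equilibrium value, while the transverse component has fallen to about 37% of its initial value.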

A.4. Chemical Shift. While all nuclei of a particular isotope have the same gyromagnetic ratio, 

their resonance frequencies will differ depending on the chemical environment of each nucleus. 

Neighbouring nuclei can either increase or decrease the actual magnetic field experienced 

by a nucleus through their action on the electrons of the atom in question, leading to a change in 

the resonance frequency called chemical shift. The actual magnetic field experienced by the nucleus 

is denoted as B0(1 −σ) where σ is a screening constant. 11 This leads to a change in the resonance 

frequency so that now: 

$$\nu = \frac{\gamma B_0 (1 - \sigma)}{2\pi} \tag{A.17}$$

The chemical shift is the key datum arising from typical NMR experiments since it provides 

information about the molecular structure near specific nuclei. 

11 Nuclei that are not part of a molecule will also experience shielding caused by the electrons within the atom 

itself and this effect is also measured by the shielding constant[14]. 


The chemical shift is normally expressed by comparison with the resonance frequency of the 

same isotope in a reference compound, νref. The difference between the frequencies is expressed 

in parts per million, denoted ppm, so the value is calculated as: 

$$\delta = 10^6\,\frac{\nu - \nu_{\mathrm{ref}}}{\nu_{\mathrm{ref}}} \tag{A.18}$$
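
A short worked example of (A.18), with a hypothetical reference frequency of 500.13 MHz chosen only for illustration:

```python
def chemical_shift_ppm(nu, nu_ref):
    """Chemical shift in parts per million relative to the reference frequency."""
    return 1e6 * (nu - nu_ref) / nu_ref

nu_ref = 500.13e6                 # Hz (assumed reference frequency)
nu = nu_ref + 1750.455            # a resonance 1750.455 Hz above the reference
delta = chemical_shift_ppm(nu, nu_ref)
```

Here the offset of about 1.75 kHz corresponds to a shift of 3.5 ppm. Expressing the shift in ppm makes it independent of the field strength of the spectrometer.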

A.5. Spin-Spin Coupling. Neighbouring nuclei can also act directly on the resonance frequency 

of a nucleus, as opposed to chemical shifts which result from indirect action on the electrons. If a 

nucleus, say A, has a neighbouring nucleus, X, which is magnetic (has a non-zero spin), then the 

magnetic moment of X will cause a small magnetic field at A that will either enhance or oppose 

the external magnetic field depending on the direction of the magnetic moment of X. 

The energy associated with the spin-spin coupling is quantified by the spin-spin coupling constant 

between A and X, JAX [9]: 

$$E = h J_{AX} m_A m_X \tag{A.19}$$

where mA and mX are the magnetic quantum numbers for A and X respectively. When considering 

changes in the resonance frequency of A, ∆νA (i.e. for energy level transitions where ∆mA = 1), 

then (A.8) and (A.19) give: 

$$\Delta\nu_A = J_{AX} m_X \tag{A.20}$$

For example, if X is a spin-1/2 nucleus, for which mX has two values, 1/2 or −1/2, approximately 

half the X nuclei in the sample will have m = 1/2, the other half having m = −1/2, leading to a 

doublet: two resonance frequency lines of equal intensity symmetrically arranged above and below 

the resonance frequency of A if spin-spin coupling had not occurred. Since the spacing between 

possible values of mX is 1, the frequency difference between the doublet lines is JAX. 

If A has spin coupling with N neighbouring nuclei, then (A.20) becomes [9]: 

$$\Delta\nu_A = \sum_{k=1}^{N} J_{AX_k} m_{X_k} \tag{A.21}$$

If all the Xk are ‘equivalent’ nuclei—specifically, when JAXk is the same for all k and each 

mXk takes the same set of possible values—then by considering the number of combinations 

of (mX1, mX2, . . . , mXN) giving the same total for $\sum_{k=1}^{N} m_{X_k}$, it can be seen that the ratio of 

intensities of the frequencies in the multiplet are binomial coefficients. For example, a triplet will 

have ratios 1:2:1 and a quartet will be 1:3:3:1. If the Xk refer to nuclei that are not equivalent, 

then more complex multiplet patterns can occur. 
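
The counting argument for equivalent nuclei can be reproduced by enumerating every combination of magnetic quantum numbers. The sketch below assumes N equivalent spin-1/2 neighbours with a common coupling constant J; the function name and the value of J are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def multiplet(n_neighbours, j_coupling):
    """Return (frequency offset, relative intensity) pairs for a multiplet."""
    half = Fraction(1, 2)
    # Count how many combinations of m values give each total.
    totals = Counter(sum(combo) for combo in product((-half, half), repeat=n_neighbours))
    return [(float(total) * j_coupling, count) for total, count in sorted(totals.items())]

doublet = multiplet(1, 7.0)      # J = 7 Hz is an illustrative value
triplet = multiplet(2, 7.0)
quartet = multiplet(3, 7.0)
```

The intensities are the number of combinations giving each total, which are the binomial coefficients, and adjacent lines are separated by J.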

A.6. Signal Detection and Processing. 

A.6.1. Detection. The signal in the receiver coil, induced by the rotating net magnetic moment of 

nuclei in the sample as described above, is proportional to the rate of change of magnetisation, so 

immediately after the pulse, the x component of the received signal is at a maximum and then is 

sinusoidal with a frequency equal to the resonance frequency. This is termed the absorption signal[9]. 

The y component is π/2 out of phase and termed the dispersion signal. The signal is called the 

Free Induction Decay or FID since the amplitude decays over time owing to relaxation in the x-y 

plane. As a result of chemical shifts and spin-spin coupling, the signal measured by the receiver 

coil will normally contain a number of distinct frequencies. 

A.6.2. Reference Signal. The first stage of processing is to ‘mix’ or ‘subtract’ a reference signal, essentially a generated signal consisting of a single frequency below the minimum resonance frequency of interest. This leads to a signal that contains not the actual resonance frequencies, but the frequency differences between the resonance signals and the generated reference signal. These significantly lower frequencies are easier to process. The resulting signal is fed to an analogue-to-digital converter to allow subsequent processing by computer.

Figure 34 shows an artificially constructed example of a FID after reference mixing. (The FID 

waveform is a section of a 1D ¹H spectrum of glucose, with the decay significantly enhanced to illustrate the form of a decaying FID.)

Figure 34. An example of a Free Induction Decay (FID) signal after reference mixing. (Axes: intensity against time.)

A.6.3. Pre-Fourier Transform Processing. To convert the digital time-domain FID to a frequency-domain spectrum, a Fourier transform is used. However, further processing of the signal may occur prior to the Fourier transform. Zero-filling may be used to extend a FID that has decayed to zero in order to improve resolution[9]. Often the FID may be multiplied by a window function, to change the amplitude of the signal over time. For example, the original FID, f(t), may be multiplied by a function such as a squared cosine curve (over the first two quadrants) to give a modified FID f′(t):

$$ f'(t) = \cos^2\!\left(\pi\,\frac{t}{t_{N-1}}\right) f(t) \tag{A.22} $$

where tN−1 is the time index of the last data point in the FID. The effect of the window function 

can be to increase the signal-to-noise ratio, improve resolution or modify peak shape[9]. 

A.6.4. Fourier Transform. A discrete Fourier transform is used to convert the time-domain FID 

to a frequency-domain spectrum. If f(t) is the FID of N points indexed 0 to N − 1, then the 

frequency spectrum, F(ν) is given by: 

$$ F\!\left(\frac{j}{M}\,\nu_{\mathrm{samp}}\right) = \sum_{k=0}^{N-1} f(t_k)\, e^{-\frac{2\pi i}{N} jk} \tag{A.23} $$

where νsamp is the sampling frequency of the FID, and M the number of points in the discrete 

Fourier transform, and the index j is between 0 and M/2. 
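As an illustration of (A.23), the following sketch evaluates the transform directly for a synthetic FID. The decaying cosine, its 50 Hz offset frequency, the 400 Hz sampling frequency and the function name `dft_spectrum` are assumptions made for the example; no zero-filling is applied, so M = N.

```python
import cmath
import math

def dft_spectrum(fid, sampling_frequency):
    """Return (frequency, F) pairs for j = 0 .. N/2, taking M = N in (A.23)."""
    n_points = len(fid)
    spectrum = []
    for j in range(n_points // 2 + 1):
        value = sum(fid[k] * cmath.exp(-2j * math.pi * j * k / n_points)
                    for k in range(n_points))
        spectrum.append((j * sampling_frequency / n_points, value))
    return spectrum

SAMPLING_FREQUENCY = 400.0  # Hz, illustrative value
# Synthetic FID: a 50 Hz cosine with an exponential decay.
fid = [math.exp(-k / 40.0) * math.cos(2.0 * math.pi * 50.0 * k / SAMPLING_FREQUENCY)
       for k in range(64)]
peak_frequency, peak_value = max(dft_spectrum(fid, SAMPLING_FREQUENCY),
                                 key=lambda pair: abs(pair[1]))
```

The point of largest magnitude falls at the frequency of the cosine; a direct sum of this kind costs of the order of N² operations, so a fast Fourier transform would normally be used in practice.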

A.6.5. Phase Correction. The resulting discrete frequency spectrum, F(ν), takes complex values. 

The real and imaginary parts of the spectrum may be interpreted as the absorption and dispersion 

mode signals corresponding to signals that are in phase and π/2 out of phase with the generated 

reference signal. (Figure 35 shows the typical form of absorption and dispersion signals.) 

Ideally, the phase of the generated reference signal should match that of the precessing net 

magnetic moment, but when this does not occur, each part of the complex spectrum may be a 

mixture of absorption and dispersion mode signals. In this case it is necessary to phase correct 

the spectrum so that the real part of the spectrum is the absorption signal. A phase correction of 

θ can be made by transforming the spectrum F(ν) as follows (a compact form of equations given 

in [9]):

$$ F'(\nu) = F(\nu)\, e^{-i\theta} \tag{A.24} $$
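A minimal sketch of (A.24) follows. The five-point ‘absorption’ line shape and the phase error of π/3 are invented for the example, and `phase_correct` is a hypothetical helper name; the sketch applies a single constant phase to every point.

```python
import cmath
import math

def phase_correct(spectrum, theta):
    """Apply the phase correction (A.24) to every point of a complex spectrum."""
    return [value * cmath.exp(-1j * theta) for value in spectrum]

absorption = [0.0, 1.0, 4.0, 1.0, 0.0]           # ideal real-valued line shape
phase_error = math.pi / 3.0                      # illustrative phase mismatch
# A mis-phased spectrum mixes absorption and dispersion in its real part.
measured = [a * cmath.exp(1j * phase_error) for a in absorption]
corrected = phase_correct(measured, phase_error)
```

After correction the real part of `corrected` reproduces the absorption values and the imaginary part is zero to within rounding error.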

A.6.6. 1D NMR Spectrum Plot. Normally the real part of the spectrum, containing the absorption signal, is considered. By convention, the spectrum is plotted with higher frequency, or higher ppm

values, to the left[14]. Figure 36 shows an example of a 1D spectrum. 

Figure 35. Absorption (a) and dispersion (b) signals corresponding to the resonance frequency of the reference compound in a 1D ¹H spectrum of glucose. (Axes: intensity against ¹H frequency in ppm.)

Figure 36. A section of a 1D ¹H NMR spectrum of glucose.

A.7. Multi-Dimensional NMR.

A.7.1. Pulse Sequences. In the description of 1D NMR above, a single RF pulse that rotates the net magnetic moment by π/2 is applied to the sample and the experiment measures the resonance frequency of nuclei in a range of the spectrum. In higher dimensional NMR, a more complicated pulse sequence is applied to the sample before a FID is acquired. The pulse sequence and FID acquisition are repeated a number of times, with one or more parameters that control the timing of part of the pulse sequence changing on each run. It is these extra parameters that give rise to the additional dimensionality of the results: for example, a single timing parameter, t1, in the pulse sequence creates a single additional dimension in 2D NMR.¹²

Figure 37. Representation of a series of FIDs taken over changing t1 values and Fourier-transformed in the first dimension to give ν2 frequencies. The diagram shows the phase change in the two peaks with t1. (Axes: intensity against ν2, for successive values of t1.)

The effect of the t1 timing parameter within the pulse sequence is often to change the initial 

direction of the net magnetic moment in the x-y plane after the final pulse before the FID is 

acquired. The type of pulse sequence applied determines what factors influence the change in the 

initial direction. For example, the direction change may be related to precession owing to the 

resonance frequency of a neighbouring nucleus, X, rather than the nucleus, A, whose resonance 

frequency is measured in the FID. In this case, during t1, the angle of the moment will ‘evolve’ at 

an angular velocity equivalent to the resonance frequency of X, so as t1 increases, the change in 

direction of the moment when the FID begins to be acquired increases in proportion. 

A.7.2. Second Fourier Transform. In processing 2D NMR, each of the series of FIDs is first processed 

as above to create a series of frequency spectra, Ft1(ν). The difference in initial direction 

of net magnetic moment owing to t1 results in each of the frequency spectra having a different 

phase. When the same point is considered across the series of spectra, i.e. the sequence Ft1(ν2) 

for fixed frequency ν2 and varying t1, the phase will change with t1 sinusoidally. 

Figure 37 is a representation of the change in phase with t1. Each 1D spectrum after the Fourier transform in the first dimension has two peaks, but the phase of the peaks changes with t1. The rate of the phase change with respect to t1 is different for each peak and corresponds to the frequency of the peak in the second dimension.

12 The notation t1 distinguishes the parameter from t2 which denotes the time during which the FID is acquired, 

which itself is related to the constant T2 describing the transverse relaxation decay (see section A.3.2). 

Figure 38. A section of a 2D gradient-selected ¹H–¹³C HSQC NMR spectrum of glucose.

To derive this second spectral frequency, a second discrete Fourier transform is applied in turn 

to the sequence Ft1(ν2) (fixed ν2, varying t1) at each ν2 value, to give a 2D frequency spectrum, 

Φ(ν1, ν2). 
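The two-stage transform can be sketched as follows. The synthetic data set (a single signal whose phase advances by 3 cycles over the t1 series and which oscillates at 5 cycles over each FID), the 16 × 16 size and the function names are assumptions made for the illustration; real HSQC processing involves further steps not shown here.

```python
import cmath
import math

def dft(samples):
    """Direct discrete Fourier transform of a sequence of complex samples."""
    n = len(samples)
    return [sum(samples[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n))
            for j in range(n)]

def two_dimensional_spectrum(fids):
    """fids[i][k] is the k-th point of the FID acquired at the i-th t1 value."""
    # First transform: each FID (fixed t1) gives a spectrum over nu2.
    rows = [dft(fid) for fid in fids]
    n_t1, n_nu2 = len(rows), len(rows[0])
    # Second transform: at each nu2, transform the sequence over t1.
    columns = [dft([rows[i][j2] for i in range(n_t1)]) for j2 in range(n_nu2)]
    # Re-index so that spectrum[j1][j2] corresponds to Phi(nu1, nu2).
    return [[columns[j2][j1] for j2 in range(n_nu2)] for j1 in range(n_t1)]

N = 16
fids = [[cmath.exp(2j * math.pi * 3 * i / N) * cmath.exp(2j * math.pi * 5 * k / N)
         for k in range(N)] for i in range(N)]
spectrum = two_dimensional_spectrum(fids)
peak = max(((j1, j2) for j1 in range(N) for j2 in range(N)),
           key=lambda p: abs(spectrum[p[0]][p[1]]))
```

The largest value of the 2D spectrum appears at the index pair `(3, 5)`: the first index comes from the change in phase with t1 and the second from the frequency within each FID.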

A.7.3. 2D NMR Spectrum Plot. The ν2 frequency, i.e. that determined from the FID, is called 

the F2 frequency or ppm and is plotted on the horizontal axis. The frequency derived from the 

change in phase with t1 is called F1 and plotted on the vertical axis. An example of a 2D spectrum

is shown in Figure 38. 

A.7.4. Types of Multi-Dimensional NMR Experiments. A wide variety of 2D, and higher dimensional, NMR experiments are possible, characterised by pulse sequences that measure specific properties of the molecular structure. For example, in this project the 2D spectra result from Heteronuclear Single Quantum Coherence (HSQC) experiments that identify the resonance frequencies of ¹³C nuclei connected via a single bond to ¹H.

A.8. NMR Sensitivity. The sensitivity of an NMR experiment is a measure of its ability to distinguish genuine signals from background noise.

A.8.1. Factors Affecting Sensitivity. A number of factors influence the sensitivity of an NMR 

experiment[9], including: 

• the probe used to detect the rotating net magnetic moment; 

• the signal processing equipment such as amplifiers and analogue-to-digital converters; 

• the stability and homogeneity of the magnetic field; 

• the nature of the experiment itself. 

A.8.2. Signal-Averaging. The effect of noise can be reduced by signal-averaging: two or more 

equivalent spectra are combined resulting in an increase in peak intensity but less of an increase in 

the noise. For random noise, the improvement in sensitivity is a factor of √n for averaging over n spectra[9].

Note, however, that the t1-noise described in this project is not entirely random and is not reduced 

to this extent by signal-averaging. Additionally, the length of time taken to acquire 2D spectra 

limits the use of signal-averaging. 
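The √n behaviour for purely random noise can be checked with a small simulation. The Gaussian noise model, the fixed random seed, the 2000-point length and the choice of n = 16 are assumptions of this sketch; it says nothing about t1-noise, which is not random in this sense.

```python
import random
import statistics

def noise_trace(rng, n_points, noise_sd):
    """A trace containing only independent Gaussian noise."""
    return [rng.gauss(0.0, noise_sd) for _ in range(n_points)]

def average_traces(traces):
    """Point-by-point mean of a list of equal-length traces."""
    return [sum(values) / len(traces) for values in zip(*traces)]

rng = random.Random(0)                     # fixed seed so the run is repeatable
single = noise_trace(rng, 2000, 1.0)
averaged = average_traces([noise_trace(rng, 2000, 1.0) for _ in range(16)])
# Ratio of noise standard deviations before and after averaging 16 traces.
improvement = statistics.pstdev(single) / statistics.pstdev(averaged)
```

Since a genuine peak has the same intensity in every trace, averaging leaves it unchanged while the noise standard deviation falls, so `improvement` is close to √16 = 4.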



A.8.3. Signal-to-Noise Ratio. A standard metric for the sensitivity is the (root-mean-square or 

RMS) Signal-to-Noise Ratio (SNR), calculated as: 

$$ \mathrm{SNR} = \frac{\Phi^{(p)}}{\sigma^{(n)}} \tag{A.25} $$

where $\Phi^{(p)}$ is the intensity of a specific peak and $\sigma^{(n)}$ is the standard deviation of the noise in the neighbourhood of the peak. In some definitions, the denominator is taken to be twice the noise standard deviation[9].

An alternative measure is the Peak-To-Peak SNR that compares the signal amplitude to the 

maximum amplitude of the noise, rather than its standard deviation. Usually the maximum 

amplitude is assumed to be 2.5 times the standard deviation[9]. 
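Both measures can be written down directly. In this sketch the function names, the use of the population standard deviation and the four-point noise sample are assumptions for illustration; the factor of 2.5 is the one quoted above.

```python
import statistics

def rms_snr(peak_intensity, noise_samples):
    """RMS signal-to-noise ratio of (A.25)."""
    return peak_intensity / statistics.pstdev(noise_samples)

def peak_to_peak_snr(peak_intensity, noise_samples, factor=2.5):
    """Peak-to-peak SNR, taking the maximum noise amplitude as factor * sigma."""
    return peak_intensity / (factor * statistics.pstdev(noise_samples))

noise = [1.0, -1.0, 1.0, -1.0]   # illustrative noise region with sigma = 1
snr = rms_snr(10.0, noise)
snr_pp = peak_to_peak_snr(10.0, noise)
```

For the same peak and noise region the peak-to-peak value is simply the RMS value divided by 2.5, so the two measures rank spectra identically.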



Appendix B. Wavelet Analysis 

The theory of wavelets grew from the desire to study the local frequency composition of localised 

and often noisy time signals that did not have the periodic behaviour required for other techniques, 

such as Fourier analysis. Although the concept of wavelets had arisen in a number of fields during 

the Twentieth Century, the mathematics of modern wavelet analysis is taken to have started with 

the analysis of seismic data in the 1980s [1, 18, 23]. Continuing research, especially in the 1990s, has led to a wide range of applications of wavelet analysis [1].

This appendix focuses on the theory underlying the application of wavelets in this project. 

B.1. Continuous Wavelet Transform. 

B.1.1. Square-Integrable Functions. Wavelet analysis operates on functions that are Lebesgue measurable 

and ‘square-integrable’ in terms of the Lebesgue integral. In addition, the wavelets considered 

here operate on functions of the real line. 

Definition B.1 (Square-Integrable). The set L²(R) of square-integrable functions of one real variable¹³ is defined as:

$$ L^2(\mathbb{R}) = \left\{ f : \mathbb{R} \to \mathbb{R} \ \text{such that} \int_{-\infty}^{\infty} |f(t)|^2 \, dt < \infty \right\} \tag{B.1} $$

where integration is the Lebesgue integral. 

B.1.2. Function Energy and Localisation. The value of $\int_{-\infty}^{\infty} |f(t)|^2\,dt$ is often termed the energy, E, of the function. The condition of finite energy implies that functions in L²(R) are localised in the sense that they must ‘decay’ to 0 at ±∞ [5]. The localisation can be quantified by considering the mean time, $\bar{t}$, as a measure of the function’s ‘centre’, and the time standard deviation, $\sigma_t$, as a measure of its ‘spread’, defined as follows [23]:

$$ \bar{t} = \frac{1}{E} \int_{-\infty}^{\infty} t\,|f(t)|^2 \, dt \tag{B.2} $$

$$ \sigma_t^2 = \frac{1}{E} \int_{-\infty}^{\infty} (t - \bar{t})^2\, |f(t)|^2 \, dt \tag{B.3} $$

where the normalising factor, E, is the energy. The same localisation is true of the frequency components of the signal represented by the function. If the Fourier transform of f(t) is denoted by $\hat{f}(\nu)$, where

$$ \hat{f}(\nu) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i \nu t} \, dt \tag{B.4} $$

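then the mean frequency, $\bar{\nu}$, and frequency standard deviation, $\sigma_\nu$, are defined as:

$$ \bar{\nu} = \frac{1}{E} \int_{-\infty}^{\infty} \nu\,|\hat{f}(\nu)|^2 \, d\nu \tag{B.5} $$

$$ \sigma_\nu^2 = \frac{1}{E} \int_{-\infty}^{\infty} (\nu - \bar{\nu})^2\, |\hat{f}(\nu)|^2 \, d\nu \tag{B.6} $$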

Since, by Parseval’s theorem, $\int_{-\infty}^{\infty} |\hat{f}(\nu)|^2 \, d\nu = \int_{-\infty}^{\infty} |f(t)|^2 \, dt$, the normalising factor is the same.
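The time-domain quantities (B.2) and (B.3) can be estimated numerically. The sketch below uses a midpoint-rule sum; the Gaussian test function centred on t = 1, the integration limits and the helper name `localisation` are assumptions chosen so that the exact answers (E = √π, mean time 1, variance 1/2) are known.

```python
import math

def localisation(f, t_min, t_max, n_steps):
    """Midpoint-rule estimates of the energy, mean time (B.2) and variance (B.3)."""
    dt = (t_max - t_min) / n_steps
    times = [t_min + (i + 0.5) * dt for i in range(n_steps)]
    weights = [abs(f(t)) ** 2 * dt for t in times]
    energy = sum(weights)
    mean_time = sum(t * w for t, w in zip(times, weights)) / energy
    variance = sum((t - mean_time) ** 2 * w for t, w in zip(times, weights)) / energy
    return energy, mean_time, variance

def gaussian(t):
    """Illustrative square-integrable function centred on t = 1."""
    return math.exp(-((t - 1.0) ** 2) / 2.0)

energy, mean_time, variance = localisation(gaussian, -20.0, 22.0, 4200)
```

The estimated mean time recovers the centre of the Gaussian and the variance its spread, matching the interpretation of these quantities as ‘centre’ and ‘spread’.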

B.1.3. Inner Product. To analyse the components of a function f(t) in L2 (R), a sensible approach 

is to compare it to a reference function, say ψ(t), also in L2 (R), and as a measure of similarity, 

use the inner product:

$$ \langle f, \psi \rangle = \int_{-\infty}^{\infty} f(t)\, \overline{\psi(t)} \, dt \tag{B.7} $$

Here $\overline{\psi(t)}$ represents the complex conjugate of ψ(t); for the real functions under consideration at this point, this distinction is not relevant.

13 The variable t is chosen here since in many applications the function represents a signal that varies with time. 



B.1.4. Translation and Dilation. Since ψ(t) is a member of L²(R), it will also be localised in both the time and frequency domains as described above, so the inner product as a measure of similarity will be restricted to a range of times and frequencies characteristic of ψ(t). To be useful as a method to analyse the composition of the entire signal f(t), it is therefore necessary to ‘move’ ψ(t) in both domains. This is achieved by means of a translation by a factor b ∈ R, and dilation by a factor a ∈ R (a ≠ 0), to produce the family of functions:

$$ \psi^{(a,b)}(t) = \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t - b}{a}\right) \tag{B.8} $$

The factor $1/\sqrt{|a|}$ is introduced for convenience as it ensures that the energy is the same for all values of a.

B.1.5. Wavelets and Continuous Wavelet Transform. If the mean time and frequency of ψ(t) are $\bar{t}_\psi$ and $\bar{\nu}_\psi$, then by applying (B.2) and (B.5) to $\psi^{(a,b)}(t)$ and its Fourier transform, it can be seen that the mean time and mean frequency are now $a\bar{t}_\psi + b$ and $\bar{\nu}_\psi/a$. Thus the inner product of f(t) and $\psi^{(a,b)}(t)$ is a measure of frequency components centred on $\bar{\nu}_\psi/a$ near the time location $a\bar{t}_\psi + b$. By varying a and b, this allows the local analysis of frequency components of f(t) over the entire range of times. This approach is formalised by defining the reference function as a wavelet, and the inner product with the wavelet as the continuous wavelet transform.

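Definition B.2 (Wavelet). A function ψ(t) ∈ L²(R) is a wavelet if it satisfies the admissibility condition:

$$ C_\psi = \int_{-\infty}^{\infty} \frac{|\hat{\psi}(\nu)|^2}{|\nu|} \, d\nu < \infty \tag{B.9} $$

where $\hat{\psi}(\nu)$ is the Fourier transform of ψ(t). The value $C_\psi$ is the admissibility constant.¹⁴

Definition B.3 (Continuous Wavelet Transform). If f(t) ∈ L²(R) and ψ(t) is a wavelet, then $(W_\psi f)$, the Continuous Wavelet Transform (CWT) of f(t), is defined as:

$$ (W_\psi f)(a, b) = \left\langle f,\ \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t - b}{a}\right) \right\rangle \tag{B.10} $$

One consequence of the admissibility condition is that $\hat{\psi}(0)$ must be 0 for $C_\psi$ to be finite. Using (B.4),

$$ 0 = \hat{\psi}(0) = \int_{-\infty}^{\infty} \psi(t)\, e^{-2\pi i 0 t} \, dt = \int_{-\infty}^{\infty} \psi(t) \, dt \tag{B.11} $$

Thus, a wavelet has a mean value of zero.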

Note that application of (B.3) and (B.6) shows that the time and frequency standard deviations of $\psi^{(a,b)}(t)$ are $|a|\sigma_{t,\psi}$ and $\sigma_{\nu,\psi}/|a|$ respectively, where $\sigma_{t,\psi}$ and $\sigma_{\nu,\psi}$ are the corresponding values for ψ(t). This means that as the frequency standard deviation decreases, i.e. the CWT measures a tighter range of frequencies, the time standard deviation increases, meaning that the measurement is less localised in the time domain, and vice versa. This adaptive behaviour distinguishes wavelets from the short-time Fourier transform, where the ‘resolutions’ in the time and frequency domains remain constant, and makes wavelet analysis more sensitive to rapidly changing signals[23].

To visualise the results of the CWT, the values of (Wψf)(a, b) or |(Wψf)(a, b)| 2 are plotted (the 

latter being termed a scalogram[23]), with a on the vertical axis, sometimes using a logarithmic 

scale, and b on the horizontal axis. 

14 Complex wavelets have an additional constraint on the form of $\hat{\psi}$ [1].

Figure 39. The Mexican Hat wavelet. (Axes: ψ(t) against t.)

B.1.6. CWT Example. Figure 39 shows an example of a wavelet function: the ‘Mexican Hat’ wavelet defined as [1]:

$$ \psi(t) = \frac{2}{\sqrt{3}\,\sqrt[4]{\pi}}\, (1 - t^2)\, e^{-t^2/2} \tag{B.12} $$

An example of the CWT using the Mexican Hat wavelet is shown in Figure 40. The signal consists of two sinusoidal waves of different periods and changing amplitudes. The pseudocolour plot shows the CWT for different values of the dilation factor, or scale, a, and translation factor, or position, b. The plot is coloured according to the value of $|(W_\psi f)(a, b)|$, with higher magnitudes plotted using lighter shades.

The change in dominant frequency with position can be seen in the CWT map, and the two 

separate frequency components can be clearly distinguished even at locations where they are convoluted

together in the signal. The periodic nature of the signals is shown by the regular pattern 

of light and dark bands as the position changes. The light bands correspond to locations where the 

dilated and translated wavelet matches the signal closely, effectively ‘in phase’, or π out of phase, 

with the signal, resulting in a large magnitude of the inner product. Since the shades represent 

the magnitude rather than value of the CWT, the map does not distinguish between in phase 

and π out of phase. Conversely, the dark bands are where the signal and translated wavelet are 

out of phase by π/2 or 3π/2, resulting in a small magnitude for the inner product. The change 

in component amplitude with position is reflected in the changes in relative brightness of the 

light bands: the brightest parts of the map—at approximately (a = 16, b = 275) and (a = 6, 

b = 400)—correspond to the location of the maximum amplitude of each of the two frequency 

components. 
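A direct numerical evaluation of the CWT illustrates the dependence on scale. This sketch is not the implementation used to produce Figure 40: the Riemann-sum approximation of the inner product, the unmodulated sinusoid of period 50, and the particular scales and position are assumptions made for the example.

```python
import math

def mexican_hat(t):
    """Mexican Hat wavelet of (B.12), normalised to unit energy."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt(samples, a, b, dt=1.0):
    """Riemann-sum estimate of (W_psi f)(a, b) for samples f(k * dt)."""
    total = sum(value * mexican_hat((k * dt - b) / a)
                for k, value in enumerate(samples))
    return total * dt / math.sqrt(abs(a))

# Illustrative signal: a sinusoid of period 50 sampled at unit intervals.
signal = [math.sin(2.0 * math.pi * k / 50.0) for k in range(700)]
near_scale = abs(cwt(signal, 12.0, 362.5))  # dilation comparable to the period
far_scale = abs(cwt(signal, 2.0, 362.5))    # dilation much smaller than the period
flat = abs(cwt([1.0] * 700, 4.0, 350.0))    # constant signal
```

The magnitude at the scale comparable to the period is far larger than at the small scale, which is the behaviour that produces the light bands in a plot such as Figure 40, and the response to a constant signal is essentially zero because the wavelet has zero mean.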

B.1.7. Inverse Wavelet Transform. The CWT could be performed using any function ψ(t) ∈ 

L 2 (R), even if it does not satisfy the admissibility condition. However, the finite value of Cψ for 

the wavelet function enables an inverse to the CWT. 

Definition B.4 (Inverse Wavelet Transform). If $(W_\psi f)(a, b)$ is the Continuous Wavelet Transform of f(t), then f(t) can be reconstructed from $(W_\psi f)$ using the Inverse Wavelet Transform:

$$ f(t) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (W_\psi f)(a, b)\, \psi^{(a,b)}(t)\, \frac{da \, db}{a^2} \tag{B.13} $$

where $\psi^{(a,b)}(t)$ is the translated and dilated wavelet defined by (B.8).

A derivation of the Inverse Wavelet Transform is given in [5].

B.2. Discrete Wavelet Transform. Although the Inverse Wavelet Transform can be used to reconstruct the signal from the CWT, there is a lot of redundancy in the CWT, in the sense that much information about the original signal, f(t), carried in a particular value $(W_\psi f)(a', b')$ is also carried by ‘nearby’ values such as $(W_\psi f)(a' + \delta a, b' + \delta b)$. Instead of the redundant CWT representation, many wavelet applications make use of a wavelet representation of the signal that samples the CWT using discrete values of a and b.

Figure 40. An example of CWT analysis of a periodic signal. The components of the signal, f₁(t) and f₂(t), are shown in (a) and (b): sinusoidal waves of period 50 and 20, π/2 out of phase with one another and with amplitudes modulated by a Gaussian. The combined signal, f(t), is shown in (c). (d) is a pseudocolour plot of the CWT of the combined signal using the Mexican Hat wavelet, with scale (a) on the vertical axis and position (b) on the horizontal axis; values with a greater magnitude are shown in lighter shades.



B.2.1. Dyadic Grid. Typically a dyadic grid is used that performs a logarithmic discretisation of 

both the dilation and translation parameters, such that: 

$$ a = 2^m, \qquad b = 2^m n\, b_0, \qquad \text{where } m, n \in \mathbb{Z},\ b_0 > 0 \tag{B.14} $$

The value b0 is termed the sampling rate. 

For simplicity, $b_0$ is often taken to be 1. The notation $\psi_{m,n}$ will be used for the translated and dilated wavelet at the location on the grid defined by m and n at the sampling rate of 1. (The lack of parentheses is used to distinguish this notation from that used in (B.8).)

$$ \psi_{m,n}(t) = \psi^{(2^m,\,2^m n)}(t) = 2^{-m/2}\, \psi\!\left(\frac{t - 2^m n}{2^m}\right) = 2^{-m/2}\, \psi\!\left(2^{-m} t - n\right) \tag{B.15} $$

B.2.2. Discrete Wavelet Transform.

Definition B.5 (Wavelet Coefficients and Discrete Wavelet Transform). The value $d_{m,n}$ of the CWT at the position in the dyadic grid defined by m and n is termed the wavelet coefficient or detail coefficient. The transform from f(t) to wavelet coefficients is the Discrete Wavelet Transform (DWT).¹⁵

$$ d_{m,n} = \langle f(t), \psi_{m,n}(t) \rangle = 2^{-m/2} \int_{-\infty}^{\infty} f(t)\, \psi\!\left(2^{-m} t - n\right) dt \tag{B.16} $$

B.2.3. Stability Condition. For this representation as wavelet coefficients to be useful, the family 

of wavelet functions, {ψm,n(t)}, must meet a further condition, termed the stability condition: 

Definition B.6 (Stability Condition). The family of wavelet functions $\{\psi_{m,n}(t)\}$ generated from the wavelet ψ(t) by (B.15) satisfies the stability condition if there exist constants A and B, 0 < A ≤ B < ∞, such that

$$ A E \le \sum_{m,n \in \mathbb{Z}} |d_{m,n}|^2 \le B E \qquad \forall f(t) \in L^2(\mathbb{R}) \tag{B.17} $$

where $d_{m,n}$ are the wavelet coefficients defined in (B.16), and E is the energy of f(t).

In simple terms, the stability condition ensures that functions that are ‘close’ in L²(R) have

representations in terms of wavelet coefficients that are also ‘close’, and vice versa [23]. More 

accurately, the stability condition ensures that the wavelet family {ψm,n(t)} is a frame of L 2 (R) 

[5]. If the constants in (B.17) are such that A = B then the frame is termed a tight frame. 

B.2.4. Inverse Discrete Wavelet Transform. A result arising from the general theory of frames [23] is that the original function f(t) can be reconstructed from its wavelet coefficients as follows.

Definition B.7 (Inverse Discrete Wavelet Transform). If $d_{m,n}$ are the wavelet coefficients given by the CWT, then the original function f(t) can be reconstructed as:

$$ f(t) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} d_{m,n}\, \tilde{\psi}_{m,n}(t) \tag{B.18} $$

where the (non-unique) functions $\tilde{\psi}_{m,n}$ are termed the dual functions for the frame $\{\psi_{m,n}(t)\}$.

A simplification exists for tight frames, where the functions $\{\frac{1}{A}\psi_{m,n}(t)\}$ (A is the constant in the stability condition (B.17)) are a suitable choice for the dual functions. In the case where A = B = 1, this gives:

$$ f(t) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} d_{m,n}\, \psi_{m,n}(t) \tag{B.19} $$

15 The term Discrete Wavelet Transform may instead be used to refer to the Fast Wavelet Transform described 

below, and in particular, the Pyramid Algorithm for discrete time signals [18]. 



B.2.5. Wavelet basis of L²(R). The form of (B.19) suggests that $\{\psi_{m,n}(t)\}$ is a basis for the space L²(R). (A detailed treatment in terms of Riesz bases is given by [5].) In the case where A = B = 1 the basis is orthogonal [23], i.e.:

$$ \langle \psi_{m,n}, \psi_{m',n'} \rangle = \begin{cases} E & m = m' \text{ and } n = n' \\ 0 & \text{otherwise} \end{cases} \tag{B.20} $$

where E is the energy of $\psi_{m,n}$. Usually a normalising factor is chosen for the wavelet, ψ(t), and the wavelet family, $\{\psi_{m,n}(t)\}$, so that the energy is 1, and the family forms an orthonormal basis.

A further consequence of choosing wavelet functions such that A = B = 1 is that the wavelet 

coefficients generated by the DWT represent the original signal with no redundancy [23]. 
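Orthonormality in the sense of (B.20) can be checked numerically for a simple case. The sketch below uses the Haar wavelet as an illustrative choice, builds $\psi_{m,n}$ from (B.15), and approximates the inner product by a midpoint-rule sum; the integration range, the step size and the function names are assumptions of the example.

```python
def haar(t):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), zero elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def psi(m, n, t):
    """Dilated and translated wavelet on the dyadic grid, as in (B.15)."""
    return 2.0 ** (-m / 2.0) * haar(2.0 ** (-m) * t - n)

def inner(m1, n1, m2, n2, lo=-8.0, hi=8.0, steps_per_unit=64):
    """Midpoint-rule estimate of the inner product of two family members."""
    h = 1.0 / steps_per_unit
    count = int((hi - lo) * steps_per_unit)
    return h * sum(psi(m1, n1, lo + (i + 0.5) * h) * psi(m2, n2, lo + (i + 0.5) * h)
                   for i in range(count))

same = inner(1, 0, 1, 0)          # a wavelet with itself
other_scale = inner(0, 0, 1, 0)   # same position, different dilation
other_shift = inner(0, 0, 0, 1)   # same dilation, different translation
```

The inner product of a family member with itself is 1 (unit energy), while the inner products between different members vanish.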

B.3. Scaling Functions. The space L²(R), with the basis formed by the family $\{\psi_{m,n}\}$, can be considered as a direct sum of subspaces spanned by the subsets of the form $\Psi_m = \{\psi_{m,n} : n \in \mathbb{Z}\}$, i.e. those functions having the same value of m (the same dilation) [5]. If the subspace formed by the closure (in L²(R)) of the linear span of functions in $\Psi_m$ is denoted $W_m$, then:

$$ L^2(\mathbb{R}) = \bigoplus_{k=-\infty}^{\infty} W_k \tag{B.21} $$

It is useful to consider the sequence of subspaces $V_m$ defined by:

$$ V_m = \bigoplus_{k=m+1}^{\infty} W_k \tag{B.22} $$

The analysis of these subspaces gives rise to scaling functions from which wavelets can be constructed. 

In addition, scaling functions are used in a fast algorithm for calculating the DWT of 

discrete functions, described below. 

B.3.1. Scaling Function and Approximation Coefficients. The scaling function, φ(t), is defined 

here in terms of the subspaces Vm. 

Definition B.8 (Scaling Function). If φ(t) is translated and dilated in an equivalent form to the wavelet on the dyadic grid, to define:

$$ \phi_{m,n}(t) = 2^{-m/2}\, \phi\!\left(2^{-m} t - n\right) \tag{B.23} $$

then φ(t) is a scaling function if the set $\Phi_m = \{\phi_{m,n} : n \in \mathbb{Z}\}$ is an orthonormal basis of $V_m$. In addition, the scaling function is usually normalised as follows [1, 24]. Note that this normalisation considers the function itself rather than its energy.

$$ \int_{-\infty}^{\infty} \phi(t) \, dt = 1 \tag{B.24} $$

(The properties of the scaling function, together with the properties of the subspaces Vm, 

constitute a Multiresolution Analysis [23].) 

Since $\{\phi_{m,n}\}$ is a basis for $V_m$, s(t) ∈ $V_m$ may be decomposed as:

$$ s(t) = \sum_{n=-\infty}^{\infty} s_{m,n}\, \phi_{m,n}(t) \tag{B.25} $$

where the coefficients $s_{m,n}$ are given by:

$$ s_{m,n} = \langle f, \phi_{m,n} \rangle \tag{B.26} $$

and termed approximation coefficients.



B.3.2. Decomposition of L 2 (R) by Wavelet and Scaling Functions. From the definition of Vm in 

(B.22) and the decomposition of L 2 (R) in (B.21), for a specific value m = m ′ , 

$$ L^2(\mathbb{R}) = V_{m'} \oplus W_{m'} \oplus W_{m'-1} \oplus W_{m'-2} \oplus \cdots \tag{B.27} $$

This suggests that any function, f(t) ∈ L²(R), may be written as a function s(t) ∈ $V_{m'}$, plus the combination of wavelet functions that form the bases for $W_{m'}$ and lower subspaces (corresponding to dilated wavelets with increasingly higher mean frequencies):

$$ f(t) = \sum_{n=-\infty}^{\infty} s_{m',n}\, \phi_{m',n}(t) + \sum_{m=-\infty}^{m'} \sum_{n=-\infty}^{\infty} d_{m,n}\, \psi_{m,n}(t) \tag{B.28} $$

As a consequence of (B.23) and (B.24), $\int_{-\infty}^{\infty} \phi_{m,n}(t)\,dt = 2^{m/2}$, and so, up to this constant factor, the approximation coefficients can be interpreted as a weighted average of f(t) given by a particular translation of φ(t) (dilated corresponding to $V_m$) [1]. The function s(t) therefore gives an approximation to the original function f(t), with the difference between this approximation and the original function represented by the series of wavelet coefficients at levels m′, m′ − 1, m′ − 2, .... As m′ increases the approximation reconstructed from the approximation coefficients becomes coarser as the scaling functions in $\Phi_{m'}$ are dilated.

B.3.3. Scaling Equation. From the construction in (B.22) it can be seen that the subspaces Vm 

are nested:

$$ \cdots \subset V_{m+1} \subset V_m \subset V_{m-1} \subset \cdots \tag{B.29} $$

As it is a member of the basis $\Phi_m$ of $V_m$, $\phi_{m,0}$ is a member of $V_m$. Since $V_m \subset V_{m-1}$, $\phi_{m,0}$ can be written in terms of the basis of $V_{m-1}$, i.e.:

$$ \phi_{m,0}(t) = \sum_{k} c'_k\, \phi_{m-1,k}(t) \tag{B.30} $$

When written in terms of the original scaling function φ(t), equivalent to $\phi_{0,0}$, (and modifying the coefficients $c'_k$ by a factor of √2) this gives the following relation:

Definition B.9 (Scaling Equation and Scaling Coefficients). The scaling function φ(t) is related to translated and dilated versions of itself by the scaling equation or dilation equation

$$ \phi(t) = \sum_{k} c_k\, \phi(2t - k) \tag{B.31} $$

where the $c_k$ are the scaling coefficients.

A key result of this approach is that the same coefficients can be used to construct the related wavelet function as follows [1].¹⁶

$$ \psi(t) = \sum_{k} (-1)^k c_{1-k}\, \phi(2t - k) \tag{B.32} $$
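The relation between the two sets of coefficients can be made concrete. Integrating (B.31) and using the normalisation (B.24) shows that the scaling coefficients sum to 2 in this convention, and integrating (B.32) shows that the wavelet weights $(-1)^k c_{1-k}$ sum to 0, consistent with the zero mean of a wavelet. In the sketch below the Haar values ($c_0 = c_1 = 1$) and the four Daubechies values, rescaled to sum to 2, are standard values supplied for illustration rather than taken from the text; `wavelet_weights` is a hypothetical helper name.

```python
import math

def wavelet_weights(scaling):
    """Weights of phi(2t - k) in (B.32): k -> (-1)^k c_{1-k}.
    `scaling` maps each index k to the scaling coefficient c_k."""
    weights = {}
    for k, c in scaling.items():
        j = 1 - k                          # c_k multiplies phi(2t - j) with j = 1 - k
        sign = -1.0 if j % 2 else 1.0      # (-1)^j, valid for negative j as well
        weights[j] = sign * c
    return weights

ROOT3 = math.sqrt(3.0)
HAAR = {0: 1.0, 1: 1.0}
D4 = {0: (1 + ROOT3) / 4, 1: (3 + ROOT3) / 4, 2: (3 - ROOT3) / 4, 3: (1 - ROOT3) / 4}

haar_weights = wavelet_weights(HAAR)
d4_weights = wavelet_weights(D4)
```

For the Haar case the weights are +1 for φ(2t) and −1 for φ(2t − 1), which gives the familiar square-wave wavelet.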

B.4. Fast Wavelet Transform. The relationship between the wavelets, scaling functions and scaling coefficients defined in equations (B.31) and (B.32) leads to a fast, recursive algorithm for determining the approximation and detail (or wavelet) coefficients for f(t), derived as follows.

16 A thorough derivation of this result is given in [5] which also notes that the general result corresponding to 

this equation uses the complex conjugate of c1−k. 



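B.4.1. Forward Fast Wavelet Transform. From (B.31) and the definition of $\phi_{m,n}$ in (B.23),

$$ \phi_{m,n}(t) = \frac{1}{\sqrt{2}} \sum_{k} c_k\, \phi_{m-1,2n+k}(t) \tag{B.33} $$

Using this in (B.26),

$$ \begin{aligned}
s_{m,n} &= \left\langle f,\ \frac{1}{\sqrt{2}} \sum_{k} c_k\, \phi_{m-1,2n+k} \right\rangle \\
&= \frac{1}{\sqrt{2}} \sum_{k} \left\langle f,\ c_k\, \phi_{m-1,2n+k} \right\rangle \\
&= \frac{1}{\sqrt{2}} \sum_{k} c_k \left\langle f,\ \phi_{m-1,2n+k} \right\rangle \\
&= \frac{1}{\sqrt{2}} \sum_{k} c_k\, s_{m-1,2n+k}
\end{aligned} \tag{B.34} $$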

This relationship allows the approximation coefficients at level m to be derived, using only the 

scaling coefficients, from the approximation coefficients at the level m−1 (which represents a finer 

approximation to the signal f(t)). 

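The equivalent manipulation of (B.32) and the definition of $\psi_{m,n}$ in (B.15) gives:

$$ \psi_{m,n}(t) = \frac{1}{\sqrt{2}} \sum_{k} (-1)^k c_{1-k}\, \phi_{m-1,2n+k}(t) \tag{B.35} $$

and allows the detail coefficient at level m given by (B.16) to be written as:

$$ \begin{aligned}
d_{m,n} &= \left\langle f,\ \frac{1}{\sqrt{2}} \sum_{k} (-1)^k c_{1-k}\, \phi_{m-1,2n+k} \right\rangle \\
&= \frac{1}{\sqrt{2}} \sum_{k} \left\langle f,\ (-1)^k c_{1-k}\, \phi_{m-1,2n+k} \right\rangle \\
&= \frac{1}{\sqrt{2}} \sum_{k} (-1)^k c_{1-k} \left\langle f,\ \phi_{m-1,2n+k} \right\rangle \\
&= \frac{1}{\sqrt{2}} \sum_{k} (-1)^k c_{1-k}\, s_{m-1,2n+k}
\end{aligned} \tag{B.36} $$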

Thus the detail coefficients at level m can be derived from the approximation coefficients at level 

m − 1. 

The recursive relationships (B.34) and (B.36) are the forward part of the Fast Wavelet Transform 

(FWT). 
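One level of the forward recursion can be sketched for the Haar wavelet, for which $c_0 = c_1 = 1$ in the normalisation of (B.31) and all other scaling coefficients are zero. The four input values and the name `forward_step` are assumptions made for the illustration.

```python
import math

# Haar scaling coefficients in the normalisation of (B.31): c_0 = c_1 = 1.
HAAR = {0: 1.0, 1: 1.0}

def forward_step(fine):
    """Apply (B.34) and (B.36) once: level m-1 coefficients -> level m."""
    coarse, detail = [], []
    for n in range(len(fine) // 2):
        s = sum(HAAR[k] * fine[2 * n + k] for k in (0, 1))
        d = sum((-1) ** k * HAAR[1 - k] * fine[2 * n + k] for k in (0, 1))
        coarse.append(s / math.sqrt(2.0))
        detail.append(d / math.sqrt(2.0))
    return coarse, detail

coarse, detail = forward_step([4.0, 2.0, 5.0, 5.0])
```

Each pair of fine coefficients produces one approximation coefficient (a scaled sum) and one detail coefficient (a scaled difference); the second detail coefficient is zero here because the last two inputs are equal.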

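B.4.2. Inverse Fast Wavelet Transform. The reverse procedure is to derive the approximation coefficients at level m − 1 from the approximation and detail coefficients at level m. From (B.22), it can be seen that,

$$ V_{m-1} = V_m \oplus W_m \tag{B.37} $$

meaning that a function represented by a linear sum of functions in $\Phi_{m-1}$ (the orthonormal basis of $V_{m-1}$) can be represented as a sum of functions formed from the bases $\Phi_m$ and $\Psi_m$ respectively:

$$ \sum_{n} s_{m-1,n}\, \phi_{m-1,n}(t) = \sum_{k} s_{m,k}\, \phi_{m,k}(t) + \sum_{k} d_{m,k}\, \psi_{m,k}(t) $$

(substituting from (B.33) and (B.35)),

$$ = \sum_{k} s_{m,k}\, \frac{1}{\sqrt{2}} \sum_{j} c_j\, \phi_{m-1,2k+j}(t) + \sum_{k} d_{m,k}\, \frac{1}{\sqrt{2}} \sum_{j} (-1)^j c_{1-j}\, \phi_{m-1,2k+j}(t) $$

(substituting n = 2k + j),

$$ = \sum_{k} s_{m,k}\, \frac{1}{\sqrt{2}} \sum_{n} c_{n-2k}\, \phi_{m-1,n}(t) + \sum_{k} d_{m,k}\, \frac{1}{\sqrt{2}} \sum_{n} (-1)^{n-2k} c_{1-(n-2k)}\, \phi_{m-1,n}(t) $$

(changing the order of summation),

$$ = \sum_{n} \left[ \frac{1}{\sqrt{2}} \sum_{k} c_{n-2k}\, s_{m,k} + \frac{1}{\sqrt{2}} \sum_{k} (-1)^{n-2k} c_{1-(n-2k)}\, d_{m,k} \right] \phi_{m-1,n}(t) \tag{B.38} $$

Equating coefficients of $\phi_{m-1,n}(t)$ in (B.38) gives,

$$ s_{m-1,n} = \frac{1}{\sqrt{2}} \sum_{k} c_{n-2k}\, s_{m,k} + \frac{1}{\sqrt{2}} \sum_{k} (-1)^{n-2k} c_{1-(n-2k)}\, d_{m,k} \tag{B.39} $$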
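The inverse relation (B.39) can be checked with a round trip. The sketch again assumes the Haar wavelet ($c_0 = c_1 = 1$, all other coefficients zero); the function names and the test values are illustrative.

```python
import math

HAAR = {0: 1.0, 1: 1.0}   # c_0 = c_1 = 1 in the normalisation of (B.31)

def forward_step(fine):
    """Level m-1 approximation coefficients -> level m, via (B.34) and (B.36)."""
    coarse, detail = [], []
    for n in range(len(fine) // 2):
        coarse.append(sum(HAAR[k] * fine[2 * n + k] for k in (0, 1)) / math.sqrt(2.0))
        detail.append(sum((-1) ** k * HAAR[1 - k] * fine[2 * n + k]
                          for k in (0, 1)) / math.sqrt(2.0))
    return coarse, detail

def inverse_step(coarse, detail):
    """Level m approximation and detail coefficients -> level m-1, via (B.39)."""
    fine = []
    for n in range(2 * len(coarse)):
        total = 0.0
        for k in range(len(coarse)):
            j = n - 2 * k
            if j in HAAR:   # c_j and c_{1-j} are non-zero only for j = 0 or 1
                total += HAAR[j] * coarse[k] + (-1) ** j * HAAR[1 - j] * detail[k]
        fine.append(total / math.sqrt(2.0))
    return fine

original = [4.0, 2.0, 5.0, 5.0, -1.0, 3.0, 0.0, 7.0]
recovered = inverse_step(*forward_step(original))
```

Applying the inverse step to the output of the forward step returns the original coefficients, to within rounding error.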

B.4.3. Decomposition of L²(R) by FWT. If the functions $s_m(t)$ and $d_m(t)$ are defined as the signals represented by the approximation and detail coefficients at level m:

$$ s_m(t) = \sum_{n=-\infty}^{\infty} s_{m,n}\, \phi_{m,n}(t) \tag{B.40} $$

$$ d_m(t) = \sum_{n=-\infty}^{\infty} d_{m,n}\, \psi_{m,n}(t) \tag{B.41} $$

then the decomposition of the original signal defined in (B.28) can also be represented as:

$$ f(t) = s_{m'}(t) + \sum_{m=-\infty}^{m'} d_m(t) \tag{B.42} $$

where m′ ∈ Z can be chosen as required. The signal $s_{m'}$ represents an approximation of f(t), and as m′ decreases the approximation becomes closer to f(t). Each of the signals $d_m(t)$ is composed of dilated wavelet functions whose mean frequency decreases as m increases.

B.5. Pyramid Algorithm. The FWT is often used to process signals where time values are 

discrete. For example, in this project, the NMR data sets are discrete signals. When certain 

assumptions on the form of the wavelet and scaling functions are met, an efficient implementation 

of the FWT for discrete-time signals is possible, termed the Pyramid Algorithm[1, 18]. 

B.5.1. Input to Pyramid Algorithm. The discrete signal is denoted by f[tn] where the {tn} are the 

discrete time values. Although the signal is discrete, a continuous-time equivalent, f(t), can be 

constructed. For example, and assuming a constant interval of 1 between the time values, a step 

function can be created as: 

$$ f(t) = \begin{cases} f[t_n] & \exists\, t_n \text{ such that } t_n - \tfrac{1}{2} \le t < t_n + \tfrac{1}{2} \\ 0 & \text{otherwise} \end{cases} \tag{B.43} $$

The starting point for the discrete-time FWT is the set of approximation coefficients at level m = 0 of the continuous-time function, derived using (B.26).¹⁷ (In some implementations of the pyramid algorithm, the discrete function values $f[t_n]$ are used instead of the approximation coefficients. This is incorrect, except where the wavelet function is the Haar wavelet, in which case the two sets of values are the same.) This results in the set of approximation coefficients $\{s_{0,n}\}$. Note that, in general, $\{s_{0,n}\}$ represents a signal:

$$ s_0(t) = \sum_{n=0}^{2^M - 1} s_{0,n}\, \phi_{0,n}(t) \tag{B.44} $$

that is only an approximation for the original discrete signal.

17 Level m = 0 corresponds to translations of the scaling function by the sampling rate, b0, which we have taken 

to be 1 above. For signals where the discrete time interval is not 1, the sampling rate can be modified accordingly. 

77


B.5.2. Signal Length and Padding. In this section, the signal is assumed to be finite, with a length such that it is represented by 2^M (M ∈ N) level 0 approximation coefficients, i.e. n ∈ {0, 1, . . ., 2^M − 1}. In practical applications, a shorter signal can be padded to give the required number of coefficients. Typical methods include zero-padding, where 0 values are added to one or both ends of the signal, and symmetric-padding, where part of the signal is repeated, in reverse, at each end of the original signal. The advantage of the latter method is that it avoids creating artificial discontinuities.
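The two padding methods can be illustrated with a minimal Python sketch. The helper name `pad_to_power_of_two` is hypothetical, and padding only the right-hand end is an assumption made for brevity; it is not taken from the project code.

```python
def pad_to_power_of_two(signal, mode="symmetric"):
    """Pad `signal` on the right so that its length becomes a power of two.

    Hypothetical helper for illustration only: the name and the choice of
    padding only the right-hand end are assumptions.
    """
    if not signal:
        raise ValueError("signal must contain at least one value")
    target = 1
    while target < len(signal):
        target *= 2
    extra = target - len(signal)          # always smaller than len(signal)
    if mode == "zero":
        return list(signal) + [0.0] * extra
    if mode == "symmetric":
        # Repeat the end of the signal in reverse, so no jump is introduced.
        return list(signal) + list(reversed(signal))[:extra]
    raise ValueError("mode must be 'zero' or 'symmetric'")


padded = pad_to_power_of_two([1.0, 2.0, 3.0, 4.0, 5.0])
```

With zero-padding the same input would end in three zeros, creating a step from 5 to 0 that the symmetric version avoids.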

B.5.3. Compact Support. Additionally, it is assumed that the wavelet has sequences of non-zero 

scaling coefficients which are finite in length, in which case it is said to possess compact support 

[1]. In particular, a single finite sequence of K coefficients is assumed, with all other coefficients 

being zero. This enables a redefinition of the equations (B.31) and (B.32), 

$$\phi(t) = \sum_{k=0}^{K-1} c_k\,\phi(2t-k) \tag{B.45}$$

$$\psi(t) = \sum_{k=0}^{K-1} (-1)^k c_{K-1-k}\,\phi(2t-k) \tag{B.46}$$

The modification ensures that the wavelet and scaling function are ‘supported’ over the same 

finite interval [0, K − 1] [1]. The corresponding modifications to the equations for the forward FWT, (B.34) and (B.36), are:

$$s_{m,n} = \frac{1}{\sqrt{2}} \sum_{k=0}^{K-1} c_k\, s_{m-1,2n+k} \tag{B.47}$$

$$d_{m,n} = \frac{1}{\sqrt{2}} \sum_{k=0}^{K-1} (-1)^k c_{K-1-k}\, s_{m-1,2n+k} \tag{B.48}$$

B.5.4. Wraparound. If the signal time window represented by the 2^M approximation coefficients {s0,n} is [0, T], a further simplification is to repeat the coefficients with period 2^M:

$$s_{0,\,n+2^M k} = s_{0,n} \quad \text{where } k \in \mathbb{Z} \tag{B.49}$$

and so allow the FWT equations above to refer to approximation coefficients with n indices outside the range 0 to 2^M − 1. This is equivalent to assuming the original discrete signal is periodic with

period T. However, if the signal is not periodic with period T, this creates discontinuities at 0 and 

T which can lead to large detail coefficients at the boundaries [1]. 18 An alternative interpretation 

is that the wavelet and scaling functions ‘wraparound’ the time window by beginning again at 0 

once they reach T, giving this technique its name. 
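A single level of the compact-support FWT, (B.47) and (B.48), with wraparound indexing can be sketched in Python as follows. The function name `fwt_step` is hypothetical; the modulo operation plays the role of the periodic extension (B.49) at whichever level is being processed.

```python
import math


def fwt_step(s_prev, c):
    """One level of (B.47)/(B.48) using periodic (wraparound) indexing.

    s_prev: approximation coefficients at level m - 1 (even length).
    c:      the K non-zero scaling coefficients c_0 ... c_{K-1}.
    Returns the approximation and detail coefficients at level m.
    """
    size, K = len(s_prev), len(c)
    scale = 1.0 / math.sqrt(2.0)
    approx, detail = [], []
    for n in range(size // 2):
        s_val = 0.0
        d_val = 0.0
        for k in range(K):
            x = s_prev[(2 * n + k) % size]            # wraparound index
            s_val += c[k] * x                          # (B.47)
            d_val += (-1) ** k * c[K - 1 - k] * x      # (B.48)
        approx.append(scale * s_val)
        detail.append(scale * d_val)
    return approx, detail


# Haar example: c_0 = c_1 = 1
approx, detail = fwt_step([4.0, 2.0, 5.0, 5.0], [1.0, 1.0])
```

For the Haar coefficients no wraparound is actually needed, since each sum touches only the pair of values at 2n and 2n + 1; longer filters reach past the end of the window and wrap.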

B.5.5. Pyramid Algorithm. The FWT can then be used to construct the approximation and detail coefficients at level m = 1 from the set {s0,n} using equations (B.47) and (B.48). The spacing of the dyadic grid at level m = 1 is twice that of level m = 0, so only 2^{M−1} approximation coefficients, i.e. half those of level m = 0, are required to give the approximation of the signal in the time window [0, T]. This can be seen from the form of the FWT equations (B.47) and (B.48): as the index n increases by 1 on the left-hand side, the corresponding indices of the set of K approximation coefficients in the sum on the right-hand side move by 2. Similarly, the FWT generates 2^{M−1} detail coefficients to cover the time window [0, T].

By recursion, the FWT constructs the approximation and detail coefficients at levels m = 1, 2, . . ., M. At level m = m′, 2^{M−m′} approximation coefficients and 2^{M−m′} detail coefficients are produced. So, at level m = M, the FWT generates a single approximation coefficient and the approximate signal s0(t) is represented by this 1 approximation coefficient and a total of 2^M − 1 detail coefficients derived at this and earlier levels. Further decomposition beyond level m = M is not possible. For many applications the algorithm may be stopped before level m = M. 19 In general, at level m = m′ (1 ≤ m′ ≤ M), the approximate original signal, s0(t), is decomposed as:

$$s_0(t) = \sum_{n=0}^{2^{M-m'}-1} s_{m',n}\,\phi_{m',n}(t) + \sum_{m=1}^{m'} \sum_{n=0}^{2^{M-m}-1} d_{m,n}\,\psi_{m,n}(t) \tag{B.50}$$

or, following the format of (B.42),

$$s_0(t) = s_{m'}(t) + \sum_{m=1}^{m'} d_m(t) \tag{B.51}$$

18 To avoid the discontinuities, the signal can also be ‘mirrored’ at the boundaries (the signal is repeated in reverse) as an alternative to wraparound.

Note that since each wavelet, and therefore each detail signal dm(t), has a mean of zero by (B.11), the mean value of the original signal is carried only by sm′(t).
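The recursion can be sketched in Python as below. This is an illustrative reading of the pyramid algorithm with wraparound, not the project's implementation; the function name `pyramid` is hypothetical. The example checks the coefficient counts and, for the Haar wavelet, that the single level-M approximation coefficient is the sum of the level-0 coefficients divided by 2^{M/2}, so that it alone carries the mean.

```python
import math


def pyramid(s0, c):
    """Full decomposition of 2**M level-0 coefficients.

    Returns (approx, details), where approx holds the single level-M
    approximation coefficient and details[m - 1] holds the 2**(M - m)
    detail coefficients of level m.
    """
    approx = list(s0)
    details = []
    K = len(c)
    while len(approx) > 1:
        size = len(approx)
        nxt, det = [], []
        for n in range(size // 2):
            window = [approx[(2 * n + k) % size] for k in range(K)]
            nxt.append(sum(c[k] * window[k] for k in range(K)) / math.sqrt(2))
            det.append(sum((-1) ** k * c[K - 1 - k] * window[k]
                           for k in range(K)) / math.sqrt(2))
        details.append(det)
        approx = nxt
    return approx, details


s0 = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 4.0]       # 2**3 coefficients, M = 3
top, details = pyramid(s0, [1.0, 1.0])               # Haar wavelet
counts = [len(d) for d in details]                   # 4, 2 and 1 coefficients
```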

B.5.6. Signal Decomposition by the Pyramid Algorithm. The decomposition of a discrete-time signal into approximation and detail coefficients performed by the Pyramid Algorithm described above can be represented as follows:

    s0(t) --s1,n--> s1(t) --s2,n--> s2(t) · · ·
      \                 \
       --d1,n--> d1(t)   --d2,n--> d2(t)

where sm′(t) and dm′(t) are the approximation and detail signals at level m = m′ defined in (B.42), and sm′,n and dm′,n are the corresponding coefficients.

Figure 41 shows the decomposition by the pyramid algorithm of the signal constructed in Figure 40 from two sinusoidal signals of different frequencies. (a) is the original signal, while (b) to (j) show the detail signals at levels 1 to 9. The detail signals, dm(t), are created by applying the inverse transform to only the detail coefficients at level m, all other approximation and detail coefficients being set to 0. It can be seen that the higher frequency signal component is represented by larger magnitude coefficients at its location in levels 1 to 4, shown in (b), (c), (d) and (e). The lower frequency component is represented at higher decomposition levels 4 to 8, shown in (e), (f), (g) and (h).

B.5.7. Linear Algebraic Representation. This process can also be represented using linear algebra. To illustrate the form of the matrices, an example of four non-zero scaling coefficients (c0, c1, c2, c3) is used. Let s_{m−1} be the column vector of length 2^{M−m+1} consisting of the approximation coefficients produced at level m − 1. Let Tm be a 2^{M−m+1} × 2^{M−m+1} matrix constructed from the scaling coefficients as follows:

$$T_m = \begin{bmatrix}
c_0 & c_1 & c_2 & c_3 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-c_3 & c_2 & -c_1 & c_0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & c_0 & c_1 & c_2 & c_3 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & -c_3 & c_2 & -c_1 & c_0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & \ddots & & & \vdots \\
c_2 & c_3 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & c_0 & c_1 \\
-c_1 & c_0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & -c_3 & c_2
\end{bmatrix} \tag{B.52}$$

Note that the scaling coefficients shift by two columns after each pair of rows (the factor of two reflecting the halving of the number of approximation coefficients generated at this level compared to the previous) and that the lowest rows of the matrix ‘wraparound’ to implement the wraparound within [0, T] of the wavelet and scaling functions in the manner described above.

19 In addition, where the number of non-zero scaling coefficients K is large, it may not be practical to decompose 

entirely to level m = M. 

Figure 41. Decomposition of a signal, s0(t), plotted in (a), into detail signals d1(t) to d9(t), plotted in (b) to (j), by the pyramid algorithm using the Haar wavelet. Each panel is plotted against t over the range 0 to 700.

With this construction, the equation,

$$\mathbf{b}_m = T_m\,\mathbf{s}_{m-1} \tag{B.53}$$

is equivalent, up to the normalisation factor 1/√2 and the overall sign of the detail rows, to a combination of the compact support forms of the FWT, equations (B.47) and (B.48). The odd-indexed elements of the resultant column vector bm are the approximation coefficients at level m, and the even-indexed elements are the detail coefficients. A similar linear algebra form exists for the inverse FWT.
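As an illustration, the matrix can be built programmatically. The Python sketch below is not the project code: the function names are hypothetical, the factor 1/√2 is folded into the matrix, and the detail rows follow the sign convention of (B.48). The D4 coefficient values are the standard ones under the normalisation (B.55) and are an addition here, since they are not listed in the text. With 0-based Python indexing the approximation coefficients sit at even positions of the result, which are the odd-indexed elements in the 1-based convention used above.

```python
import math


def transform_matrix(size, c):
    """Build the size x size matrix whose row pairs apply (B.47) and (B.48)."""
    K = len(c)
    rows = []
    for n in range(size // 2):
        approx_row = [0.0] * size
        detail_row = [0.0] * size
        for k in range(K):
            col = (2 * n + k) % size       # shift by two per pair, with wraparound
            approx_row[col] += c[k] / math.sqrt(2)
            detail_row[col] += (-1) ** k * c[K - 1 - k] / math.sqrt(2)
        rows.append(approx_row)
        rows.append(detail_row)
    return rows


def mat_vec(T, v):
    """Plain matrix-vector product."""
    return [sum(t * x for t, x in zip(row, v)) for row in T]


r3 = math.sqrt(3.0)
D4 = [(1 + r3) / 4, (3 + r3) / 4, (3 - r3) / 4, (1 - r3) / 4]   # assumed values

T = transform_matrix(8, D4)
s_prev = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = mat_vec(T, s_prev)
approx, detail = b[0::2], b[1::2]          # de-interleave the result
```

With these conventions the rows of the matrix are orthonormal, so the inverse transform is given by the transpose.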

B.6. Wavelet Construction and Families. One method of constructing wavelet and scaling 

functions is to begin with the scaling coefficients defined by (B.31) [24]. 

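B.6.1. Conditions on Scaling Coefficients. If equation (B.31) is integrated, this gives:

$$\begin{aligned}
\int_{-\infty}^{\infty} \phi(t)\,dt &= \int_{-\infty}^{\infty} \sum_k c_k\,\phi(2t-k)\,dt \\
&= \int_{-\infty}^{\infty} \sum_k c_k\,\phi(t'-k)\,\tfrac{1}{2}\,dt' \qquad \text{(by change of variable: } t' = 2t\text{)} \\
&= \frac{1}{2} \sum_k c_k \int_{t'=-\infty}^{\infty} \phi(t'-k)\,dt'
\end{aligned} \tag{B.54}$$

Since the integral of φ(t) over all t is finite by (B.24) (and assumed non-zero), and each integral on the right-hand side equals it by translation, then,

$$\sum_k c_k = 2 \tag{B.55}$$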

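In addition, if the family of scaling functions dilated by the same degree m, Φm = {φm,n : n ∈ Z}, is assumed to be an orthonormal basis of the subspace Vm, then

$$\langle \phi_{m,n}, \phi_{m,n'} \rangle = \begin{cases} 1 & \text{if } n = n' \\ 0 & \text{otherwise} \end{cases} \tag{B.56}$$

Without loss of generality, assume n to be zero, then from equation (B.31),

$$\begin{aligned}
\langle \phi_{m,0}, \phi_{m,n} \rangle &= \int \sum_k c_k\,\phi_{m-1,k}(2t) \sum_{k'} c_{k'}\,\phi_{m-1,k'}(2(t-n))\,dt \\
&= \int \sum_k c_k\,\phi_{m-1,k}(t') \sum_{k'} c_{k'}\,\phi_{m-1,k'}(t'-2n)\,\tfrac{1}{2}\,dt' && \text{(change of variables: } t' = 2t\text{)} \\
&= \frac{1}{2} \sum_k \sum_{k'} c_k c_{k'} \int \phi_{m-1,k}(t')\,\phi_{m-1,k'+2n}(t')\,dt' && \text{(using } \phi_{m-1,k'}(t'-2n) = \phi_{m-1,k'+2n}(t')\text{)} \\
&= \frac{1}{2} \sum_k \sum_{k''} c_k c_{k''-2n} \int \phi_{m-1,k}(t')\,\phi_{m-1,k''}(t')\,dt' && \text{(change of variables: } k'' = k' + 2n\text{)} \\
&= \frac{1}{2} \sum_k \sum_{k''} c_k c_{k''-2n}\, \langle \phi_{m-1,k}, \phi_{m-1,k''} \rangle
\end{aligned} \tag{B.57}$$

Since the scaling function family Φm−1 = {φm−1,n : n ∈ Z} is an orthonormal basis of Vm−1 then:

$$\langle \phi_{m-1,k}, \phi_{m-1,k''} \rangle = \begin{cases} 1 & \text{when } k = k'' \\ 0 & \text{otherwise} \end{cases} \tag{B.58}$$

enabling the simplification,

$$\langle \phi_{m,0}, \phi_{m,n} \rangle = \frac{1}{2} \sum_k c_k c_{k-2n} \tag{B.59}$$

Thus from (B.56), the condition is [24, 1, 18],

$$\sum_k c_k c_{k-2n} = \begin{cases} 2 & \text{if } n = 0 \\ 0 & \text{otherwise} \end{cases} \tag{B.60}$$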

A third desirable (but not mandatory) property is for the dilated scaling functions forming the subspaces Vm to approximate polynomials up to a chosen degree p as closely as possible. From [24], this condition is:

$$\sum_k (-1)^k k^j c_k = 0 \quad \text{for } j = 0, 1, 2, \ldots, p-1 \tag{B.61}$$

This is equivalent to the wavelet function ψ(t) having moments of 0 up to and including the (p − 1)th moment 20 , i.e.:

$$\int t^j\, \psi(t)\,dt = 0 \quad \text{for } j = 0, 1, 2, \ldots, p-1 \tag{B.62}$$

B.6.2. Daubechies Wavelet Family. If the number of non-zero coefficients, K, is even, using the two mandatory conditions (B.55) and (B.60), combined with ensuring that the first K/2 moments are zero in (B.61), gives rise to a family of wavelets called Daubechies Wavelets [18]. Members of the family are often denoted DK where K is the number of non-zero scaling coefficients.
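The three conditions can be checked numerically for a given set of scaling coefficients. The Python sketch below uses a hypothetical helper, `check_conditions`, and the standard D4 coefficient values under the normalisation (B.55); these values are an addition for illustration and are not listed in the text.

```python
import math


def check_conditions(c, p):
    """Evaluate the left-hand sides of (B.55), (B.60) and (B.61) for coefficients c."""
    K = len(c)
    total = sum(c)                                   # (B.55): should equal 2

    def shifted_product(n):
        # Sum over k of c_k * c_{k-2n}, treating coefficients outside 0..K-1 as zero.
        return sum(c[k] * c[k - 2 * n] for k in range(K) if 0 <= k - 2 * n < K)

    ortho = [shifted_product(n) for n in range(K // 2)]        # (B.60): 2, 0, ...
    moments = [sum((-1) ** k * k ** j * c[k] for k in range(K))
               for j in range(p)]                               # (B.61): all 0
    return total, ortho, moments


r3 = math.sqrt(3.0)
D4 = [(1 + r3) / 4, (3 + r3) / 4, (3 - r3) / 4, (1 - r3) / 4]   # assumed values
total, ortho, moments = check_conditions(D4, p=2)               # K/2 = 2 moments
```

For K = 2 the same check with c0 = c1 = 1 and p = 1 is satisfied exactly.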

The simplest Daubechies wavelet is D2 given by scaling coefficients:

$$c_0 = 1 \qquad c_1 = 1 \tag{B.63}$$

This wavelet is also called the Haar Wavelet.

B.7. Denoising and Smoothing.

B.7.1. Denoising. If a full decomposition of the discrete-time signal (represented initially as 2^M approximation coefficients) is performed using the pyramid algorithm, the result is:

$$s_0(t) = s_{M,0}\,\phi_{M,0}(t) + \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m}-1} d_{m,n}\,\psi_{m,n}(t) \tag{B.64}$$


If the signal contains noise with a magnitude less than that of the signal itself, then the signal will give rise to relatively large detail coefficients at particular dilations and translations corresponding to the time locations and frequencies of the signal, while the noise might be expected to produce relatively small detail coefficients at different locations and frequencies. On this basis, the approach to denoising by wavelets is to modify the detail coefficients to minimise the contribution resulting from the noise.

If each wavelet coefficient is modified to produce d^S_{m,n} = d_{m,n} − d^N_{m,n} (the superscript S denoting the signal, and N the noise) then,

$$\begin{aligned}
s_0(t) &= s_{M,0}\,\phi_{M,0}(t) + \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m}-1} \left( d^S_{m,n} + d^N_{m,n} \right) \psi_{m,n}(t) \\
&= s_{M,0}\,\phi_{M,0}(t) + \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m}-1} d^S_{m,n}\,\psi_{m,n}(t) + \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m}-1} d^N_{m,n}\,\psi_{m,n}(t) \\
&= s^S_0(t) + s^N_0(t)
\end{aligned} \tag{B.65}$$

where s^S_0(t) is the denoised signal, and s^N_0(t), the noise, i.e.:

$$s^S_0(t) = s_{M,0}\,\phi_{M,0}(t) + \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m}-1} d^S_{m,n}\,\psi_{m,n}(t) \tag{B.66}$$

$$s^N_0(t) = \sum_{m=1}^{M} \sum_{n=0}^{2^{M-m}-1} d^N_{m,n}\,\psi_{m,n}(t) \tag{B.67}$$

Thus, considering the decomposition (B.66), the signal can be reconstructed by performing the inverse pyramid algorithm using the amended detail coefficients, d^S_{m,n}.

20 The equation for j = 0 is derivable from the two conditions (B.55) and (B.60)[18], and so does not create an 

additional constraint on the scaling coefficients. 



B.7.2. Thresholding. There are two major methods of amending the detail coefficients to give d^S_{m,n}, both depending on a threshold value. For generality, the threshold is assumed to be dependent on the level m, and is denoted Λm, but some denoising techniques use the same threshold across all levels [1].

In hard thresholding, all coefficients with a magnitude less than the threshold are set to zero (ideally these will correspond to the small noise contribution), while the remainder are left unchanged, i.e.:

$$d^S_{m,n} = \begin{cases} 0 & \text{if } |d_{m,n}| < \Lambda_m \\ d_{m,n} & \text{otherwise} \end{cases} \tag{B.68}$$

The other method, soft thresholding, sets coefficients with a magnitude less than the threshold to zero, as for hard thresholding, but subtracts the threshold from the remaining coefficients:

$$d^S_{m,n} = \begin{cases} 0 & \text{if } |d_{m,n}| < \Lambda_m \\ \dfrac{d_{m,n}}{|d_{m,n}|} \left( |d_{m,n}| - \Lambda_m \right) & \text{otherwise} \end{cases} \tag{B.69}$$

The reasoning is that noise contributes not only to the small magnitude coefficients consisting of 

just noise, but also to the large magnitude coefficients containing the signal. 

One widely used method of deriving the threshold is to assume that the noise is Gaussian white noise. In this case, the expected maximum value in N detail coefficients is given by:

$$\Lambda = (2 \ln N)^{1/2}\,\sigma \tag{B.70}$$

where σ is the standard deviation of the noise. At level m, there are 2^{M−m} detail coefficients, thus,

$$\Lambda_m = (2 \ln 2^{M-m})^{1/2}\,\sigma = (2(M-m)\ln 2)^{1/2}\,\sigma \tag{B.71}$$

This method is called the universal threshold. 
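The two thresholding rules (B.68) and (B.69), and the level-dependent universal threshold (B.71), can be sketched in Python as follows. The function names are hypothetical and the noise standard deviation σ is assumed to be known.

```python
import math


def hard_threshold(coeffs, lam):
    """(B.68): zero coefficients below the threshold, keep the rest unchanged."""
    return [0.0 if abs(d) < lam else d for d in coeffs]


def soft_threshold(coeffs, lam):
    """(B.69): as hard thresholding, but shrink the survivors towards zero by lam."""
    out = []
    for d in coeffs:
        if abs(d) < lam:
            out.append(0.0)
        else:
            sign = 1.0 if d > 0 else -1.0
            out.append(sign * (abs(d) - lam))
    return out


def universal_threshold(sigma, M, m):
    """(B.71): threshold for the 2**(M - m) detail coefficients at level m."""
    return math.sqrt(2.0 * (M - m) * math.log(2.0)) * sigma


details = [0.5, -3.0, 2.0, -0.1]
kept_hard = hard_threshold(details, 1.0)
kept_soft = soft_threshold(details, 1.0)
lam_level_1 = universal_threshold(sigma=1.0, M=10, m=1)
```

Note that (B.71) gives a threshold of zero at level m = M, where only a single detail coefficient remains.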

A significant application of wavelet denoising is image compression. Although the discrete 

signal, representing the image, is not necessarily ‘noisy’, the thresholding method removes small 

components in the signal corresponding to the smallest detail coefficients. While this has little 

effect on the visual appearance of the image, the resulting list of detail coefficients contains many 

zeros as a result of thresholding and so can be compressed significantly using standard algorithms. 

To reconstruct the image, the list of detail coefficients is uncompressed and the image reconstructed 

using the inverse pyramid algorithm. 

B.7.3. Smoothing. Smoothing is distinguished from denoising in that it considers the frequency (or, 

equivalently, scale) of the unwanted components of the signal, rather than amplitude represented 

by the magnitude of the detail coefficients. 

Starting with a decomposition to level m = m′, the signal is represented, from (B.50), as:

$$s_0(t) = \sum_{n=0}^{2^{M-m'}-1} s_{m',n}\,\phi_{m',n}(t) + \sum_{m=1}^{m'} \sum_{n=0}^{2^{M-m}-1} d_{m,n}\,\psi_{m,n}(t) \tag{B.72}$$

The detail coefficients from levels 1 to m′ relate to dilated wavelets ψm,n(t) with the highest mean frequencies, the mean frequency reducing by a factor of 2 as m increases by 1 (see section B.1.5). Setting these detail coefficients to zero and reconstructing the signal therefore produces an amended signal with the highest frequency components removed, i.e.:

$$s^S_0(t) = \sum_{n=0}^{2^{M-m'}-1} s_{m',n}\,\phi_{m',n}(t) \tag{B.73}$$

The removal of the high frequency components (at all time locations) results in a signal, s^S_0(t), that is ‘smoother’ than the original. The higher the level m = m′ of decomposition, the more of the lower frequency components are removed, causing increased smoothing.
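For the Haar wavelet the effect of (B.73) is easy to see in a short Python sketch: discarding all detail coefficients up to level m′ replaces each block of 2^{m′} samples by its mean. The function name `haar_smooth` is hypothetical, and the sketch is specific to the Haar wavelet rather than a general implementation.

```python
import math


def haar_smooth(s0, levels):
    """Decompose `levels` times, drop every detail coefficient, then reconstruct."""
    r2 = math.sqrt(2.0)
    approx = list(s0)
    for _ in range(levels):
        # Forward Haar step: keep only the approximation coefficients.
        approx = [(approx[2 * n] + approx[2 * n + 1]) / r2
                  for n in range(len(approx) // 2)]
    for _ in range(levels):
        # Inverse Haar step with zero details: each coefficient spreads over two samples.
        approx = [a / r2 for a in approx for _copy in range(2)]
    return approx


signal = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 4.0]
smooth_1 = haar_smooth(signal, 1)     # pairwise means
smooth_2 = haar_smooth(signal, 2)     # means over blocks of four
```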



B.8. Non-Decimating (Translation Invariant) Transform. At each stage in the pyramid algorithm, the number of detail coefficients produced is half that of the previous level. This is particularly evident in the linear algebra representation of the transform in equation (B.52). For this reason, the transform is known as a decimating transform. However, a consequence of this is that, while the original signal can be reconstructed from the set of approximation and detail coefficients created by decomposition to a given level, as described by equation (B.50), the representation is not unique.

Considering a single level of decomposition, if the origin from which the translation of the wavelet and scaling functions is performed is moved by 1, a different set of coefficients is produced: in effect, the intermediate points in the dyadic grid are being used for the decomposition. If the origin is moved again by 1 in the same direction (a total change of 2 units) the original set of coefficients is produced again, albeit shifted in location, since the same dyadic grid is being used. Thus there are two dyadic grids on which the decomposition can be performed at a given level.

An alternative technique is the non-decimating transform that is invariant under translation by discrete units. In this transform, the same number of detail coefficients is produced at each level of decomposition by considering both the original decomposition and the decomposition shifted by a discrete unit. If the linear algebra representation is used, the matrix for the non-decimating transform (the equivalent of (B.52)) can be represented by:

$$T'_m = \begin{bmatrix}
c_0 & c_1 & c_2 & c_3 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-c_3 & c_2 & -c_1 & c_0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & c_0 & c_1 & c_2 & c_3 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & -c_3 & c_2 & -c_1 & c_0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & c_0 & c_1 & c_2 & c_3 & 0 & \cdots & 0 & 0 & 0 \\
0 & 0 & -c_3 & c_2 & -c_1 & c_0 & 0 & \cdots & 0 & 0 & 0 \\
\vdots & & & & & & & \ddots & & & \vdots \\
c_2 & c_3 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & c_0 & c_1 \\
-c_1 & c_0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & -c_3 & c_2 \\
c_1 & c_2 & c_3 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & c_0 \\
c_2 & -c_1 & c_0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & -c_3
\end{bmatrix} \tag{B.74}$$

Since the transform considers the two alternative dyadic grids, the total set of coefficients produced is the same for any unit translation and so the non-decimating transform is translation invariant. If a decomposition to m levels is now considered, it can be seen that there are 2^m choices for the dyadic grid on which this decomposition can be performed 21 .

The non-decimating transform is of use in techniques such as denoising and smoothing. Since the set of coefficients varies with the choice of dyadic grid, the signal resulting from denoising and smoothing using the decimating transform differs depending on the grid choice. Using the non-decimating transform, the denoising or smoothing is applied to each set of coefficients (each set corresponding to one choice of grid) and the denoised or smoothed signal reconstructed from each set independently. The resulting set of 2^m signals is then averaged to produce the final signal. This final signal is therefore translation invariant: original signals shifted by an integer number of discrete units produce the same denoised or smoothed signal.

In practical terms, the non-decimating denoising or smoothing technique minimises artefacts that can occur when certain features in the signal, such as rapid changes or discontinuities, align with particular parts of the wavelet in the decimating transform, or when the wavelet is not suitable for approximating the signal[20]. The non-decimating transform averages out such artefacts over the 2^m signals.

A useful way of calculating the non-decimating transform to m levels is to perform a standard decimating pyramid algorithm on the signal 2^m times, shifting the signal by one discrete unit on each occasion. This can result in a significant increase in processing time for large levels of decomposition. However, artefacts may still be reduced by using a significantly smaller subset of the 2^m possible shifts[20].

21 There is, of course, redundancy in that any of the 2^m sets of coefficients could be used to reconstruct the original signal.
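The shift-and-average procedure can be sketched in Python using the Haar wavelet and hard thresholding. This is an illustration under stated assumptions rather than the project's implementation: the function names are hypothetical, shifts are circular (consistent with wraparound), and each result is shifted back before averaging.

```python
import math


def haar_denoise(signal, levels, lam):
    """Decimating Haar transform, hard thresholding of details, inverse transform."""
    r2 = math.sqrt(2.0)
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = [(approx[2 * n], approx[2 * n + 1]) for n in range(len(approx) // 2)]
        details.append([(a - b) / r2 for a, b in pairs])
        approx = [(a + b) / r2 for a, b in pairs]
    for det in reversed(details):                  # coarsest level first
        kept = [0.0 if abs(d) < lam else d for d in det]
        rebuilt = []
        for a, d in zip(approx, kept):
            rebuilt += [(a + d) / r2, (a - d) / r2]
        approx = rebuilt
    return approx


def shift(signal, k):
    """Circular shift to the left by k samples (negative k shifts right)."""
    k %= len(signal)
    return signal[k:] + signal[:k]


def cycle_spin_denoise(signal, levels, lam):
    """Average the denoised results over all 2**levels grid choices."""
    n_shifts = 2 ** levels
    total = [0.0] * len(signal)
    for k in range(n_shifts):
        out = shift(haar_denoise(shift(signal, k), levels, lam), -k)
        total = [t + o for t, o in zip(total, out)]
    return [t / n_shifts for t in total]


x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 4.0]
denoised = cycle_spin_denoise(x, levels=2, lam=1.5)
```

Shifting the input by one sample shifts the averaged output by one sample, which is the translation invariance described above; the single-grid `haar_denoise` does not have this property in general.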

B.9. Two-Dimensional Discrete Wavelet Transforms. The techniques of discrete wavelet 

transforms can be applied to 2D signals by the application of 1D wavelet and scaling functions in 

each direction. 

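If a discrete signal is represented as a function f(x, y) (the variables x and y being the equivalent of t in the 1D case), then the 2D equivalent of s0(t), the starting point for the decomposition given in (B.44), is given by representing the 2D signal as the tensor product of 1D scaling functions:

$$s_0(x,y) = \sum_{n_x} \sum_{n_y} s_{0,(n_x,n_y)}\,\phi_{0,n_x}(x)\,\phi_{0,n_y}(y) \tag{B.75}$$

If s_{0,(nx,ny)} φ_{0,nx} is decomposed into φ_{1,nx} and ψ_{1,nx}, then using equations (B.47) and (B.48) for compactly supported wavelets,

$$\begin{aligned}
s_0(x,y) &= \sum_{n_x} \sum_{n_y} \frac{1}{\sqrt{2}} \left[ \sum_{k_x=0}^{K-1} c_{k_x}\, s_{0,(2n_x+k_x,n_y)}\,\phi_{1,n_x}(x) + \sum_{k_x=0}^{K-1} (-1)^{k_x} c_{K-1-k_x}\, s_{0,(2n_x+k_x,n_y)}\,\psi_{1,n_x}(x) \right] \phi_{0,n_y}(y) \\
&= \frac{1}{\sqrt{2}} \sum_{n_x} \sum_{n_y} \sum_{k_x=0}^{K-1} \left[ c_{k_x}\,\phi_{1,n_x}(x) + (-1)^{k_x} c_{K-1-k_x}\,\psi_{1,n_x}(x) \right] s_{0,(2n_x+k_x,n_y)}\,\phi_{0,n_y}(y)
\end{aligned} \tag{B.76}$$

Similarly decomposing s_{0,(2nx+kx,ny)} φ_{0,ny}(y) gives,

$$\begin{aligned}
s_0(x,y) &= \frac{1}{\sqrt{2}} \sum_{n_x} \sum_{k_x=0}^{K-1} \left[ c_{k_x}\,\phi_{1,n_x}(x) + (-1)^{k_x} c_{K-1-k_x}\,\psi_{1,n_x}(x) \right] \\
&\qquad \cdot \frac{1}{\sqrt{2}} \sum_{n_y} \sum_{k_y=0}^{K-1} \left[ c_{k_y}\, s_{0,(2n_x+k_x,2n_y+k_y)}\,\phi_{1,n_y}(y) + (-1)^{k_y} c_{K-1-k_y}\, s_{0,(2n_x+k_x,2n_y+k_y)}\,\psi_{1,n_y}(y) \right] \\
&= \frac{1}{2} \sum_{n_x} \sum_{n_y} \sum_{k_x=0}^{K-1} \sum_{k_y=0}^{K-1} \Big[ c_{k_x} c_{k_y}\, s_{0,(2n_x+k_x,2n_y+k_y)}\,\phi_{1,n_x}(x)\,\phi_{1,n_y}(y) \\
&\qquad + (-1)^{k_x} c_{K-1-k_x} c_{k_y}\, s_{0,(2n_x+k_x,2n_y+k_y)}\,\psi_{1,n_x}(x)\,\phi_{1,n_y}(y) \\
&\qquad + c_{k_x} (-1)^{k_y} c_{K-1-k_y}\, s_{0,(2n_x+k_x,2n_y+k_y)}\,\phi_{1,n_x}(x)\,\psi_{1,n_y}(y) \\
&\qquad + (-1)^{k_x} c_{K-1-k_x} (-1)^{k_y} c_{K-1-k_y}\, s_{0,(2n_x+k_x,2n_y+k_y)}\,\psi_{1,n_x}(x)\,\psi_{1,n_y}(y) \Big] \\
&= \sum_{n_x} \sum_{n_y} \Big[ s_{1,(n_x,n_y)}\,\phi_{1,n_x}(x)\,\phi_{1,n_y}(y) + d^{(x)}_{1,(n_x,n_y)}\,\psi_{1,n_x}(x)\,\phi_{1,n_y}(y) \\
&\qquad + d^{(y)}_{1,(n_x,n_y)}\,\phi_{1,n_x}(x)\,\psi_{1,n_y}(y) + d^{(xy)}_{1,(n_x,n_y)}\,\psi_{1,n_x}(x)\,\psi_{1,n_y}(y) \Big]
\end{aligned} \tag{B.77}$$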


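where

$$s_{1,(n_x,n_y)} = \frac{1}{2} \sum_{k_x=0}^{K-1} \sum_{k_y=0}^{K-1} c_{k_x} c_{k_y}\, s_{0,(2n_x+k_x,2n_y+k_y)} \tag{B.78}$$

$$d^{(x)}_{1,(n_x,n_y)} = \frac{1}{2} \sum_{k_x=0}^{K-1} \sum_{k_y=0}^{K-1} (-1)^{k_x} c_{K-1-k_x} c_{k_y}\, s_{0,(2n_x+k_x,2n_y+k_y)} \tag{B.79}$$

$$d^{(y)}_{1,(n_x,n_y)} = \frac{1}{2} \sum_{k_x=0}^{K-1} \sum_{k_y=0}^{K-1} (-1)^{k_y} c_{k_x} c_{K-1-k_y}\, s_{0,(2n_x+k_x,2n_y+k_y)} \tag{B.80}$$

$$d^{(xy)}_{1,(n_x,n_y)} = \frac{1}{2} \sum_{k_x=0}^{K-1} \sum_{k_y=0}^{K-1} (-1)^{k_x+k_y} c_{K-1-k_x} c_{K-1-k_y}\, s_{0,(2n_x+k_x,2n_y+k_y)} \tag{B.81}$$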

Equations (B.78) to (B.81) represent a single level decomposition in two dimensions. The summary coefficients, s_{1,(nx,ny)}, relate to tensor products of scaling functions at the scale of the first decomposition level, and so the process can be applied iteratively to further levels of decomposition. Three sets of detail coefficients are produced at each level: d^{(x)}_{m,(nx,ny)}, d^{(y)}_{m,(nx,ny)}, and d^{(xy)}_{m,(nx,ny)}, relating to the remaining combinations of tensor products between the scaling and wavelet functions.

Equations (B.76) and (B.77) represent a 1D decomposition along the x direction followed by an 

equivalent 1D decomposition of the resulting coefficients along the y direction, and is equivalent to 

how the coefficients are derived in practice. Note that equations (B.78) to (B.81) are symmetrical 

in x and y and so the decomposition can also be applied in the y direction followed by the x 

direction. 
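A single-level 2D decomposition built from two passes of the 1D step can be sketched in Python as follows. The function names are hypothetical, the array is indexed as `s0[nx][ny]`, and wraparound is assumed in both directions. The Haar example reproduces the values given by the double sums (B.78) to (B.81), including the overall factor of 1/2 that arises from the two 1/√2 factors.

```python
import math


def step_1d(v, c):
    """One level of (B.47)/(B.48) with wraparound; returns (approx, detail)."""
    size, K = len(v), len(c)
    s, d = [], []
    for n in range(size // 2):
        w = [v[(2 * n + k) % size] for k in range(K)]
        s.append(sum(c[k] * w[k] for k in range(K)) / math.sqrt(2))
        d.append(sum((-1) ** k * c[K - 1 - k] * w[k] for k in range(K)) / math.sqrt(2))
    return s, d


def transpose(a):
    return [list(col) for col in zip(*a)]


def step_2d(s0, c):
    """Decompose along x, then along y. Returns (s1, d_x, d_y, d_xy), each [nx][ny]."""
    low, high = [], []
    for line in transpose(s0):             # fixed ny, running over nx
        s, d = step_1d(line, c)
        low.append(s)
        high.append(d)

    def along_y(block):                    # block is indexed [ny][nx]
        out_s, out_d = [], []
        for line in transpose(block):      # fixed nx, running over ny
            s, d = step_1d(line, c)
            out_s.append(s)
            out_d.append(d)
        return out_s, out_d

    s1, d_y = along_y(low)                 # scaling in x; scaling / wavelet in y
    d_x, d_xy = along_y(high)              # wavelet in x; scaling / wavelet in y
    return s1, d_x, d_y, d_xy


haar = [1.0, 1.0]
grid = [[1.0, 2.0], [3.0, 4.0]]
s1, d_x, d_y, d_xy = step_2d(grid, haar)
```

Applying the same function to the transposed array corresponds to decomposing in the y direction first; the results are the transposes of those above with the roles of d_x and d_y exchanged.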

Wavelet techniques, such as denoising and smoothing described above, can be applied to the results of the 2D decomposition with appropriate modifications. For example, the thresholding of detail coefficients may be applied separately to each of the three sets of detail coefficients. Similarly, the non-decimating transform requires shifts of the grid in both directions, resulting in 2^{2m} shifts for a decomposition to m levels.



Appendix C. Genetic Algorithm Overview 

This appendix provides an overview of evolutionary algorithms, and genetic algorithms in particular. 

C.1. Evolutionary Algorithms. Evolutionary algorithms (EAs) are a stochastic search technique 

used to find solutions to optimisation problems. They are inspired by the biological process 

of natural selection from which they borrow both some key principles as well as terminology. 

Typically EAs operate on a set of solutions simultaneously that constitute a ‘population’. The 

population is subject to ‘evolutionary pressure’, often in the form of selection of the best ‘individuals’ 

(solutions) in the population to form some or all of the next generation of the population. In 

addition, EAs introduce stochastic variation through two major processes: random mutation of 

individuals and crossover between individuals to produce new members of the population (similar 

in some respects to biological reproduction). In general, mutation acts to explore new regions of 

the search space, while crossover explores within the region of search space that the population 

currently occupies. Meanwhile, the evolutionary pressure of selection encourages the population 

to move towards optimum solutions. 

C.2. Steady-State Genetic Algorithms. Genetic Algorithms (GAs) are a widely used type of 

evolutionary algorithm developed in the 1970’s by John Holland[2]. This project uses a variant of 

GA termed Steady-State or Overlapping, the flowchart of which is shown in Figure 42. 

Figure 42. Flowchart of the Steady-State Genetic Algorithm

Initialise Population: The population is usually initialised to a random set of solutions.

Evaluate Fitness (1): The fitness of each individual is measured using an objective function (see below).

Terminate?: The population is tested and the algorithm is terminated if a specific condition is met. Typically this is a requirement that one or more individuals in the population are sufficiently close to an optimum solution, as measured by their fitness. In practical implementations, the algorithm will also be configured to terminate after a set number of generations.

Mutation & Crossover: The individuals in the population are subject to the stochastic 

processes of mutation (acting on each individual independently) and crossover (usually 

acting on two individuals). The resulting solutions are potential members of the next 

generation. 

Evaluate Fitness (2): The fitness of the new individuals created by mutation and crossover is evaluated.

Derive Next Generation: The members of the next generation are selected from the current 

generation and the new solutions, usually by picking the ‘best’ individuals. In Steady- 

State GAs, the selection is constrained so that only some of the current generation is 

replaced by child solutions. The process continues for the next generation from the termination 

step. 
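The steps above can be sketched as a minimal steady-state GA in Python. This is a generic illustration only: the real-valued genotype, one-point crossover, Gaussian mutation, parameter values and the sphere objective are all assumptions, and none of them is taken from the project's peak-fitting implementation.

```python
import random


def steady_state_ga(objective, n_genes, pop_size=20, n_replace=4,
                    generations=200, tolerance=1e-6, seed=0):
    """Minimise `objective` over real-valued genotypes of length n_genes."""
    rng = random.Random(seed)
    # Initialise Population
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    fitness = [objective(ind) for ind in population]          # Evaluate Fitness (1)
    for _generation in range(generations):
        if min(fitness) < tolerance:                           # Terminate?
            break
        children = []
        for _child in range(n_replace):                        # Mutation & Crossover
            a, b = rng.sample(population, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0.0, 0.3) if rng.random() < 0.2 else g
                     for g in child]
            children.append(child)
        child_fitness = [objective(ch) for ch in children]     # Evaluate Fitness (2)
        # Derive Next Generation: only the n_replace worst individuals are replaced.
        ranked = sorted(zip(fitness, population), key=lambda fp: fp[0])
        merged = ranked[:pop_size - n_replace] + list(zip(child_fitness, children))
        fitness = [f for f, _ind in merged]
        population = [ind for _f, ind in merged]
    best = min(range(len(population)), key=lambda i: fitness[i])
    return population[best], fitness[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(g * g for g in x)


best_solution, best_fitness = steady_state_ga(sphere, n_genes=3, seed=1)
```

Because the best individuals of the current generation always survive, the best fitness in the population never gets worse from one generation to the next.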

C.3. Representation. GAs make a distinction between the genotype, the representation of the solutions, and the phenotype, the solutions themselves. Traditionally, GAs use a binary string representation for the genotype, although this project uses a more complex genotype consisting of a variable number of real-valued parameters (see section 3.7.1).

C.4. Operators. Operators act to initialise, mutate and crossover individuals. Traditional GA 

implementations utilise a set of mutation and crossover operators suitable for binary strings. However, 

in this project, more complicated custom operators are used owing to the specific genotype 

used (see section 3.7). 

C.5. Objective Function. The objective (or fitness) function is a measure of how good a solution 

a particular individual represents. An optimal solution corresponds to the maximum (or minimum) 

of this function. This project uses a deterministic function described in section 3.7.5, but it is also possible to evaluate fitness by pitting one solution against another in a ‘tournament’ to find the better solution[2].



Appendix D. Experimental Methods 

Sample Solvent: 100% D2O. 

Sample Temperature: 300 K.

Reference Compound: Internal standard of 1 mM 3-(trimethylsilyl)-propionic acid-d4, 

sodium salt (TSP) (0 ppm for both F1 and F2). 

Spectrometer: Bruker ARX 500 NMR spectrometer tuned to ¹H signal at 500.13 MHz (¹³C at 125.76 MHz).

2D Phase Program: 

• For sucrose/glycine mixtures, and the sucrose only spectra, used to illustrate the denoising algorithm and peak picking (sections 2 and 3); and for the pea leaf metabolic spectra: non-gradient phase selective ¹H–¹³C HSQC correlation via double INEPT transfer.

• For neutral and acidic sucrose reference spectra used for adaptive binning (section 5); and for glucose used as an example 2D spectrum (appendix A): gradient enhanced phase selective ¹H–¹³C HSQC correlation via double INEPT transfer.

• For all spectra:

– 90° pulse lengths: 9.2 µs for ¹H; 16.5 µs for ¹³C.

– carbon coupling constant: 145 Hz.

– acquisition was recorded with decoupling of ¹³C via composite pulse decoupling (CPD) with a garp sequence.

Spectral Width: 

• neutral/acidic sucrose and pea: F2 6.666 kHz; F1 22.64 kHz. 

• sucrose/glycine mixture: F2 6.666 kHz; F1 20.12 kHz. 

Acquisition Data Points: 

• pea: F2 2048; F1 448.

• neutral/acidic sucrose: F2 1536; F1 480. 

• sucrose/glycine mixtures: F2 1536; F1 384. 

Window Function: squared-cosine (see section A.6.3). 

Baseline Correction: automatic (‘quad’ mode). 

Post-Fourier Transform Data Points: F2 2048; F1 1024 (complex data points in both 

directions). 

Phase Correction: manual. 



Appendix E. Code Structure 

This appendix describes the structure of the code used for the combined denoising and peak picking process (sections 2, 3 and 4), and for two-dimensional adaptive binning (section 5).

The code is implemented as MATLAB m-file scripts and functions, and as MEX files written in C++. The source code itself, including additional code not described here, is provided on an accompanying CD. All m-files and MEX files referenced below, and those on the CD, were written specifically for this project.

E.1. Denoising and Peak Picking. The denoising and peak picking is implemented as an m-file script, process_spectrum.m. The major m-file functions and MEX files called by the script are shown in Figure 43 and described below.

load2dproc: MEX file that loads the real and complex spectra directly from Bruker Topspin 

data files, returning both as matrices. In addition, it retrieves specific Bruker Topspin 

process parameters such as the ppm range covered by the F1 and F2 dimensions. 

separatenoise.m: Separates the t1-noise from the spectrum, returning the noise and remaining 

(large) peaks as separate matrices. separatenoisecmplx.m performs the separation 

on the complex spectrum using direct signal thresholding (section 2.5.4). separatenoise1D.m 

implements the separation of a single 1D trace. 

maskt1noise.m: Masks the t1-noise using the algorithm described in section 2.6. derivecomplexmask.m is used for the standard algorithm, while derivecomplexmaskwavelet.m is used for the alternative wavelet-level mask derivation. (The latter uses functions from the MATLAB Wavelet Toolbox.)

Figure 43. Structure of denoising and peak picking code.

pickpeaks.m: Finds the peaks in the denoised spectrum and identifies both the full and 

thresholded watershed for each peak (see section 3.6 for an explanation of the latter). 

Note that this function simply returns a list of peaks in the spectrum—identified as local 

maxima—rather than performing the more elaborate peak fitting of the genetic algorithm. 

identifypeakregions.m: Identifies regions of convoluted peaks using the thresholded peak 

watersheds. 

fitpeakregions.m: Fits each region of convoluted peaks to a theoretical peak shape. 

calcreferencepeak.m is called once per run of the process_spectrum.m script to derive a representation of the theoretical peak that enables efficient calculation: see section 3.8.4. For each region, the MEX file peakfitga is called to run the genetic algorithm. This code uses the C++ GAlib library[25]. To evaluate the fit metric described in section 3.3, the MEX file calls back to the MATLAB function peakfitmetric.m.

choosepeak.m: Picks peaks according to fit to theoretical peaks and/or peak signal-to-noise 

ratio. 

integratepeaks.m: Derives the peak intensity by integrating across the peak’s watershed. 

derivecleanspct.m: Removes all noise artefacts (including unpicked peaks) from the spectrum 

in preparation for adaptive binning. 

save2dproc: MEX file that optionally saves the denoised spectrum in Bruker Topspin data 

file format. 
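
As an illustration of two of the simpler steps listed above, the following is a minimal Python sketch of picking peaks as local maxima (as pickpeaks.m is described as doing) and of integrating intensity over a labelled region (as integratepeaks.m is described as doing). It is not the dissertation's MATLAB code: the function names, the 8-neighbour definition of a local maximum, the min_height parameter and the integer label map standing in for the watersheds are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def find_local_maxima(spectrum: Matrix, min_height: float = 0.0) -> List[Tuple[int, int]]:
    """Return (row, col) of points strictly greater than every adjacent point.

    Assumption: "adjacent" means the 8 surrounding points (fewer at the edges),
    and points at or below min_height are ignored.
    """
    n_rows, n_cols = len(spectrum), len(spectrum[0])
    peaks = []
    for r in range(n_rows):
        for c in range(n_cols):
            value = spectrum[r][c]
            if value <= min_height:
                continue
            neighbours = [
                spectrum[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def integrate_regions(spectrum: Matrix, labels: List[List[int]]) -> Dict[int, float]:
    """Sum the intensity over each labelled region (label 0 means unassigned)."""
    totals: Dict[int, float] = {}
    for row_values, row_labels in zip(spectrum, labels):
        for value, label in zip(row_values, row_labels):
            if label != 0:
                totals[label] = totals.get(label, 0.0) + value
    return totals


if __name__ == "__main__":
    # Toy spectrum: one broad peak in the centre, one isolated point in a corner.
    spectrum = [
        [0, 0, 0, 0, 0],
        [0, 1, 2, 1, 0],
        [0, 2, 5, 2, 0],
        [0, 1, 2, 1, 0],
        [0, 0, 0, 0, 3],
    ]
    # Hypothetical label map: region 1 is the central peak, region 2 the corner.
    labels = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 2],
    ]
    print(find_local_maxima(spectrum))          # [(2, 2), (4, 4)]
    print(integrate_regions(spectrum, labels))  # {1: 17.0, 2: 3.0}
```

In this sketch the integral is a plain sum of the points in each region; any scaling by the spacing of the F1 and F2 axes is left out.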

E.2. Two-Dimensional Adaptive Binning. Adaptive binning is implemented using the script 

adaptive_binning.m. Functions from the MATLAB Wavelet Toolbox are used to perform wavelet 

smoothing (section 5.3.3). The function derivepeakthreshwatershed.m, shared with the peak 

picking code described above, is used to robustly identify bins using watersheds: using a threshold 

avoids bin regions that incorrectly incorporate empty (zero intensity) parts of the spectrum. 
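
The effect of the threshold can be illustrated with a small Python sketch. It is not the code of derivepeakthreshwatershed.m, whose details are not given here: the steepest-ascent labelling used as a stand-in for a watershed, the function name and the threshold argument are all assumptions made for the example. Each point above the threshold is assigned to the local maximum reached by repeatedly stepping to its highest neighbour, and points at or below the threshold are left unassigned (label 0).

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def thresholded_watershed(spectrum: Matrix, threshold: float) -> List[List[int]]:
    """Label each point above `threshold` by the summit reached by steepest ascent.

    Assumption: this simple uphill walk stands in for a true watershed transform.
    Points at or below the threshold keep label 0 (no region).
    """
    n_rows, n_cols = len(spectrum), len(spectrum[0])

    def highest_neighbour(r: int, c: int) -> Tuple[int, int]:
        # Return the position of the highest point in the 3x3 neighbourhood,
        # or (r, c) itself if no neighbour is strictly higher.
        best = (r, c)
        for rr in range(max(r - 1, 0), min(r + 2, n_rows)):
            for cc in range(max(c - 1, 0), min(c + 2, n_cols)):
                if spectrum[rr][cc] > spectrum[best[0]][best[1]]:
                    best = (rr, cc)
        return best

    labels = [[0] * n_cols for _ in range(n_rows)]
    summit_ids: Dict[Tuple[int, int], int] = {}
    for r in range(n_rows):
        for c in range(n_cols):
            if spectrum[r][c] <= threshold:
                continue
            pos = (r, c)
            while True:  # each step strictly increases intensity, so this ends
                nxt = highest_neighbour(*pos)
                if nxt == pos:
                    break
                pos = nxt
            labels[r][c] = summit_ids.setdefault(pos, len(summit_ids) + 1)
    return labels


if __name__ == "__main__":
    # A single trace with two peaks separated by an empty (zero-intensity) gap.
    trace = [[0, 1, 4, 1, 0, 0, 2, 6, 2, 0]]
    print(thresholded_watershed(trace, threshold=0.5))
    # [[0, 1, 1, 1, 0, 0, 2, 2, 2, 0]]
    print(thresholded_watershed(trace, threshold=-1))
    # [[1, 1, 1, 1, 1, 2, 2, 2, 2, 2]]
```

With the threshold set below every intensity (the second call), the zero-intensity points between and beside the peaks are absorbed into the two regions; with a positive threshold they remain outside any bin.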



References 

[1] Paul S Addison, The illustrated wavelet transform handbook, Institute of Physics Publishing, 2002. 

[2] Peter J Bentley, An introduction to evolutionary design by computers, 1st ed., ch. 1, Morgan Kaufmann, 1999. 

[3] Joanne T Brindle, Jeremy K Nicholson, Peter M Schofield, David J Grainger, and Elaine Holmes, Application 

of chemometrics to ¹H NMR spectroscopic data to investigate a relationship between human serum metabolic 

profiles and hypertension, Analyst 128 (2003), 32–36. 

[4] Caroline Brissac, Thérèse E Malliavin, and Marc A Delsuc, Use of the Cadzow procedure in 2D NMR for the 

reduction of t1 noise, Journal of Biomolecular NMR 6 (1995), 361–365. 

[5] Charles Chui, An introduction to wavelets, Academic Press, 1992. 

[6] Richard A Davis, Adrian J Charlton, John Godward, Mark Harrison, and Julie C Wilson, Adaptive binning: 

An improved binning method for metabolomics data using the undecimated wavelet transform, submitted to 

J. Chemom. Intell. Lab. Syst. as of August 2006. 

[7] Richard A Davis, Adrian J Charlton, Sarah Oehlschlager, and Julie C Wilson, Novel feature selection for 

genetic programming using metabolomic ¹H NMR data, J. Chemom. Intell. Lab. Syst. 81 (2006), 50–59. 

[8] A P De Weijer, C B Lucasius, L Buydens, and G Kateman, Curve fitting using natural computation, Anal. 

Chem. 66 (1995), 23–31. 

[9] Andrew E Derome, Modern NMR techniques for chemistry research, Pergamon Press, 1987. 

[10] Oliver Fiehn, Metabolomics - the link between genotypes and phenotypes, Plant Molecular Biology 48 (2002), 

155–171. 

[11] Leigh-Anne Fraser, Dulcie A Mulholland, and David D Fraser, Classification of limonoids and protolimonoids 

using neural networks, Phytochem. Anal. 8 (1997), 301–311. 

[12] Daniel S Garrett, Robert Powers, Angela M Gronenborn, and G Marius Clore, A common sense approach 

to peak picking in two-, three-, and four-dimensional spectra using automatic computer analysis of contour 

diagrams, J. Magn. Reson. 95 (1991), 214–220. 

[13] Andrew Gibbs, Gareth A Morris, Alistair G Swanson, and D Cowburn, Suppression of t1 noise in 2D NMR 

spectroscopy by reference deconvolution, J. Magn. Reson. 101 (1993), 351–356. 

[14] P J Hore, Nuclear magnetic resonance, Oxford Chemistry Primers, no. 32, Oxford University Press, 1995. 

[15] Norman Lloyd Johnson, Samuel Kotz, and N Balakrishnan, Continuous univariate distributions, 2nd ed., 

Wiley Series in Probability and Mathematical Statistics, vol. 2, John Wiley & Sons, 1994. 

[16] The MathWorks, Inc., MATLAB documentation, version 7.0.4.352 (R14 Service Pack 2) ed., Jan 2005, accessed 

as help file documentation. 

[17] A F Mehlkopf, D Korbee, T A Tiggelman, and Ray Freeman, Sources of t1 noise in two-dimensional NMR, 

J. Magn. Reson. 58 (1984), 315–323. 

[18] D E Newland, Random vibrations, spectral and wavelet analysis, 3rd ed., Addison Wesley Longman Limited, 

1993. 

[19] Stephen G Oliver, Michael K Winson, Douglas B Kell, and Frank Baganz, Systematic functional analysis of 

the yeast genome, Tibtech 16 (1998), 373–377. 

[20] Catherine Perrin, Beata Walczak, and Désiré Luc Massart, The use of wavelets for signal denoising in capillary 

electrophoresis, Anal. Chem. 73 (2001), 4903–4917. 

[21] William F Reynolds and Raul G Enriquez, Gradient-selected versus phase-cycled HMBC and HSQC: pros and 

cons, Magn. Reson. Chem. 39 (2001), 531–538. 

[22] William F Reynolds and Raul G Enriquez, Choosing the best pulse sequences, acquisition parameters, postacquisition processing strategies, and 

probes for natural product elucidation by NMR spectroscopy, J. Nat. Prod. 65 (2002), 221–244. 

[23] Shie Qian, Introduction to time-frequency and wavelet transforms, Prentice Hall PTR, 2002. 

[24] Gilbert Strang, Wavelets and dilation equations: A brief introduction, SIAM Review 31 (1989), no. 4, 614–627. 

[25] Matthew Wall, GAlib documentation, Massachusetts Institute of Technology, 2.4 ed., 1996, accessed online 

(July 2006) and as help file documentation. 
