Introduction to Principal Components Analysis of Event-Related Potentials

Joseph Dien 1,3 and Gwen A. Frishkoff 2

1 Department of Psychology, Tulane University
2 Department of Psychology, University of Oregon
3 Department of Psychology, University of Kansas

To appear in: Event-Related Potentials: A Methods Handbook. Handy, T. (editor). Cambridge, Mass: MIT Press.

Address for correspondence: Joseph Dien, Department of Psychology, 426 Fraser Hall, University of Kansas, 1415 Jayhawk Blvd., Lawrence, KS 66045-7556. E-mail: jdien@ku.edu.
Introduction

Over the last several decades, a variety of methods have been developed for statistical decomposition of event-related potentials (ERPs). The simplest and most widely applied of these techniques is principal components analysis (PCA). It belongs to a class of factor-analytic procedures, which use eigenvalue decomposition to extract linear combinations of variables (latent factors) in such a way as to account for patterns of covariance in the data parsimoniously, that is, with the fewest factors.

In ERP data, the variables are the microvolt readings either at consecutive time points (temporal PCA) or at each electrode (spatial PCA). The major source of covariance is assumed to be the ERP components, characteristic features of the waveform that are spread across multiple time points and multiple electrodes (Donchin & Coles, 1991). Ideally, each latent factor corresponds to a separate ERP component, providing a statistical decomposition of the brain electrical patterns that are superposed in the scalp-recorded data.

PCA has a range of applications for ERP analysis. First, it can be used for data reduction and cleaning or filtering prior to data analysis. By reducing hundreds of variables to a handful of latent factors, PCA can greatly simplify analysis and description of complex data. Moreover, the factors retained for further analysis are considered more likely to represent pure signal (i.e., brain activity), as opposed to noise (i.e., artifacts or background EEG).
Second, PCA can be used in data exploration as a way to detect and summarize features that might otherwise escape visual inspection. This is particularly useful when ERPs are measured over many tens or hundreds of recording sites; spatial patterns can then be used to constrain the decomposition into latent temporal patterns, as described in the following section.

The use of such high-density ERPs (recordings at 50 or more electrodes) has become increasingly popular in the last several years. A striking feature of high-density ERPs is that the complexity of the data seems to grow exponentially as the number of recording sites is doubled or tripled. Thus, while increases in spatial resolution can lead to important new discoveries, subtle patterns are likely to be missed, as higher spatial sampling reveals more and more complex patterns, overlapping in both time and space. A rational approach to data decomposition can improve the chances of detecting these subtler effects.
Third, PCA can serve as an effective means of data description. In principle, PCA can describe features of the dataset more objectively and more precisely than is possible with the unaided eye. Such increased precision could be especially helpful when PCA is used as a preprocessing step for ERP source localization (Dien, 1999; Dien, Frishkoff, Cerbonne, & Tucker, 2003a; Dien, Spencer, & Donchin, 2003b; Dien, Tucker, Potts, & Hartry, 1997).

Despite the many useful functions of PCA, this method has had a somewhat checkered history in ERP research, beginning in the 1960s (Donchin, 1966; Ruchkin, Villegas, & John, 1964). An influential review paper by Donchin and Heffley (1979) promoted the use of PCA for ERP component analysis. A few years later, however, PCA entered something of a dark age in the ERP field with the publication of a methodological critique (Wood & McCarthy, 1984), which demonstrated that PCA solutions may be subject to misallocation of variance across the latent factors. Wood and McCarthy noted that the same problems arise in the use of other techniques, such as reaction time and peak amplitude measures. The difference is that misallocation is made more explicit in PCA, which they argued should be regarded as an advantage. Yet this last point was often overlooked, and this seminal paper has, ironically, been cited as an argument against the use of PCA. Perhaps as a consequence, many researchers continued to rely on conventional ERP analysis techniques.
More recently, the emergence of high-density ERPs has revived interest in PCA as a method of data reduction. Moreover, some recent studies have shown that statistical decomposition can lead to novel insights into well-known ERP effects, providing evidence to help separate ERP components associated with different perceptual and cognitive operations (Dien et al., 2003a; Dien, Frishkoff, & Tucker, 2000; Dien et al., 1997; Spencer, Dien, & Donchin, 2001).

The present review presents a systematic outline of the steps in temporal PCA, and the issues that arise at each step in implementation. Some problems and limitations of temporal PCA are discussed, including rotational indeterminacy, problems of misallocation, and latency jitter. We then compare some recent alternatives to temporal PCA, namely: spatial PCA (Dien, 1998a), sequential (spatio-temporal or temporo-spatial) PCA (Spencer et al., 2001), parametric PCA (Dien et al., 2003a), multi-mode PCA (Möcks, 1988), and partial least squares (PLS) (Lobaugh, West, & McIntosh, 2001). Each technique has evolved to address certain weaknesses with the traditional PCA method. We conclude with questions for further research, and advocate a research program for systematic comparison of the strengths and limitations of different multivariate techniques in ERP research.
Steps in Temporal PCA

The two most common types of factor analysis are principal axis factoring and principal components analysis. These methods are equivalent for all practical purposes when there are many variables and when the variables are highly correlated (Gorsuch, 1983), as in most ERP datasets. In the ERP literature, PCA is the more common method. Traditionally, the same term, "component," has been used both for PCA linear combinations and for characteristic spatial and temporal features of the ERP. To avoid confusion, the term factor (or latent factor) will be used here to refer to PCA (latent) components, and the term component will be reserved for spatiotemporal features of the ERP waveform.
The PCA process consists of three main steps: computation of the relationship matrix, extraction and retention of the factors, and rotation to simple structure. In the following sections, a PCA simulation is performed to illustrate each step, using the PCA Toolkit (version 1.06), a set of Matlab functions for performing PCA on ERP data. This toolkit was written by the first author and is freely available upon request.
The Data Matrix

A key to understanding PCA procedures as applied to ERPs is to be clear about how multiple sources of variance contribute to the data decomposition. In temporal PCA, the dataset is organized with the variables corresponding to time points, and observations corresponding to the different waveforms in the dataset, as shown in Figure 1.

Figure 1. Data matrix, with dimensions 20 x 6. Variables are time points, measured in two conditions for ten subjects. Part b presents the covariance matrix computed from the data matrix.
The waveforms vary across subjects, electrodes, and experimental conditions. Thus, subject, spatial, and task variance are collectively responsible for covariance among the temporal variables. Although it may seem odd to commingle these three sources of variance, they provide equally valid bases for distinguishing an ERP component; in this respect, it is reasonable to treat them collectively. For example, in a simple oddball experiment the voltage readings tend to rise and fall together between 250 and 350 ms, because they are mutually influenced by the P300, which occurs during this period. Since individual differences, scalp location, and experimental task may all affect the recorded P300 amplitude, the amplitudes of these time points will likewise covary as a function of these three sources of variance. Figure 2 shows the grand-averaged waveforms (for n=10 subjects) corresponding to the simulated data in Figure 1. For simplicity, this example involves only one electrode site, ten subjects, and two experimental conditions. In subsequent sections, we will use these simulated data to help illustrate the steps in implementation of PCA for ERP analysis.
Figure 2. Waveforms for grand-averaged data (n=10), corresponding to data in Figure 1. Graph displays two non-overlapping correlated components, plotted for two hypothetical conditions, A and B.
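A data matrix of this shape can be mocked up in a few lines. The following sketch uses Python/NumPy rather than the chapter's Matlab toolkit, and all amplitudes are arbitrary placeholders rather than the chapter's actual simulated values; it simply builds a 20 x 6 matrix with two components peaking at t2 and t5:

```python
import numpy as np

n_subj, n_cond, n_time = 10, 2, 6
rng = np.random.default_rng(0)

# Two components: one at t2 (both conditions), one at t5
# (larger in condition B); amplitudes are arbitrary
base = np.zeros((n_cond, n_time))
base[:, 1] = 5.0          # component 1 peaks at t2
base[0, 4] = 2.0          # component 2 at t5, condition A
base[1, 4] = 6.0          # component 2 at t5, condition B

# 20 x 6 data matrix: rows = subject x condition, columns = time points
X = np.repeat(base[None, :, :], n_subj, axis=0)
X = X + rng.normal(scale=0.5, size=X.shape)    # subject-level variability
X = X.reshape(n_subj * n_cond, n_time)
```

Each row is one waveform (one subject in one condition), so subject and task variance are commingled in the observations, exactly as described above.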
The Relationship Matrix

The first step in applying PCA is to generate a relationship (or association) matrix, which captures the interrelationships between the temporal variables. The simplest such matrix is the sum-of-squares cross-products (SSCP) matrix. For each pair of variables, the two values for each observation are multiplied and then added together. Thus, variables that tend to rise and fall together will produce the highest values in the matrix. The diagonal of the matrix (the relationship of each variable to itself) is the sum of the squared values of each variable. For an example of the effect of using the SSCP matrix, see Curry, Cooper, McCallum, Pocock, Papakostopoulos, Skidmore, and Newton (1983). The SSCP matrix treats mean differences in the same fashion as differences in variance, which has odd effects on the PCA computations. For example, factors computed on an SSCP matrix can be correlated, even when they are orthogonal when computed using other matrices. In general, we do not recommend the use of the SSCP matrix in ERP analyses.
An alternative to the SSCP matrix is the covariance matrix. This matrix is computed in the same fashion as the SSCP matrix, except that the mean of each variable is subtracted out before generating the relationship matrix. Mean correction ensures that variables with high mean values do not have a disproportionate effect on the factor solution. The effect of mean correction on the solution depends on the EEG reference site, a topic that is beyond the scope of this review (cf. Dien, 1998a).
A third alternative is to use the correlation matrix as the relationship matrix. The correlation matrix is computed in the same fashion as the covariance matrix, except that the variable variances are standardized. This is accomplished by first mean correcting each variable, and then dividing each variable by its standard deviation, which ensures that the variables contribute equally to the factor solution. Since time points that do not contain ERP components have smaller variances, this procedure may exacerbate the influence of background noise. Simulation studies indicate that covariance matrices can yield more accurate results (Dien, Beal, & Berg, submitted). We therefore recommend the use of covariance matrices.
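The three candidate relationship matrices can be computed directly from the data matrix. The following is an illustrative NumPy sketch (not the PCA Toolkit's code) using random placeholder data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))          # observations x time points

sscp = X.T @ X                        # sum-of-squares cross-products
Xc = X - X.mean(axis=0)               # mean correction
cov = Xc.T @ Xc / (X.shape[0] - 1)    # covariance matrix
sd = Xc.std(axis=0, ddof=1)
corr = cov / np.outer(sd, sd)         # correlation matrix
```

Note that the correlation matrix has ones on its diagonal, whereas the covariance diagonal preserves each time point's variance; it is this preserved variance that lets component-bearing time points dominate the solution.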
In Figure 3(a), the simulated data are converted into a covariance matrix. Observe how the time points containing the two components (t2 and t5) result in larger entries than those without. The entries with the largest numbers will have the most influence on the next step in the PCA procedure: factor extraction.
Factor Extraction

In the extraction stage, a process called eigenvalue decomposition is performed, which progressively removes linear combinations of variables that account for the greatest variance at each step. Each linear combination constitutes a latent factor. In Figure 3, we demonstrate how this process iteratively reduces the remaining values in the relationship matrix to zero.

Figure 3. (a) Original covariance matrix. (b) Covariance matrix after subtraction of Factor 1 (Varimax-rotated; see next section). (c) Covariance matrix after subtraction of Factors 1 and 2. Since Factors 1 and 2 together account for nearly all of the variance, the result is the null matrix.
In general, PCA should extract as many factors as there are variables, as long as the number of observations is equal to or greater than the number of variables (i.e., as long as the data matrix is of full rank).
The initial extraction yields an unrotated solution, consisting of a factor loading matrix and a factor score matrix. The factor loading matrix represents correlations between the variables and the factor scores. The factor score matrix indexes the magnitude of the factors for each of the observations and thus represents the relationship between the factors and the observations. If the two matrices are multiplied together, they will reproduce the data matrix. By convention, the reproduced data matrix will be in standardized form, regardless of the type of relationship matrix that was entered into the PCA. To recreate the original data matrix, the variables of this standardized matrix are multiplied by the original standard deviations, and the original variable means are restored.
Figure 4. Reconstructed waveforms, calculated by multiplying the Factor Loadings by the Factor Scores, scaled to microvolts (i.e., multiplied by the matrix of standard deviations for the original data). Data were reconstructed using Varimax-rotated factors for this example.
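The extraction and reconstruction logic just described can be sketched as follows (a NumPy illustration on placeholder data, with loadings and scores defined by the usual factor-analytic conventions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 6))                  # observations x time points

mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
Z = (X - mu) / sd                             # standardized data
R = Z.T @ Z / (X.shape[0] - 1)                # correlation matrix

evals, evecs = np.linalg.eigh(R)              # eigenvalue decomposition
order = np.argsort(evals)[::-1]               # largest factor first
evals, evecs = evals[order], evecs[:, order]

loadings = evecs * np.sqrt(evals)             # time points x factors
scores = Z @ evecs / np.sqrt(evals)           # observations x factors

Z_hat = scores @ loadings.T                   # standardized reconstruction
X_hat = Z_hat * sd + mu                       # restore scale and means
```

Multiplying the full score and loading matrices reproduces the standardized data exactly; rescaling by the original standard deviations and restoring the means recovers the microvolt values.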
For a temporal PCA, the loadings describe the time course of each of the factors. To accurately represent the time course of the factors, it is necessary to first multiply them by the variable standard deviations, which rescales them to microvolts (see proof in Dien, 1998a). Further, it is important to note that the sign of a given factor loading is arbitrary. This is necessarily the case since a given peak in the factor time course will be positive on one side of the head and negative on the other side, due to the dipolar nature of electrical fields (Nunez, 1981). Note, further, that the dipolar distributions can be distorted or obscured by referencing biases in the data (Dien, 1998b). Only the product of the factor loading and the factor score corresponds to the original data in an unequivocal way. Thus, if the factor loading is positive at the peak, then the factor scores from one side of the head will be positive and those from the other side will be negative, corresponding to the dipolar field.
The factor scores, on the other hand, provide information about the other sources of variance (i.e., subject, task, and spatial variance). For example, to compute the amplitude of a factor at a specific electrode site for a given task condition, one simply takes the factor scores corresponding to the observations for that task at the electrode of interest and computes their mean (across subjects). If this mean value is computed for each electrode, the resulting values can be used to plot the scalp topography for that factor. If a specific time point is chosen, it is possible to reconstruct the scalp topography with the proper microvolt scaling by multiplying the mean scores by the factor loading and the standard deviation for the time point of interest (see proof in Dien, 1998a).
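The averaging procedure for plotting a factor's topography can be sketched as follows (all shapes and scaling constants here are hypothetical placeholders):

```python
import numpy as np

n_subj, n_cond, n_chan = 10, 2, 64
rng = np.random.default_rng(5)

# Factor scores for one factor: one value per subject/condition/electrode
scores = rng.normal(size=(n_subj, n_cond, n_chan))

# Topography for condition 0: mean factor score per electrode, across subjects
topo = scores[:, 0, :].mean(axis=0)

# Microvolt-scaled topography at one time point, using that point's
# factor loading and standard deviation (placeholder values)
loading_t, sd_t = 0.9, 1.7
topo_uv = topo * loading_t * sd_t
```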
Unlike the PCA algorithm in most statistics packages, the PCA Toolkit does not mean-correct the factor scores. This maintains an interpretable relationship between the factor scores and the original data. If factor scores are mean-corrected as part of the standardization, the mean task scores will be centered around zero, which can make factor interpretation more difficult. In an oddball experiment, for example, the P300 factor scores should be large for the target condition and small for the standard condition. However, if the factor scores are mean-corrected, then the mean task scores for the two conditions will be of equal amplitude and opposite signs (since mean correction splits the difference).
Factor Retention

Most of the PCA factors that are extracted account for small proportions of variance, which may be attributed to background noise or minor departures from group trends. In the interest of parsimony, only the larger factors are typically retained, since they are considered most likely to contain interpretable signal. A common criterion for determining how many factors to retain is the scree test (Cattell, 1966; Cattell & Jaspers, 1967). This test is based on the principle that the PCA of a random set of data will produce a set of randomly sized factors. Since factors are extracted in order of descending size, when their sizes are graphed they will form a steady downward slope. A dataset containing signal, in addition to the noise, should have initial factors that are larger than would be expected from random data alone. The point of departure from the slope (the elbow) indicates the number of factors to retain. Factors beyond this point are likely to contain noise and are best dropped. Figure 5 plots the reconstructed grand-averaged data, using the retained factors, in order to verify that meaningful factors have not been excluded.
Figure 5. Reconstructed waveforms, computed as in Figure 4, using only Factors 1 and 2. Because the first two factors account for nearly all of the variance, the reconstruction closely matches the original data (cf. Fig. 2).
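In code, the scree is simply the ordered eigenvalue spectrum of the relationship matrix (an illustrative NumPy sketch on random data):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 30))                     # observations x variables
Xc = X - X.mean(axis=0)
evals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending order

# Scree: proportion of variance per factor; plotted in order, the point
# where the curve departs from a steady downward slope (the elbow)
# suggests how many factors to retain
scree = evals / evals.sum()
```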
In practice, the scree plot for ERP datasets often contains multiple elbows, which can make it difficult to determine the proper number of factors to retain. Part of the problem is that the noise contains some unwanted signal (remnants of the background EEG). A modified version of the parallel test can be used to address this issue (Dien, 1998a). The parallel test determines how many factors represent signal by comparing the scree produced by the full dataset to that produced when only the noise is present. The noise level is estimated by generating an ERP average with every other trial inverted, which has the effect of canceling out the signal while leaving the noise level unchanged. The results of the parallel test should be considered a lower bound, since retaining additional factors to account for major noise features can actually improve the analysis (for such an example, see Dien, 1998a), although in principle, if too many additional factors are retained, unwanted distinctions may be made (such as between subject-specific variations of the component). In general, the experience of the first author is that between eight and sixteen factors is often appropriate, although this may depend, among other things, on the number of recording sites.
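The noise estimate for the modified parallel test can be sketched as follows (simulated single trials; the shapes and amplitudes are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_times = 100, 50
signal = np.sin(np.linspace(0, np.pi, n_times))    # identical on every trial
trials = signal + rng.normal(scale=2.0, size=(n_trials, n_times))

erp = trials.mean(axis=0)                           # ordinary average: signal survives
flips = np.where(np.arange(n_trials) % 2 == 0, 1.0, -1.0)
noise_avg = (trials * flips[:, None]).mean(axis=0)  # signal cancels, noise remains
```

The scree from PCAs of such noise-only averages is then compared with the scree from the real averages; factors rising above the noise scree are candidates for retention.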
Factor Rotation

A critical step, after deciding how many factors to retain, is to determine the best way of allocating variance across the remaining factors. Unfortunately, there is no transparent relationship between the PCA factors and the latent variables of interest (i.e., ERP components). Eigenvalue decomposition blindly generates factors that account for maximum variance, and each such factor may be influenced by more than one latent variable, whereas the goal is to have each factor represent a single ERP component.

As shown in Figure 6, there is not a one-to-one mapping of factors to latent variables after the initial factor extraction. Rather, the initial extraction has maximized the variance of the first factor by including variance from as many variables as possible. In doing so, it has generated a factor that is a hybrid of two ERP components, the linear sum of roughly 10% of the P1 and 90% of the P3. The second factor contains the leftover variance of both components. This example demonstrates the danger of interpreting the initial unrotated factors directly, as is sometimes advocated (e.g., Rösler & Manzey, 1981).
Figure 6. Graph of unrotated factors. Only Factors 1 and 2 are graphed, since Factors 3–6 are close to 0.
Factor rotation is used to restructure the allocation of variables to factors so as to maximize the chance that each factor reflects a single latent variable. The most common rotation is Varimax (Kaiser, 1958). In Varimax, each of the retained factors is iteratively rotated pairwise with each of the other factors in turn, until changes in the solution become negligible. More specifically, the Varimax procedure rotates the two factors such that the sum of the factor loadings (raised to the fourth power) is maximized. This has the effect of favoring solutions in which factor loadings are as extreme as possible, with a combination of near-zero loadings and large peak values. Since ERP components (other than DC potentials) tend to have zero activity for most of the epoch, with a single major peak or dip, Varimax should yield a reasonable approximation to the underlying ERP components (Fig. 7). Temporal overlap of ERP components raises additional issues, which are addressed in the following section.
Figure 7. Graph of Varimax-rotated Factor Loadings. Only Factors 1 and 2 are graphed, since Factors 3–6 are close to 0.
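Varimax can be implemented compactly with the standard SVD-based update, which rotates all retained factors simultaneously rather than pairwise (a generic sketch, not the PCA Toolkit's code):

```python
import numpy as np

def varimax(A, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation of a loading matrix A (variables x factors)."""
    p, k = A.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        L = A @ R
        # Gradient of the sum of fourth powers of the loadings,
        # corrected for each factor's overall size
        G = A.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                     # nearest orthogonal rotation
        crit = s.sum()
        if crit - crit_old < tol * crit:
            break
        crit_old = crit
    return A @ R, R
```

Because R stays orthogonal throughout, the rotated loadings account for exactly the same total variance as the input; the variance is simply redistributed so that each factor has a few large loadings and many near-zero ones.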
Simulation studies have demonstrated that the accuracy of a rotation is influenced by several circumstances, including component overlap and component correlation (Dien, 1998a). Component overlap is a problem because the more similar two ERP components are, the more difficult it is to distinguish them (Möcks & Verleger, 1991). Further, correlations between components may lead to violations of statistical assumptions. The initial extraction and the subsequent Varimax rotation maintain strict orthogonality between the factors (so the factors are uncorrelated). To the extent that the components are in fact correlated, the model solution will be inaccurate, producing misallocation of variance across the factors. Component correlation can arise when two components respond to the same task variables (an example is the P300 and Slow Wave components, which often co-occur), when both components respond to the same subject parameters (e.g., age, sex, or personality traits), or when they share a common spatial distribution, such that the two components are measured at some of the same electrodes due to their similar scalp topographies.
Component correlation can be effectively addressed by using an oblique rotation, such as Promax (Hendrickson & White, 1964), allowing for correlated factors (Dien, 1998a). In Promax, the initial Varimax rotation is succeeded by a "relaxation" step, in which each individual factor is further rotated to maximize the number of variables with minimal loadings. A factor is adjusted in this fashion without regard to the other factors, allowing factors to become correlated, and thus relaxing the orthogonality constraint of the Varimax solution. The Promax rotation typically leads to solutions that more accurately capture the large features of the Varimax factors while minimizing the smaller features. As a result, Promax solutions tend to account for slightly less variance than the original Varimax solutions, but may also give more accurate results (Dien, 1998a). A typical result can be seen in Figure 8.
Figure 8. Graph of Promax-rotated Factor Loadings. Only Factors 1 and 2 are graphed, since Factors 3–6 are close to 0.
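The Promax relaxation can be sketched as follows, using the common power-target formulation; implementations vary in their details, and this is a generic illustration rather than the PCA Toolkit's code:

```python
import numpy as np

def promax(L, k=4):
    """Oblique Promax rotation of a (typically Varimax-rotated) loading matrix."""
    # Target matrix: loadings raised to the k-th power with signs kept,
    # which shrinks small loadings toward zero faster than large ones
    P = L * np.abs(L) ** (k - 1)
    # Least-squares transformation taking L toward the target
    U = np.linalg.lstsq(L, P, rcond=None)[0]
    # Rescale so the implied factors have unit variance
    U = U * np.sqrt(np.diag(np.linalg.inv(U.T @ U)))
    pattern = L @ U                    # oblique pattern loadings
    phi = np.linalg.inv(U.T @ U)       # factor correlation matrix
    return pattern, phi
```

The returned phi reports the correlations between the now-oblique factors, which the orthogonal Varimax solution forbids by construction.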
Spatial versus Temporal PCA

A limitation of temporal PCA is that factors are defined solely as a function of component time course, as instantiated by the factor loadings. This means that ERP components which are topographically distinct, but have a similar time course, will be modeled by a single factor. A sign that this has occurred is when temporal PCA yields condition effects characterized by a scalp topography that differs from the overall factor topography.
To address this problem, spatial PCA may be used as an alternative to temporal PCA (Dien, 1998a). In a spatial PCA, the data are arranged such that the variables are electrode locations, and the observations are experimental conditions, subjects, and time points. The factor loadings therefore describe scalp topographies instead of temporal patterns. This approach is less likely to confound ERP components with the same time course, as long as they are differentiated by the task or subject variance. On the other hand, it is subject to the converse problem, confusing components with similar scalp topographies even when they have clearly separate time dynamics.
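The difference between the two arrangements is simply a matter of which dimension supplies the variables, as this reshaping sketch with hypothetical dimensions shows:

```python
import numpy as np

n_subj, n_cond, n_chan, n_time = 10, 2, 64, 250
erp = np.zeros((n_subj, n_cond, n_chan, n_time))   # placeholder dataset

# Temporal PCA: variables = time points,
# observations = subject x condition x electrode
X_temporal = erp.reshape(-1, n_time)               # shape (1280, 250)

# Spatial PCA: variables = electrodes,
# observations = subject x condition x time point
X_spatial = erp.transpose(0, 1, 3, 2).reshape(-1, n_chan)  # shape (5000, 64)
```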
The choice between spatial and temporal PCA should depend on the specific analysis goals. A rule of thumb is that if time course is the focus of an analysis, then spatial PCA should be used, and vice versa. The reason is that the factor loadings are constrained to be the same across the entire dataset (i.e., the same time course for temporal PCA, and the same scalp topography for spatial PCA). The factor scores, on the other hand, are free to vary between conditions and subjects. Thus, one can examine latency changes with spatial, but not temporal, PCA, and vice versa. In particular, this implies that temporal, rather than spatial, PCA should be more effective as a preprocessing step in source localization, since these modeling procedures rely on the scalp topography to infer the number and configuration of sources.
All other things being equal, temporal PCA is in principle more accurate than spatial PCA, since component overlap reduces PCA accuracy, and volume conduction ensures that all ERP components will overlap in a spatial PCA. Furthermore, the effect of the Varimax and Promax rotations is to minimize factor overlap, which is a more appropriate goal for temporal than for spatial PCA. A caveat in either case is that ERP component separation can be achieved only if subject or task variance (or both) can effectively distinguish the components. In other words, the three sources of variance associated with each observation must collectively be able to distinguish the ERP components, regardless of whether the components differ along the variable dimension (time for temporal PCA, or space for spatial PCA).
Recent alternatives to PCA

In recent years, a variety of multivariate statistical techniques have been developed and are increasingly making their way into the ERP literature. One such method, independent components analysis (Makeig, Bell, Jung, & Sejnowski, 1996), is discussed elsewhere in this volume.

In this section, we present four multivariate techniques in ERP analysis, which share a common basis in their use of eigenvalue decomposition. Each technique has been claimed to address one or more problems with conventional PCA. The application of these techniques in ERP research is very recent, and more work is needed to characterize their respective strengths and limitations for various ERP applications. Future developments of these techniques may lead to a powerful suite of tools that can be used to address a range of problems in ERP analysis.
Sequential spatiotemporal (or temporospatial) <strong>PCA</strong><br />
A recent procedure for improved separation of ERP<br />
components is spatiotemporal (or temporospatial) <strong>PCA</strong> (Spencer,<br />
<strong>Dien</strong>, & Donchin, 1999; Spencer et al., 2001). In this<br />
procedure, ERP components that were confounded in the initial<br />
<strong>PCA</strong> are separated by the application of a second <strong>PCA</strong>, which is<br />
used to separate variance along the other dimension. For a<br />
temporospatial <strong>PCA</strong> this is accomplished by rearranging the<br />
factor scores resulting from the temporal <strong>PCA</strong> so that each<br />
column contains the factor scores from a different electrode. A<br />
spatial <strong>PCA</strong> can then be conducted using these factor scores as<br />
the new variables. While the temporal variance has been<br />
collapsed by the initial <strong>PCA</strong>, the subject and task variance are<br />
still expressed in the observations, and can thus be used to<br />
separate ERP components that were confounded in the initial <strong>PCA</strong>.<br />
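Because each step here is just a centered PCA, the whole sequence can be<br />
sketched in a few lines. The following illustration assumes an SVD-based,<br />
covariance (unstandardized) PCA; the dimensions, factor counts, and random<br />
data are placeholders, not values from the studies cited:<br />

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder dimensions: 40 observations (subjects x tasks),
# 64 electrodes, 100 time points.
n_obs, n_chan, n_time = 40, 64, 100
data = rng.standard_normal((n_obs, n_chan, n_time))

def pca(X, n_factors):
    """Centered, SVD-based PCA: returns loadings (variables x factors)
    and factor scores (observations x factors)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_factors].T        # variables x factors
    scores = Xc @ loadings             # observations x factors
    return loadings, scores

# Step 1: temporal PCA -- variables are time points; observations are
# all subject x task x electrode combinations.
t_loadings, t_scores = pca(data.reshape(n_obs * n_chan, n_time), 5)

# Step 2: rearrange each temporal factor's scores so that each column
# holds one electrode, then run a separate spatial PCA on that matrix
# (the separate-PCA-per-factor variant described above).
spatial_results = []
for f in range(t_scores.shape[1]):
    scores_f = t_scores[:, f].reshape(n_obs, n_chan)   # obs x electrodes
    spatial_results.append(pca(scores_f, 3))

print(len(spatial_results), spatial_results[0][0].shape)  # → 5 (64, 3)
```

The subject and task variance survives the first PCA in these rearranged<br />
scores, which is what allows the second PCA to separate components that<br />
the first one confounded.<br />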
In Spencer et al. (1999, 2001), the initial spatial <strong>PCA</strong><br />
was followed by a temporal <strong>PCA</strong>, with the factor scores from all<br />
the factors combined within the same analysis (number of<br />
observations equal to number of subjects x number of tasks x<br />
number of spatial factors). This procedure led to a clear<br />
separation of the P300 from the Novelty P3. However, it also had<br />
an important drawback: the application of a single temporal <strong>PCA</strong><br />
to all the spatial factors could result in loss of some of the<br />
finer distinctions in time course between different spatial<br />
factors. This analytic strategy was necessary because the<br />
generalized inverse function, used by SAS to generate the factor<br />
scores, requires that there be more observations than variables.<br />
The <strong>PCA</strong> Toolkit has bypassed this requirement by directly<br />
rotating the factor scores (Möcks & Verleger, 1991), allowing<br />
each initial factor to be subjected to a separate <strong>PCA</strong> (following<br />
the example of Scott Makeig’s ICA toolbox and an independent<br />
suggestion by Bill Dunlap). In a more recent study (<strong>Dien</strong> et al.,<br />
2003b), an initial spatial <strong>PCA</strong> yielded 12 factors; each spatial<br />
factor was then subjected to a separate temporal <strong>PCA</strong> (each<br />
retaining four factors for simplicity's sake). For analyses<br />
using this newer approach (it makes little difference for the<br />
original approach), temporospatial <strong>PCA</strong> is recommended over<br />
spatiotemporal <strong>PCA</strong>, since temporal <strong>PCA</strong> may lead to better<br />
initial separation of ERP components. Subsequent application of<br />
a spatial <strong>PCA</strong> can then help separate components that were<br />
confounded in the temporal <strong>PCA</strong>. On the other hand, if latency<br />
analysis is a goal of the <strong>PCA</strong>, then spatial <strong>PCA</strong> should be done<br />
first (since latency analysis cannot be done on the results of a<br />
temporal <strong>PCA</strong>) with the succeeding temporal <strong>PCA</strong> step used to<br />
verify whether multiple components are present in the factor of<br />
interest, as demonstrated in another recent study (<strong>Dien</strong>,<br />
Spencer, & Donchin, in press).<br />
The full equation to generate the microvolt value for a<br />
specific time point t and channel c for a spatiotemporal <strong>PCA</strong> is<br />
L1 * V1 * L2 * S2 * V2, where L1 is the spatial <strong>PCA</strong> factor<br />
loading for c, V1 is the standard deviation of c, L2 is the<br />
temporal <strong>PCA</strong> factor loading for t, S2 is the mean factor score<br />
for the temporal factor, and V2 is the standard deviation of the<br />
spatial factor scores at t. The temporal and spatial terms are<br />
reversed for temporospatial <strong>PCA</strong>.<br />
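The reconstruction logic can be checked in the simpler two-mode case:<br />
with every factor retained, factor scores times loadings reproduce the<br />
centered data exactly, and the sequential equation above simply chains two<br />
such reconstructions (the V terms restore the scaling when loadings are<br />
standardized). A minimal sketch, assuming an unstandardized SVD-based <strong>PCA</strong>:<br />

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 8))    # observations x variables
Xc = X - X.mean(axis=0)

# PCA via SVD, retaining all factors.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt.T                     # variables x factors
scores = Xc @ loadings              # observations x factors

# With all factors kept, scores @ loadings.T reproduces the centered
# data exactly; truncating the factors gives an approximation instead.
X_hat = scores @ loadings.T
print(np.allclose(X_hat, Xc))       # → True
```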
Parametric <strong>PCA</strong><br />
Another recent method involves the use of parametric<br />
measures to improve <strong>PCA</strong> separation of latent factors that<br />
differ along one or more stimulus dimensions (<strong>Dien</strong> et al.,<br />
2003a). This more specialized procedure can only be conducted<br />
on datasets containing observations with a continuous range of<br />
values. In <strong>Dien</strong> et al. (2003a), ERP responses to sentence<br />
endings were averaged for each stimulus item (collapsing over<br />
subjects) rather than averaging over subjects (collapsing over<br />
items in each experimental condition). This item-averaging<br />
approach resulted in 120 sentence averages, which were rated on<br />
a number of linguistic parameters, such as meaningfulness and<br />
word frequency. After an initial temporal <strong>PCA</strong>, it was then<br />
possible to correlate the parameter of interest with the mean<br />
factor score at each channel, to determine the influence of the<br />
stimulus parameter on a given ERP component, such as the N400.<br />
This had the effect of highlighting the relationship between ERP<br />
components and stimulus parameters, while factoring out the<br />
effects of ERP components unrelated to the parameters of<br />
interest. In this fashion, parametric <strong>PCA</strong> can lead to scalp<br />
topographies that reflect only the parameters of interest,<br />
providing a new approach to functional separation of ERP<br />
components. This approach thus provides an alternative method to<br />
sequential <strong>PCA</strong> for deconfounding components. These components<br />
can then be subjected to further analyses, such as dipole and<br />
linear inverse modeling.<br />
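The correlation step can be sketched as follows, assuming an SVD-based<br />
temporal <strong>PCA</strong> and a hypothetical continuous rating; every name and<br />
dimension below is illustrative rather than taken from the study:<br />

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder setup: 120 item averages x 32 channels x 80 time points,
# plus one continuous stimulus parameter (e.g., a meaningfulness rating).
n_items, n_chan, n_time = 120, 32, 80
data = rng.standard_normal((n_items, n_chan, n_time))
rating = rng.standard_normal(n_items)

# Temporal PCA: observations are item x channel rows.
X = data.reshape(n_items * n_chan, n_time)
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[0]                  # scores on the first temporal factor

# Correlate the parameter with that factor's scores separately at each
# channel, yielding a correlation topography for the factor.
scores_by_chan = scores.reshape(n_items, n_chan)
topo = np.array([np.corrcoef(rating, scores_by_chan[:, c])[0, 1]
                 for c in range(n_chan)])
print(topo.shape)                    # → (32,)
```

Channels where this correlation is strong mark where the parameter of<br />
interest modulates the factor, while components unrelated to the<br />
parameter contribute nothing to the map.<br />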
Partial Least Squares<br />
Partial least squares (PLS), like <strong>PCA</strong>, is a multivariate<br />
technique based on eigenvalue decomposition. Unlike <strong>PCA</strong>, PLS<br />
operates on the covariance between the data matrix and a matrix<br />
of contrasts that represents features of the experimental design<br />
(McIntosh, Bookstein, Haxby, & Grady, 1996). Similar to<br />
parametric <strong>PCA</strong> procedures, the decomposition is focused on<br />
variance due to the experimental manipulations (condition<br />
differences). A recent paper (Lobaugh et al., 2001) applied PLS<br />
to ERP data for the first time. Simulations showed that the PLS<br />
analysis led to accurate modeling of the spatial and temporal<br />
effects that were associated with condition differences in the<br />
ERP waveforms. Lobaugh et al. also suggest that PLS may be an<br />
effective preprocessing method, identifying time points and<br />
electrodes that are sensitive to condition differences and can<br />
therefore be targeted for further analyses.<br />
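The core computation can be sketched as an SVD of the design-data<br />
cross-covariance (a simplified reading of the PLS approach; the contrast<br />
coding and dimensions below are hypothetical):<br />

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical example: 40 observations (subjects x conditions) of
# 200 measurements (channels x time points), with 2 contrast columns
# coding the experimental manipulation.
n_obs, n_meas = 40, 200
data = rng.standard_normal((n_obs, n_meas))
design = rng.standard_normal((n_obs, 2))

# PLS decomposes the design-data cross-covariance rather than the data
# covariance itself, so its latent variables are tied to the
# experimental manipulation by construction.
Xc = data - data.mean(axis=0)
Yc = design - design.mean(axis=0)
cross_cov = Yc.T @ Xc / (n_obs - 1)          # contrasts x measurements
U, s, Vt = np.linalg.svd(cross_cov, full_matrices=False)

# Rows of Vt are "saliences" over channels/time points; columns of U
# weight the design contrasts; s gives each latent variable's strength.
print(U.shape, s.shape, Vt.shape)            # → (2, 2) (2,) (2, 200)
```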
One cautionary note concerning PLS arises from the use of<br />
difference waves, which are created by subtracting the ERP<br />
waveform in one experimental condition from the response in a<br />
different condition, in order to isolate experimental effects<br />
prior to factor extraction. This approach, based on the logic of<br />
subtraction, can lead to incorrect conclusions when the<br />
assumption of pure insertion is violated, that is, when two<br />
conditions are different in kind rather than in degree. It can<br />
also produce misleading results when a change in latency appears<br />
to be an amplitude effect or when multiple effects appear to be<br />
a single effect.<br />
Neuroimaging measures, such as ERP and fMRI, may be<br />
particularly subject to such misinterpretations, since both<br />
spatial (anatomical) and temporal variance can lead to<br />
condition differences. If these multiple sources of variance are not<br />
adequately separated, a temporal difference between conditions<br />
may be erroneously ascribed to a single anatomical region or ERP<br />
component (e.g., Zarahn, Aguirre, & D'Esposito, 1999). In a<br />
recent example (Spencer, Abad, & Donchin, 2000), <strong>PCA</strong> was used to<br />
examine the claim that recollection (as compared with<br />
familiarity) is associated with a unique electrophysiological<br />
component. Spencer et al. concluded that the effect was more<br />
accurately ascribed to differences in latency jitter, or trial-<br />
to-trial variance in peak latency of the P300 across the two<br />
conditions.<br />
For this reason, condition differences, while useful,<br />
should only be interpreted in respect to the overall patterns in<br />
the original data. Further, both spatial and temporal variance<br />
between conditions should be analyzed fully, to rule out<br />
differences in latency jitter or other electrophysiological<br />
effects that may be hidden or confounded through cognitive<br />
subtraction. This recommendation also applies to the<br />
interpretation of results from the use of partial variance<br />
techniques, such as PLS.<br />
Multi-Mode Factor Analysis<br />
The techniques discussed in previous sections were all<br />
based on two-mode analysis (a mode being a dimension of variance)<br />
of ERP data. In temporal <strong>PCA</strong>, for instance, time points<br />
represent one dimension (variables axis), and the other<br />
dimension combines the remaining sources of variance — i.e.,<br />
subjects, electrodes, and experimental conditions (observations<br />
axis). By contrast, multi-mode procedures analyze the data<br />
across three or more dimensions simultaneously. For example, in<br />
trilinear decomposition (TLD), the data matrix X_i for subject i is<br />
expressed as the product of three factors, as shown in<br />
equation (1):<br />
X_i = B * A_i * C, (1)<br />
where B is a set of spatial components and C is a set of<br />
temporal components. A_i represents the subject loadings on B and<br />
C. B and C are calculated in separate spatial and temporal<br />
singular value decompositions of the data and are then combined<br />
to yield a new decomposition of the data, for any fixed<br />
dimensionality (Wang, Begleiter, & Porjesz, 2000). Since tri-<br />
mode <strong>PCA</strong> is susceptible to the same rotational indeterminacies<br />
as regular <strong>PCA</strong>, rotational procedures will need to be developed<br />
and evaluated.<br />
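A rough sketch of this two-SVD construction, with the subject loadings<br />
A_i then obtained by least squares; the dimensions and the<br />
pseudoinverse-based fit are illustrative assumptions, not the published<br />
algorithm:<br />

```python
import numpy as np

rng = np.random.default_rng(4)

# Placeholder dimensions: 10 subjects, 16 channels, 50 time points;
# keep p = 3 spatial and q = 3 temporal components.
n_subj, n_chan, n_time, p, q = 10, 16, 50, 3, 3
X = rng.standard_normal((n_subj, n_chan, n_time))

# Spatial components B: SVD of the data unfolded over channels.
Ub, _, _ = np.linalg.svd(X.transpose(1, 0, 2).reshape(n_chan, -1),
                         full_matrices=False)
B = Ub[:, :p]                        # channels x p

# Temporal components C: SVD of the data unfolded over time points.
Ut, _, _ = np.linalg.svd(X.transpose(2, 0, 1).reshape(n_time, -1),
                         full_matrices=False)
C = Ut[:, :q].T                      # q x time points

# Subject loadings A_i chosen so that X_i ~ B @ A_i @ C for each
# subject i (least-squares fit via pseudoinverses).
A = np.array([np.linalg.pinv(B) @ X[i] @ np.linalg.pinv(C)
              for i in range(n_subj)])
print(A.shape)                       # → (10, 3, 3)
```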
It has been claimed that tri-mode <strong>PCA</strong> can effectively remove<br />
“nuisance” sources of variance, as described by Möcks (1988),<br />
providing greater sensitivity as compared with conventional,<br />
two-dimensional <strong>PCA</strong>. Further, Achim has described the use of<br />
multi-modal procedures to help address misallocation of variance<br />
(Achim & Bouchard, 1997). These reports point to the need for<br />
thorough and systematic comparison of multi-mode methods such as<br />
TLD with other methods, such as parametric <strong>PCA</strong> and PLS. This<br />
can only be done for algorithms that are made available to the<br />
rest of the research community, either through open source or<br />
through commercial software packages.<br />
Conclusion<br />
<strong>PCA</strong> and related procedures can provide an effective way to<br />
preprocess high-density ERP datasets, and to help separate<br />
components that differ in their sensitivity to spatial,<br />
temporal, or functional parameters. This brief review has<br />
attempted to characterize the current state of the art in <strong>PCA</strong> of<br />
ERPs. Ongoing research will continue to refine these statistical<br />
procedures and to determine which are optimal for the<br />
decomposition of ERP data.<br />
Ultimately, it is likely that a range of statistical tools will<br />
be required, each best suited to different applications in ERP<br />
analysis.<br />
Bibliography<br />
Achim, A., & Bouchard, S. (1997). Toward a dynamic topographic<br />
components model. Electroencephalography and Clinical<br />
Neurophysiology, 103, 381-385.<br />
Cattell, R. B. (1966). The scree test for the number of factors.<br />
Multivariate Behavioral Research, 1, 245-276.<br />
Cattell, R. B., & Jaspers, J. (1967). A general plasmode (No.<br />
3010-5-2) for factor analytic exercises and research.<br />
Multivariate Behavioral Research Monographs, 67-3, 1-212.<br />
Curry, S. H., Cooper, R., McCallum, W. C., Pocock, P. V.,<br />
Papakostopoulos, D., Skidmore, S., et al. (1983). The<br />
principal components of auditory target detection. In A. W.<br />
K. Gaillard & W. Ritter (Eds.), Tutorials in ERP research:<br />
Endogenous components (pp. 79-117). Amsterdam: North-<br />
Holland Publishing Company.<br />
<strong>Dien</strong>, J. (1998a). Addressing misallocation of variance in<br />
principal components analysis of event-related potentials.<br />
Brain Topography, 11(1), 43-55.<br />
<strong>Dien</strong>, J. (1998b). Issues in the application of the average<br />
reference: Review, critiques, and recommendations.<br />
Behavioral Research Methods, Instruments, and Computers,<br />
30(1), 34-43.<br />
<strong>Dien</strong>, J. (1999). Differential lateralization of trait anxiety<br />
and trait fearfulness: evoked potential correlates.<br />
Personality and Individual Differences, 26(1), 333-356.<br />
<strong>Dien</strong>, J., Beal, D., & Berg, P. (submitted). Optimizing principal<br />
components analysis for event-related potential analysis.<br />
<strong>Dien</strong>, J., Frishkoff, G. A., Cerbonne, A., & Tucker, D. M.<br />
(2003a). Parametric analysis of event-related potentials in<br />
semantic comprehension: Evidence for parallel brain<br />
mechanisms. Cognitive Brain Research, 15, 137-153.<br />
<strong>Dien</strong>, J., Frishkoff, G. A., & Tucker, D. M. (2000).<br />
Differentiating the N3 and N4 electrophysiological semantic<br />
incongruity effects. Brain & Cognition, 43, 148-152.<br />
<strong>Dien</strong>, J., Spencer, K. M., & Donchin, E. (2003b). Localization of<br />
the event-related potential novelty response as defined by<br />
principal components analysis. Cognitive Brain Research,<br />
17, 637-650.<br />
<strong>Dien</strong>, J., Spencer, K. M., & Donchin, E. (in press). Parsing the<br />
"Late Positive Complex": Mental chronometry and the ERP<br />
components that inhabit the neighborhood of the P300.<br />
Psychophysiology.<br />
<strong>Dien</strong>, J., Tucker, D. M., Potts, G., & Hartry, A. (1997).<br />
Localization of auditory evoked potentials related to<br />
selective intermodal attention. Journal of Cognitive<br />
Neuroscience, 9(6), 799-823.<br />
Donchin, E. (1966). A multivariate approach to the analysis of<br />
average evoked potentials. IEEE Transactions on Bio-Medical<br />
Engineering, BME-13, 131-139.<br />
Donchin, E., & Coles, M. G. H. (1991). While an undergraduate<br />
waits. Neuropsychologia, 29(6), 557-569.<br />
Donchin, E., & Heffley, E. (1979). Multivariate analysis of<br />
event-related potential data: A tutorial review. In D. Otto<br />
(Ed.), Multidisciplinary perspectives in event-related<br />
potential research (EPA 600/9-77-043) (pp. 555-572).<br />
Washington, DC: U.S. Government Printing Office.<br />
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ:<br />
Lawrence Erlbaum Associates.<br />
Hendrickson, A. E., & White, P. O. (1964). Promax: A quick<br />
method for rotation to oblique simple structure. The<br />
British Journal of Statistical Psychology, 17, 65-70.<br />
Kaiser, H. F. (1958). The varimax criterion for analytic<br />
rotation in factor analysis. Psychometrika, 23, 187-200.<br />
Lobaugh, N. J., West, R., & McIntosh, A. R. (2001).<br />
Spatiotemporal analysis of experimental differences in<br />
event-related potential data with partial least squares.<br />
Psychophysiology, 38(3), 517-530.<br />
Makeig, S., Bell, A. J., Jung, T., & Sejnowski, T. J. (1996).<br />
Independent component analysis of electroencephalographic<br />
data. Advances in Neural Information Processing Systems, 8,<br />
145-151.<br />
McIntosh, A. R., Bookstein, F. L., Haxby, J. V., & Grady, C. L.<br />
(1996). Spatial pattern analysis of functional brain images<br />
using Partial Least Squares. Neuroimage, 3, 143-157.<br />
Möcks, J. (1988). Topographic components model for event-related<br />
potentials and some biophysical considerations. IEEE<br />
Transactions on Biomedical Engineering, 35(6), 482-484.<br />
Möcks, J., & Verleger, R. (1991). Multivariate methods in<br />
biosignal analysis: application of principal component<br />
analysis to event-related potentials. In R. Weitkunat<br />
(Ed.), Digital Biosignal Processing (pp. 399-458).<br />
Amsterdam: Elsevier.<br />
Nunez, P. L. (1981). Electric fields of the brain: The<br />
neurophysics of EEG. New York: Oxford University Press.<br />
Rösler, F., & Manzey, D. (1981). Principal components and<br />
varimax-rotated components in event-related potential<br />
research: Some remarks on their interpretation. Biological<br />
Psychology, 13, 3-26.<br />
Ruchkin, D. S., Villegas, J., & John, E. R. (1964). An analysis<br />
of average evoked potentials making use of least mean<br />
square techniques. Annals of the New York Academy of<br />
Sciences, 115(2), 799-826.<br />
Spencer, K. M., Abad, E. V., & Donchin, E. (2000). On the search<br />
for the neurophysiological manifestation of recollective<br />
experience. Psychophysiology, 37, 494-506.<br />
Spencer, K. M., <strong>Dien</strong>, J., & Donchin, E. (1999). A componential<br />
analysis of the ERP elicited by novel events using a dense<br />
electrode array. Psychophysiology, 36, 409-414.<br />
Spencer, K. M., <strong>Dien</strong>, J., & Donchin, E. (2001). Spatiotemporal<br />
analysis of the late ERP responses to deviant stimuli.<br />
Psychophysiology, 38(2), 343-358.<br />
Wang, K., Begleiter, H., & Porjesz, B. (2000). Trilinear<br />
modeling of event-related potentials. Brain Topography,<br />
12(4), 263-271.<br />
Wood, C. C., & McCarthy, G. (1984). Principal component analysis<br />
of event-related potentials: Simulation studies demonstrate<br />
misallocation of variance across components.<br />
Electroencephalography and Clinical Neurophysiology, 59,<br />
249-260.<br />
Zarahn, E., Aguirre, G. K., & D'Esposito, M. (1999). Temporal<br />
isolation of the neural correlates of spatial mnemonic<br />
processing with fMRI. Cognitive Brain Research, 7(3), 255-<br />
268.<br />