
Introduction to Principal Components Analysis of Event-Related Potentials

Joseph Dien 1,3 and Gwen A. Frishkoff 2

1 Department of Psychology, Tulane University
2 Department of Psychology, University of Oregon
3 Department of Psychology, University of Kansas

To appear in: Event Related Potentials: A Methods Handbook. Handy, T. (editor). Cambridge, Mass: MIT Press.

Address for correspondence: Joseph Dien, Department of Psychology, 426 Fraser Hall, University of Kansas, 1415 Jayhawk Blvd., Lawrence, KS 66045-7556. E-mail: jdien@ku.edu.


Introduction

Over the last several decades, a variety of methods have been developed for statistical decomposition of event-related potentials (ERPs). The simplest and most widely applied of these techniques is principal components analysis (PCA). It belongs to a class of factor-analytic procedures, which use eigenvalue decomposition to extract linear combinations of variables (latent factors) in such a way as to account for patterns of covariance in the data parsimoniously, that is, with the fewest factors.

In ERP data, the variables are the microvolt readings either at consecutive time points (temporal PCA) or at each electrode (spatial PCA). The major source of covariance is assumed to be the ERP components, characteristic features of the waveform that are spread across multiple time points and multiple electrodes (Donchin & Coles, 1991). Ideally, each latent factor corresponds to a separate ERP component, providing a statistical decomposition of the brain electrical patterns that are superposed in the scalp-recorded data.

PCA has a range of applications for ERP analysis. First, it can be used for data reduction and cleaning or filtering prior to data analysis. By reducing hundreds of variables to a handful of latent factors, PCA can greatly simplify the analysis and description of complex data. Moreover, the factors retained for further analysis are considered more likely to represent pure signal (i.e., brain activity), as opposed to noise (i.e., artifacts or background EEG).

Second, PCA can be used in data exploration as a way to detect and summarize features that might otherwise escape visual inspection. This is particularly useful when ERPs are measured over many tens or hundreds of recording sites; spatial patterns can then be used to constrain the decomposition into latent temporal patterns, as described in the following section.

The use of such high-density ERPs (recordings at 50 or more electrodes) has become increasingly popular in the last several years. A striking feature of high-density ERPs is that the complexity of the data seems to grow exponentially as the number of recording sites is doubled or tripled. Thus, while increases in spatial resolution can lead to important new discoveries, subtle patterns are likely to be missed, as higher spatial sampling reveals more and more complex patterns, overlapping in both time and space. A rational approach to data decomposition can improve the chances of detecting these subtler effects.

Third, PCA can serve as an effective means of data description. In principle, PCA can describe features of the dataset more objectively and more precisely than is possible with the unaided eye. Such increased precision could be especially helpful when PCA is used as a preprocessing step for ERP source localization (Dien, 1999; Dien, Frishkoff, Cerbonne, & Tucker, 2003a; Dien, Spencer, & Donchin, 2003b; Dien, Tucker, Potts, & Hartry, 1997).

Despite the many useful functions of PCA, this method has had a somewhat checkered history in ERP research, beginning in the 1960s (Donchin, 1966; Ruchkin, Villegas, & John, 1964). An influential review paper by Donchin and Heffley (1979) promoted the use of PCA for ERP component analysis. A few years later, however, PCA entered something of a dark age in the ERP field with the publication of a methodological critique (Wood & McCarthy, 1984), which demonstrated that PCA solutions may be subject to misallocation of variance across the latent factors. Wood and McCarthy noted that the same problems arise in the use of other techniques, such as reaction time and peak amplitude measures. The difference is that misallocation is made more explicit in PCA, which they argued should be regarded as an advantage. Yet this last point was often overlooked, and this seminal paper has, ironically, been cited as an argument against the use of PCA. Perhaps as a consequence, many researchers continued to rely on conventional ERP analysis techniques.

More recently, the emergence of high-density ERPs has revived interest in PCA as a method of data reduction. Moreover, some recent studies have shown that statistical decomposition can lead to novel insights into well-known ERP effects, providing evidence to help separate ERP components associated with different perceptual and cognitive operations (Dien et al., 2003a; Dien, Frishkoff, & Tucker, 2000; Dien et al., 1997; Spencer, Dien, & Donchin, 2001).

This review presents a systematic outline of the steps in temporal PCA and the issues that arise at each step in implementation. Some problems and limitations of temporal PCA are discussed, including rotational indeterminacy, problems of misallocation, and latency jitter. We then compare some recent alternatives to temporal PCA, namely: spatial PCA (Dien, 1998a), sequential (spatio-temporal or temporo-spatial) PCA (Spencer et al., 2001), parametric PCA (Dien et al., 2003a), multi-mode PCA (Möcks, 1988), and partial least squares (PLS) (Lobaugh, West, & McIntosh, 2001). Each technique has evolved to address certain weaknesses with the traditional PCA method. We conclude with questions for further research, and advocate a research program for systematic comparison of the strengths and limitations of different multivariate techniques in ERP research.

Steps in Temporal PCA

The two most common types of factor analysis are principal axis factor analysis and principal components analysis. These methods are equivalent for all practical purposes when there are many variables and when the variables are highly correlated (Gorsuch, 1983), as in most ERP datasets. In the ERP literature, PCA is the more common method. Traditionally, the same term, "component," has been used both for PCA linear combinations and for characteristic spatial and temporal features of the ERP. To avoid confusion, the term factor (or latent factor) will be used here to refer to PCA (latent) components, and the term component will be reserved for spatiotemporal features of the ERP waveform.

The PCA process consists of three main steps: computation of the relationship matrix, extraction and retention of the factors, and rotation to simple structure. In the following sections, a PCA simulation is performed to illustrate each step, using the PCA Toolkit (version 1.06), a set of Matlab functions for performing PCA on ERP data. This toolkit was written by the first author and is freely available upon request.


The Data Matrix

A key to understanding PCA procedures as applied to ERPs is to be clear about how multiple sources of variance contribute to the data decomposition. In temporal PCA, the dataset is organized with the variables corresponding to time points and the observations corresponding to the different waveforms in the dataset, as shown in Figure 1.

Figure 1. Data matrix, with dimensions 20 x 6. Variables are time points, measured in two conditions for ten subjects. Part (b) presents the covariance matrix computed from the data matrix.


The waveforms vary across subjects, electrodes, and experimental conditions. Thus, subject, spatial, and task variance are collectively responsible for covariance among the temporal variables. Although it may seem odd to commingle these three sources of variance, they provide equally valid bases for distinguishing an ERP component; in this respect, it is reasonable to treat them collectively. For example, the voltage readings tend to rise and fall together between 250 and 350 ms in a simple oddball experiment, because they are mutually influenced by the P300, which occurs during this period. Since individual differences, scalp location, and experimental task may all affect the recorded P300 amplitude, the amplitudes of these time points will likewise covary as a function of these three sources of variance. Figure 2 shows the grand-averaged waveforms (for n=10 subjects) corresponding to the simulated data in Figure 1. For simplicity, this example involves only one electrode site, ten subjects, and two experimental conditions. In subsequent sections, we will use these simulated data to help illustrate the steps in the implementation of PCA for ERP analysis.
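To make the structure of Figure 1 concrete, here is a minimal sketch of how such a dataset might be simulated (the chapter's own simulations used the Matlab PCA Toolkit; the component shapes, amplitudes, and noise level below are assumptions for illustration, not the chapter's actual parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_conditions, n_timepoints = 10, 2, 6

# Two non-overlapping components, each peaking at a single time point
# (t2 and t5); these shapes are hypothetical.
comp1 = np.array([0, 1, 0, 0, 0, 0], dtype=float)  # early component
comp2 = np.array([0, 0, 0, 0, 1, 0], dtype=float)  # late component

data = np.zeros((n_subjects * n_conditions, n_timepoints))
for s in range(n_subjects):
    subj_gain = rng.normal(1.0, 0.2)        # subject variance
    for c in range(n_conditions):
        amp1 = (2.0 + c) * subj_gain        # condition modulates amplitude
        amp2 = (1.0 + 2 * c) * subj_gain
        noise = rng.normal(0.0, 0.1, n_timepoints)
        data[s * n_conditions + c] = amp1 * comp1 + amp2 * comp2 + noise

print(data.shape)  # (20, 6): observations x time points, as in Figure 1
```

Each row is one waveform (one subject in one condition) and each column is one time point, matching the 20 x 6 arrangement of Figure 1.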

Figure 2. Waveforms for grand-averaged data (n=10), corresponding to the data in Figure 1. The graph displays two non-overlapping correlated components, plotted for two hypothetical conditions, A and B.

The Relationship Matrix

The first step in applying PCA is to generate a relationship (or association) matrix, which captures the interrelationships between the temporal variables. The simplest such matrix is the sum-of-squares cross-products (SSCP) matrix. For each pair of variables, the two values for each observation are multiplied and then added together. Thus, variables that tend to rise and fall together will produce the highest values in the matrix. The diagonal of the matrix (the relationship of each variable to itself) is the sum of the squared values of each variable. For an example of the effect of using the SSCP matrix, see Curry, Cooper, McCallum, Pocock, Papakostopoulos, Skidmore, and Newton (1983). SSCP treats mean differences in the same fashion as differences in variance, which has odd effects on the PCA computations. For example, factors computed on an SSCP matrix can be correlated even when they are orthogonal when using other matrices. In general, we do not recommend the use of the SSCP matrix in ERP analyses.

An alternative to the SSCP matrix is the covariance matrix. This matrix is computed in the same fashion as the SSCP matrix, except that the mean of each variable is subtracted out before generating the relationship matrix. Mean correction ensures that variables with high mean values do not have a disproportionate effect on the factor solution. The effect of mean correction on the solution depends on the EEG reference site, a topic that is beyond the scope of this review (cf. Dien, 1998a).

A third alternative is to use the correlation matrix as the relationship matrix. The correlation matrix is computed in the same fashion as the covariance matrix, except that the variable variances are standardized. This is accomplished by first mean-correcting each variable and then dividing each variable by its standard deviation, which ensures that the variables contribute equally to the factor solution. Since time points that do not contain ERP components have smaller variances, this procedure may exacerbate the influence of background noise. Simulation studies indicate that covariance matrices can yield more accurate results (Dien, Beal, & Berg, submitted). We therefore recommend the use of covariance matrices.
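To make the three options concrete, the following sketch computes all three relationship matrices from an observations-by-variables data matrix (such as the simulated data above); the function name is ours, not the PCA Toolkit's:

```python
import numpy as np

def relationship_matrices(data):
    """Return SSCP, covariance, and correlation matrices.

    data: observations x variables (e.g., waveforms x time points).
    """
    n = data.shape[0]
    sscp = data.T @ data                     # raw cross-products
    centered = data - data.mean(axis=0)      # mean-correct each variable
    cov = centered.T @ centered / (n - 1)    # covariance
    sd = centered.std(axis=0, ddof=1)
    corr = cov / np.outer(sd, sd)            # standardize the variances
    return sscp, cov, corr
```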

In Figure 3(a), the simulated data are converted into a covariance matrix. Observe how the time points containing the two components (t2 and t5) result in larger entries than those without. The entries with the largest numbers will have the most influence on the next step in the PCA procedure: factor extraction.

Factor Extraction

In the extraction stage, a process called eigenvalue decomposition is performed, which progressively removes linear combinations of variables that account for the greatest variance at each step. Each linear combination constitutes a latent factor. In Figure 3, we demonstrate how this process iteratively reduces the remaining values in the relationship matrix to zero.

Figure 3. (a) Original covariance matrix. (b) Covariance matrix after subtraction of Factor 1 (Varimax-rotated; see next section). (c) Covariance matrix after subtraction of Factors 1 and 2. Since Factors 1 and 2 together account for nearly all of the variance, the result is the null matrix.

In general, PCA should extract as many factors as there are variables, as long as the number of observations is equal to or greater than the number of variables (i.e., as long as the data matrix is of full rank).

The initial extraction yields an unrotated solution, consisting of a factor loading matrix and a factor score matrix. The factor loading matrix represents correlations between the variables and the factor scores. The factor score matrix indexes the magnitude of the factors for each of the observations and thus represents the relationship between the factors and the observations. If the two matrices are multiplied together, they will reproduce the data matrix. By convention, the reproduced data matrix will be in standardized form, regardless of the type of relationship matrix that was entered into the PCA. To recreate the original data matrix, the variables of this standardized matrix are multiplied by the original standard deviations, and the original variable means are restored.
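The following sketch illustrates this extraction step and the reconstruction convention just described, using eigenvalue decomposition of the covariance matrix. It is an illustration of the general procedure, not the PCA Toolkit's exact algorithm:

```python
import numpy as np

def pca_extract(data):
    """Unrotated PCA via eigenvalue decomposition of the covariance matrix.

    data: observations x variables. Returns loadings (variables x factors,
    as variable-factor correlations), standardized factor scores
    (observations x factors), and eigenvalues (factor sizes).
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest factor first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sd = centered.std(axis=0, ddof=1)
    # Loadings as correlations between variables and factor scores
    # (np.abs guards against tiny negative eigenvalues from rounding).
    loadings = eigvecs * np.sqrt(np.abs(eigvals)) / sd[:, None]
    # Standardized scores: scores @ loadings.T reproduces the standardized
    # data (approximately, when all factors are retained).
    scores = (centered / sd) @ np.linalg.pinv(loadings.T)
    return loadings, scores, eigvals

# Reconstruction: rescale the standardized product by the original standard
# deviations and restore the variable means.
# loadings, scores, _ = pca_extract(data)
# reconstructed = scores @ loadings.T * data.std(axis=0, ddof=1) + data.mean(axis=0)
```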

Figure 4. Reconstructed waveforms, calculated by multiplying the Factor Loadings by the Factor Scores, scaled to microvolts (i.e., multiplied by the matrix of standard deviations for the original data). Data were reconstructed using Varimax-rotated factors for this example.

For a temporal PCA, the loadings describe the time course of each of the factors. To accurately represent the time course of the factors, it is necessary to first multiply them by the variable standard deviations, which rescales them to microvolts (see proof in Dien, 1998a). Further, it is important to note that the sign of a given factor loading is arbitrary. This is necessarily the case since a given peak in the factor time course will be positive on one side of the head and negative on the other side, due to the dipolar nature of electrical fields (Nunez, 1981). Note, further, that the dipolar distributions can be distorted or obscured by referencing biases in the data (Dien, 1998b). Only the product of the factor loading and the factor score corresponds to the original data in an unequivocal way. Thus, if the factor loading is positive at the peak, then the factor scores from one side of the head will be positive and those from the other side will be negative, corresponding to the dipolar field.

The factor scores, on the other hand, provide information about the other sources of variance (i.e., subject, task, and spatial variance). For example, to compute the amplitude of a factor at a specific electrode site for a given task condition, one simply takes the factor scores corresponding to the observations for that task at the electrode of interest and computes their mean (across subjects). If this mean value is computed for each electrode, the resulting values can be used to plot the scalp topography for that factor. If a specific time point is chosen, it is possible to reconstruct the scalp topography with the proper microvolt scaling by multiplying the mean scores by the factor loading and the standard deviation for the time point of interest (see proof in Dien, 1998a).
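As a sketch of this factor-score bookkeeping (the label arrays describing which electrode and condition each observation belongs to are hypothetical):

```python
import numpy as np

def factor_topography(scores, electrode_ids, condition_ids, factor, condition,
                      loading_at_t=None, sd_at_t=None):
    """Mean factor score per electrode for one condition; optionally scaled
    to microvolts for a chosen time point.

    scores: observations x factors; electrode_ids and condition_ids give
    the per-observation labels.
    """
    electrodes = np.unique(electrode_ids)
    topo = np.array([
        scores[(electrode_ids == e) & (condition_ids == condition), factor].mean()
        for e in electrodes
    ])
    if loading_at_t is not None and sd_at_t is not None:
        # Microvolt scaling at one time point: mean score x loading x SD.
        topo = topo * loading_at_t * sd_at_t
    return electrodes, topo
```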

Unlike the PCA algorithm in most statistics packages, the PCA Toolkit does not mean-correct the factor scores. This maintains an interpretable relationship between the factor scores and the original data. If the factor scores are mean-corrected as part of the standardization, the mean task scores will be centered around zero, which can make factor interpretation more difficult. In an oddball experiment, for example, the P300 factor scores should be large for the target condition and small for the standard condition. However, if the factor scores are mean-corrected, then the mean task scores for the two conditions will be of equal amplitude and opposite signs (since mean correction splits the difference).

Factor Retention

Most of the PCA factors that are extracted account for small proportions of variance, which may be attributed to background noise or minor departures from group trends. In the interest of parsimony, only the larger factors are typically retained, since they are considered most likely to contain interpretable signal. A common criterion for determining how many factors to retain is the scree test (Cattell, 1966; Cattell & Jaspers, 1967). This test is based on the principle that a PCA of a random set of data will produce a set of randomly sized factors. Since factors are extracted in order of descending size, when their sizes are graphed they will form a steady downward slope. A dataset containing signal, in addition to the noise, should have initial factors that are larger than would be expected from random data alone. The point of departure from the slope (the elbow) indicates the number of factors to retain. Factors beyond this point are likely to contain noise and are best dropped. Figure 5 plots the reconstructed grand-averaged data, using the retained factors, in order to verify that meaningful factors have not been excluded.

Figure 5. Reconstructed waveforms, computed as in Figure 4, using only Factors 1 and 2. Because the first two factors account for nearly all of the variance, the reconstruction is nearly identical to the original data (cf. Fig. 2).

In practice, the scree plot for ERP datasets often contains multiple elbows, which can make it difficult to determine the proper number of factors to retain. Part of the problem is that the noise contains some unwanted signal (remnants of the background EEG). A modified version of the parallel test can be used to address this issue (Dien, 1998a). The parallel test determines how many factors represent signal by comparing the scree produced by the full dataset to that produced when only the noise is present. The noise level is estimated by generating an ERP average with every other trial inverted, which has the effect of canceling out the signal while leaving the noise level unchanged. The results of the parallel test should be considered a lower bound, since retaining additional factors to account for major noise features can actually improve the analysis (for such an example, see Dien, 1998a), although in principle, if too many additional factors are retained, it can result in unwanted distinctions being made (such as between subject-specific variations of the component). In general, the experience of the first author is that between eight and sixteen factors is often appropriate, although this may depend, among other things, on the number of recording sites.
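A minimal sketch of this modified parallel test follows. It assumes access to the single-trial epochs for each subject/condition cell, and the retention rule shown (keep factors until the signal scree falls below the noise scree) is one plausible reading of the procedure rather than the Toolkit's exact implementation:

```python
import numpy as np

def scree(data):
    """Descending eigenvalues of the covariance matrix (observations x variables)."""
    cov = np.cov(data - data.mean(axis=0), rowvar=False)
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

def parallel_test(trial_sets):
    """trial_sets: list of (n_trials x n_timepoints) arrays, one per
    subject/condition cell. Returns the number of factors whose eigenvalues
    exceed the noise scree."""
    signal_avgs, noise_avgs = [], []
    for trials in trial_sets:
        flips = np.where(np.arange(len(trials)) % 2 == 0, 1.0, -1.0)
        signal_avgs.append(trials.mean(axis=0))                    # ordinary average
        noise_avgs.append((trials * flips[:, None]).mean(axis=0))  # signal cancels
    signal_scree = scree(np.array(signal_avgs))
    noise_scree = scree(np.array(noise_avgs))
    above = signal_scree > noise_scree
    return len(above) if above.all() else int(above.argmin())
```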

Factor Rotation

A critical step, after deciding how many factors to retain, is to determine the best way of allocating variance across the remaining factors. Unfortunately, there is no transparent relationship between the PCA factors and the latent variables of interest (i.e., ERP components). Eigenvalue decomposition blindly generates factors that account for maximum variance, which may be influenced by more than one latent variable, whereas the goal is to have each factor represent a single ERP component.

As shown in Figure 6, there is not a one-to-one mapping of factors to variables after the initial factor extraction. Rather, the initial extraction has maximized the variance of the first factor by including variance from as many variables as possible. In doing so, it has generated a factor that is a hybrid of two ERP components, the linear sum of roughly 10% of the P1 and 90% of the P3. The second factor contains the leftover variance of both components. This example demonstrates the danger of interpreting the initial unrotated factors directly, as is sometimes advocated (e.g., Rösler & Manzey, 1981).

Figure 6. Graph of unrotated factors. Only Factors 1 and 2 are graphed, since Factors 3–6 are close to 0.

Factor rotation is used to restructure the allocation of variables to factors, to maximize the chance that each factor reflects a single latent variable. The most common rotation is Varimax (Kaiser, 1958). In Varimax, each of the retained factors is iteratively rotated pairwise with each of the other factors in turn, until changes in the solution become negligible. More specifically, the Varimax procedure rotates the two factors such that the sum of the factor loadings (raised to the fourth power) is maximized. This has the effect of favoring solutions in which the factor loadings are as extreme as possible, with a combination of near-zero loadings and large peak values. Since ERP components (other than DC potentials) tend to have zero activity for most of the epoch, with a single major peak or dip, Varimax should yield a reasonable approximation to the underlying ERP components (Fig. 7). Temporal overlap of ERP components raises additional issues, which are addressed in the following section.
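For reference, here is a common SVD-based implementation of the Varimax criterion. The chapter describes the rotation as iterative and pairwise; this formulation rotates all retained factors at once toward the same quartic criterion and is offered only as an illustrative sketch:

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a loadings matrix (variables x factors)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the (normalized) quartic criterion.
        gradient = loadings.T @ (
            rotated ** 3
            - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0))
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt                 # nearest orthogonal rotation
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                         # changes have become negligible
        criterion = new_criterion
    return loadings @ rotation
```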

Figure 7. Graph of Varimax-rotated Factor Loadings. Only Factors 1 and 2 are graphed, since Factors 3–6 are close to 0.

Simulation studies have demonstrated that the accuracy of a rotation is influenced by several situations, including component overlap and component correlation (Dien, 1998a). Component overlap is a problem because the more similar two ERP components are, the more difficult it is to distinguish them (Möcks & Verleger, 1991). Further, correlations between components may lead to violations of statistical assumptions. The initial extraction and the subsequent Varimax rotation maintain strict orthogonality between the factors (so the factors are uncorrelated). To the extent that the components are in fact correlated, the model solution will be inaccurate, producing misallocation of variance across the factors. Component correlation can arise when two components respond to the same task variables (an example is the P300 and Slow Wave components, which often co-occur), when both components respond to the same subject parameters (e.g., age, sex, or personality traits), or when they share a common spatial distribution. Further, such components can be measured at some of the same electrodes, due to their similar scalp topographies.


Component correlation can be effectively addressed by using an oblique rotation, such as Promax (Hendrickson & White, 1964), which allows for correlated factors (Dien, 1998a). In Promax, the initial Varimax rotation is succeeded by a "relaxation" step, in which each individual factor is further rotated to maximize the number of variables with minimal loadings. A factor is adjusted in this fashion without regard to the other factors, allowing factors to become correlated and thus relaxing the orthogonality constraint in the Varimax solution. The Promax rotation typically leads to solutions that more accurately capture the large features of the Varimax factors while minimizing the smaller features. As a result, Promax solutions tend to account for slightly less variance than the original Varimax solutions, but may also give more accurate results (Dien, 1998a). A typical result can be seen in Figure 8.
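A sketch of the Promax relaxation step in the standard target-matrix formulation: the Varimax loadings are raised to a power (keeping their signs) to exaggerate the large/small contrast, and an oblique transformation is fit toward that target. kappa = 4 is a conventional default, and the varimax() sketch above is assumed:

```python
import numpy as np

def promax(loadings, kappa=4):
    """Promax rotation: Varimax followed by an oblique relaxation step."""
    vmax = varimax(loadings)
    # Target matrix: loadings raised to a power, with their signs preserved.
    target = vmax * np.abs(vmax) ** (kappa - 1)
    # Least-squares transformation from the Varimax solution to the target.
    trans = np.linalg.lstsq(vmax, target, rcond=None)[0]
    # Rescale so the implied factor variances are 1.
    d = np.diag(np.linalg.inv(trans.T @ trans))
    trans = trans @ np.diag(np.sqrt(d))
    return vmax @ trans  # oblique (pattern) loadings
```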

Figure 8. Graph of Promax-rotated Factor Loadings. Only Factors 1 and 2 are graphed, since Factors 3–6 are close to 0.

Spatial versus Temporal PCA

A limitation of temporal PCA is that factors are defined solely as a function of component time course, as instantiated by the factor loadings. This means that ERP components which are topographically distinct, but have a similar time course, will be modeled by a single factor. A sign that this has occurred is when a temporal PCA yields condition effects characterized by a scalp topography that differs from the overall factor topography.

To address this problem, spatial PCA may be used as an alternative to temporal PCA (Dien, 1998a). In a spatial PCA, the data are arranged such that the variables are electrode locations, and the observations are experimental conditions, subjects, and time points. The factor loadings therefore describe scalp topographies instead of temporal patterns. This approach is less likely to confound ERP components with the same time course, as long as they are differentiated by the task or subject variance. On the other hand, it is subject to the converse problem of confusing components with similar scalp topographies, even when they have clearly separate time dynamics.
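The difference between the two arrangements is purely a matter of how the data array is unfolded into an observations-by-variables matrix. A sketch, assuming the ERP data are held in a four-dimensional array (subjects x conditions x electrodes x time points):

```python
import numpy as np

def unfold_temporal(erp):
    """Temporal PCA layout: variables are time points; subjects, conditions,
    and electrodes all become observations."""
    s, c, e, t = erp.shape
    return erp.reshape(s * c * e, t)

def unfold_spatial(erp):
    """Spatial PCA layout: variables are electrodes; subjects, conditions,
    and time points all become observations."""
    s, c, e, t = erp.shape
    return erp.transpose(0, 1, 3, 2).reshape(s * c * t, e)
```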

The choice between spatial and temporal PCA should depend on the specific analysis goals. A rule of thumb is that if time course is the focus of an analysis, then spatial PCA should be used, and vice versa. The reason is that the factor loadings are constrained to be the same across the entire dataset (i.e., the same time course for temporal PCA, and the same scalp topography for spatial PCA). The factor scores, on the other hand, are free to vary between conditions and subjects. Thus, one can examine latency changes with spatial, but not temporal, PCA, and vice versa. In particular, this implies that temporal, rather than spatial, PCA should be more effective as a preprocessing step in source localization, since these modeling procedures rely on the scalp topography to infer the number and configuration of sources.

All other things being equal, temporal PCA is in principle more accurate than spatial PCA, since component overlap reduces PCA accuracy, and volume conduction ensures that all ERP components will overlap in a spatial PCA. Furthermore, the effect of the Varimax and Promax rotations is to minimize factor overlap, which is a more appropriate goal for temporal than for spatial PCA. A caveat in either case is that ERP component separation can be achieved only if subject or task variance (or both) can effectively distinguish the components. In other words, the three sources of variance associated with each observation must collectively be able to distinguish the ERP components, regardless of whether the components differ along the variable dimension (time for temporal PCA, or space for spatial PCA).

Recent alternatives to PCA

In recent years, a variety of multivariate statistical techniques have been developed and are increasingly making their way into the ERP literature. One such method, independent components analysis (Makeig, Bell, Jung, & Sejnowski, 1996), is discussed elsewhere in this volume.

In this section, we present four multivariate techniques for ERP analysis, which share a common basis in their use of eigenvalue decomposition. Each technique has been claimed to address one or more problems with conventional PCA. The application of these techniques in ERP research is very recent, and more work is needed to characterize their respective strengths and limitations for various ERP applications. Future developments of these techniques may lead to a powerful suite of tools that can be used to address a range of problems in ERP analysis.

Sequential spatiotemporal (or temporospatial) PCA

A recent procedure for improved separation of ERP components is spatiotemporal (or temporospatial) PCA (Spencer, Dien, & Donchin, 1999; Spencer et al., 2001). In this procedure, ERP components that were confounded in the initial PCA are separated by the application of a second PCA, which is used to separate variance along the other dimension. For a temporospatial PCA, this is accomplished by rearranging the factor scores resulting from the temporal PCA so that each column contains the factor scores from a different electrode. A spatial PCA can then be conducted using these factor scores as the new variables. While the temporal variance has been collapsed by the initial PCA, the subject and task variance are still expressed in the observations and can thus be used to separate ERP components that were confounded in the initial PCA.

In Spencer et al. (1999, 2001), the initial spatial PCA was followed by a temporal PCA, with the factor scores from all the factors combined within the same analysis (number of observations equal to number of subjects x number of tasks x number of spatial factors). This procedure led to a clear separation of the P300 from the Novelty P3. However, it also had an important drawback: the application of a single temporal PCA to all the spatial factors could result in the loss of some of the finer distinctions in time course between different spatial factors. This analytic strategy was necessary because the generalized inverse function, used by SAS to generate the factor scores, requires that there be more observations than variables. The PCA Toolkit has bypassed this requirement by directly rotating the factor scores (Möcks & Verleger, 1991), allowing each initial factor to be subjected to a separate PCA (following the example of Scott Makeig's ICA toolbox and an independent suggestion by Bill Dunlap). In a more recent study (Dien et al., 2003b), an initial spatial PCA yielded 12 factors; each spatial factor was then subjected to a separate temporal PCA (each retaining four factors for simplicity's sake). For analyses using this newer approach (it makes little difference for the original approach), temporospatial PCA is recommended over spatiotemporal PCA, since temporal PCA may lead to better initial separation of ERP components. Subsequent application of a spatial PCA can then help separate components that were confounded in the temporal PCA. On the other hand, if latency analysis is a goal of the PCA, then spatial PCA should be done first (since latency analysis cannot be done on the results of a temporal PCA), with the succeeding temporal PCA step used to verify whether multiple components are present in the factor of interest, as demonstrated in another recent study (Dien, Spencer, & Donchin, in press).
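A sketch of the temporospatial sequence under the newer approach, reusing the pca_extract() sketch from the Factor Extraction section: run a temporal PCA, rearrange the scores of each retained temporal factor so that each column holds one electrode, and submit each to its own spatial PCA. This is an illustrative reading of the procedure, not the Toolkit's code:

```python
import numpy as np

def temporospatial_pca(erp, n_temporal_factors):
    """erp: subjects x conditions x electrodes x timepoints."""
    s, c, e, t = erp.shape
    # Step 1: temporal PCA on the fully unfolded data.
    loadings, scores, _ = pca_extract(erp.reshape(s * c * e, t))
    spatial_solutions = []
    for f in range(n_temporal_factors):
        # Step 2: this factor's scores, rearranged so each column is one
        # electrode; rows are the remaining observations (subject x condition).
        factor_scores = scores[:, f].reshape(s * c, e)
        spatial_solutions.append(pca_extract(factor_scores))  # spatial PCA
    return loadings, spatial_solutions
```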

The full equation to generate the microvolt value for a specific time point t and channel c for a spatiotemporal PCA is L1 * V1 * L2 * S2 * V2, where L1 is the spatial PCA factor loading for c, V1 is the standard deviation of c, L2 is the temporal PCA factor loading for t, S2 is the mean factor score for the temporal factor, and V2 is the standard deviation of the spatial factor scores at t. The temporal and spatial terms are reversed for a temporospatial PCA.


Parametric PCA

Another recent method involves the use of parametric measures to improve PCA separation of latent factors that differ along one or more stimulus dimensions (Dien et al., 2003a). This more specialized procedure can only be conducted on datasets containing observations with a continuous range of values. In Dien et al. (2003a), ERP responses to sentence endings were averaged for each stimulus item (collapsing over subjects) rather than averaging over subjects (collapsing over items in each experimental condition). This item-averaging approach resulted in 120 sentence averages, which were rated on a number of linguistic parameters, such as meaningfulness and word frequency. After an initial temporal PCA, it was then possible to correlate the parameter of interest with the mean factor score at each channel, to determine the influence of the stimulus parameter on a given ERP component, such as the N400. This had the effect of highlighting the relationship between ERP components and stimulus parameters, while factoring out the effects of ERP components unrelated to the parameters of interest. In this fashion, parametric PCA can lead to scalp topographies that reflect only the parameters of interest, providing a new approach to the functional separation of ERP components. This approach thus provides an alternative to sequential PCA for deconfounding components. These components can then be subjected to further analyses, such as dipole and linear inverse modeling.
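A sketch of the correlation step just described, with hypothetical variable names: given item-level factor scores for one temporal factor, arranged as items x electrodes, correlate a stimulus parameter with the scores at each channel:

```python
import numpy as np

def parametric_topography(item_scores, parameter):
    """item_scores: items x electrodes (factor scores for one temporal factor).
    parameter: per-item ratings (e.g., meaningfulness).

    Returns the Pearson correlation between the parameter and the factor
    scores at each electrode; plotted over the scalp, this reflects only
    parameter-related variance.
    """
    z_param = (parameter - parameter.mean()) / parameter.std()
    z_scores = (item_scores - item_scores.mean(axis=0)) / item_scores.std(axis=0)
    return (z_scores * z_param[:, None]).mean(axis=0)
```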

Partial Least Squares

Partial least squares (PLS), like PCA, is a multivariate technique based on eigenvalue decomposition. Unlike PCA, PLS operates on the covariance between the data matrix and a matrix of contrasts that represents features of the experimental design (McIntosh, Bookstein, Haxby, & Grady, 1996). Similar to parametric PCA procedures, the decomposition is focused on variance due to the experimental manipulations (condition differences). A recent paper (Lobaugh et al., 2001) applied PLS to ERP data for the first time. Simulations showed that the PLS analysis led to accurate modeling of the spatial and temporal effects that were associated with condition differences in the ERP waveforms. Lobaugh et al. also suggest that PLS may be an effective preprocessing method, identifying time points and electrodes that are sensitive to condition differences and can therefore be targeted for further analyses.
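The core computation, as described above, is a singular value decomposition of the cross-covariance between the design contrasts and the data. A minimal sketch (the array layout and names are ours, not the notation of McIntosh et al. or Lobaugh et al.):

```python
import numpy as np

def pls(data, contrasts):
    """data: observations x variables (e.g., waveforms x time points).
    contrasts: observations x design contrasts (columns code the design).

    Returns design saliences (u), singular values (s), and data-variable
    saliences (v), one pair of salience patterns per latent variable.
    """
    centered = data - data.mean(axis=0)
    cross = contrasts.T @ centered            # design-by-data cross-covariance
    u, s, vt = np.linalg.svd(cross, full_matrices=False)
    return u, s, vt.T
```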

One cautionary note concerning PLS arises from the use of difference waves, which are created by subtracting the ERP waveform in one experimental condition from the response in a different condition, in order to isolate experimental effects prior to factor extraction. This approach, based on the logic of subtraction, can lead to incorrect conclusions when the assumption of pure insertion is violated, that is, when two conditions are different in kind rather than in degree. It can also produce misleading results when a change in latency appears to be an amplitude effect, or when multiple effects appear to be a single effect.

Neuroimaging measures, such as ERP and fMRI, may be particularly subject to such misinterpretations, since both spatial (anatomical) and temporal variance can lead to condition differences. If these multiple sources of variance are not adequately separated, a temporal difference between conditions may be erroneously ascribed to a single anatomical region or ERP component (e.g., Zarahn, Aguirre, & D'Esposito, 1999). In a recent example (Spencer, Abad, & Donchin, 2000), PCA was used to examine the claim that recollection (as compared with familiarity) is associated with a unique electrophysiological component. Spencer et al. concluded that the effect was more accurately ascribed to differences in latency jitter, or trial-to-trial variance in the peak latency of the P300, across the two conditions.


For this reason, condition differences, while useful, should only be interpreted with respect to the overall patterns in the original data. Further, both spatial and temporal variance between conditions should be analyzed fully, to rule out differences in latency jitter or other electrophysiological effects that may be hidden or confounded through cognitive subtraction. This recommendation also applies to the interpretation of results from the use of partial variance techniques, such as PLS.

Multi-Mode Factor Analysis

The techniques discussed in the previous sections were all based on two-mode analysis of ERP data (a mode being defined as a dimension of variance). In temporal PCA, for instance, time points represent one dimension (the variables axis), and the other dimension combines the remaining sources of variance, i.e., subjects, electrodes, and experimental conditions (the observations axis). By contrast, multi-mode procedures analyze the data across three or more dimensions simultaneously. For example, in trilinear decomposition (TLD), the subject data matrix X_i is expressed as the cross-product of three factors, as shown in equation (1):

X_i = B * A_i * C,    (1)


where B is a set of spatial components and C is a set of temporal components. A_i represents the subject loadings on B and C. B and C are calculated in separate spatial and temporal singular value decompositions of the data and are then combined to yield a new decomposition of the data, for any fixed dimensionality (Wang, Begleiter, & Porjesz, 2000). Since tri-mode PCA is susceptible to the same rotational indeterminacies as regular PCA, rotational procedures will need to be developed and evaluated.
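The following sketch illustrates the structure of equation (1): spatial components B and temporal components C come from separate SVDs of the unfolded data, and each subject's loadings A_i are then fit by least squares. It is a schematic of the trilinear model, not the algorithm of Wang et al. (2000):

```python
import numpy as np

def trilinear_fit(X, p, q):
    """X: subjects x electrodes x timepoints; p spatial and q temporal components.

    Returns B (electrodes x p), A (subjects x p x q), and C (q x timepoints)
    such that X_i is approximately B @ A_i @ C.
    """
    n_sub, n_elec, n_time = X.shape
    # Spatial components: SVD of the data unfolded as (subjects*time) x electrodes.
    B = np.linalg.svd(X.transpose(0, 2, 1).reshape(-1, n_elec),
                      full_matrices=False)[2][:p].T
    # Temporal components: SVD of the data unfolded as (subjects*electrodes) x time.
    C = np.linalg.svd(X.reshape(-1, n_time), full_matrices=False)[2][:q]
    # Subject loadings: least-squares fit of X_i = B @ A_i @ C.
    B_pinv, C_pinv = np.linalg.pinv(B), np.linalg.pinv(C)
    A = np.stack([B_pinv @ X[i] @ C_pinv for i in range(n_sub)])
    return B, A, C
```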

It has been claimed that tri-mode PCA can effectively remove "nuisance" sources of variance, as described by Möcks (1985), providing greater sensitivity as compared with conventional, two-dimensional PCA. Further, Achim has described the use of multi-modal procedures to help address misallocation of variance (Achim & Bouchard, 1997). These reports point to the need for thorough and systematic comparison of multi-mode methods such as TLD with other methods, such as parametric PCA and PLS. This can only be done for algorithms that are made available to the rest of the research community, either through open source or through commercial software packages.

Conclusion

PCA and related procedures can provide an effective way to preprocess high-density ERP datasets and to help separate components that differ in their sensitivity to spatial, temporal, or functional parameters. This brief review has attempted to characterize the current state of the art in PCA of ERPs. Ongoing research will continue to refine these statistical procedures and to determine the optimal approaches for statistical decomposition of ERP data. Ultimately, it is likely that a range of statistical tools will be required, each best suited to different applications in ERP analysis.

Bibliography

Achim, A., & Bouchard, S. (1997). Toward a dynamic topographic components model. Electroencephalography and Clinical Neurophysiology, 103, 381-385.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245-276.

Cattell, R. B., & Jaspers, J. (1967). A general plasmode (No. 3010-5-2) for factor analytic exercises and research. Multivariate Behavioral Research Monographs, 67-3, 1-212.

Curry, S. H., Cooper, R., McCallum, W. C., Pocock, P. V., Papakostopoulos, D., Skidmore, S., et al. (1983). The principal components of auditory target detection. In A. W. K. Gaillard & W. Ritter (Eds.), Tutorials in ERP research: Endogenous components (pp. 79-117). Amsterdam: North-Holland Publishing Company.

Dien, J. (1998a). Addressing misallocation of variance in principal components analysis of event-related potentials. Brain Topography, 11(1), 43-55.

Dien, J. (1998b). Issues in the application of the average reference: Review, critiques, and recommendations. Behavioral Research Methods, Instruments, and Computers, 30(1), 34-43.

Dien, J. (1999). Differential lateralization of trait anxiety and trait fearfulness: Evoked potential correlates. Personality and Individual Differences, 26(1), 333-356.

Dien, J., Beal, D., & Berg, P. (submitted). Optimizing principal components analysis for event-related potential analysis.

Dien, J., Frishkoff, G. A., Cerbonne, A., & Tucker, D. M. (2003a). Parametric analysis of event-related potentials in semantic comprehension: Evidence for parallel brain mechanisms. Cognitive Brain Research, 15, 137-153.

Dien, J., Frishkoff, G. A., & Tucker, D. M. (2000). Differentiating the N3 and N4 electrophysiological semantic incongruity effects. Brain & Cognition, 43, 148-152.

Dien, J., Spencer, K. M., & Donchin, E. (2003b). Localization of the event-related potential novelty response as defined by principal components analysis. Cognitive Brain Research, 17, 637-650.

Dien, J., Spencer, K. M., & Donchin, E. (in press). Parsing the "Late Positive Complex": Mental chronometry and the ERP components that inhabit the neighborhood of the P300. Psychophysiology.

Dien, J., Tucker, D. M., Potts, G., & Hartry, A. (1997). Localization of auditory evoked potentials related to selective intermodal attention. Journal of Cognitive Neuroscience, 9(6), 799-823.

Donchin, E. (1966). A multivariate approach to the analysis of average evoked potentials. IEEE Transactions on Bio-Medical Engineering, BME-13, 131-139.

Donchin, E., & Coles, M. G. H. (1991). While an undergraduate waits. Neuropsychologia, 29(6), 557-569.

Donchin, E., & Heffley, E. (1979). Multivariate analysis of event-related potential data: A tutorial review. In D. Otto (Ed.), Multidisciplinary perspectives in event-related potential research (EPA 600/9-77-043) (pp. 555-572). Washington, DC: U.S. Government Printing Office.

Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Hendrickson, A. E., & White, P. O. (1964). Promax: A quick method for rotation to oblique simple structure. The British Journal of Statistical Psychology, 17, 65-70.

Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187-200.

Lobaugh, N. J., West, R., & McIntosh, A. R. (2001). Spatiotemporal analysis of experimental differences in event-related potential data with partial least squares. Psychophysiology, 38(3), 517-530.

Makeig, S., Bell, A. J., Jung, T., & Sejnowski, T. J. (1996). Independent component analysis of electroencephalographic data. Advances in Neural Information Processing Systems, 8, 145-151.

McIntosh, A. R., Bookstein, F. L., Haxby, J. V., & Grady, C. L. (1996). Spatial pattern analysis of functional brain images using Partial Least Squares. Neuroimage, 3, 143-157.

Möcks, J. (1988). Topographic components model for event-related potentials and some biophysical considerations. IEEE Transactions on Biomedical Engineering, 35(6), 482-484.

Möcks, J., & Verleger, R. (1991). Multivariate methods in biosignal analysis: Application of principal component analysis to event-related potentials. In R. Weitkunat (Ed.), Digital Biosignal Processing (pp. 399-458). Amsterdam: Elsevier.

Nunez, P. L. (1981). Electric fields of the brain: The neurophysics of EEG. New York: Oxford University Press.

Rösler, F., & Manzey, D. (1981). Principal components and varimax-rotated components in event-related potential research: Some remarks on their interpretation. Biological Psychology, 13, 3-26.

Ruchkin, D. S., Villegas, J., & John, E. R. (1964). An analysis of average evoked potentials making use of least mean square techniques. Annals of the New York Academy of Sciences, 115(2), 799-826.

Spencer, K. M., Abad, E. V., & Donchin, E. (2000). On the search for the neurophysiological manifestation of recollective experience. Psychophysiology, 37, 494-506.

Spencer, K. M., Dien, J., & Donchin, E. (1999). A componential analysis of the ERP elicited by novel events using a dense electrode array. Psychophysiology, 36, 409-414.

Spencer, K. M., Dien, J., & Donchin, E. (2001). Spatiotemporal analysis of the late ERP responses to deviant stimuli. Psychophysiology, 38(2), 343-358.

Wang, K., Begleiter, H., & Porjesz, B. (2000). Trilinear modeling of event-related potentials. Brain Topography, 12(4), 263-271.

Wood, C. C., & McCarthy, G. (1984). Principal component analysis of event-related potentials: Simulation studies demonstrate misallocation of variance across components. Electroencephalography and Clinical Neurophysiology, 59, 249-260.

Zarahn, E., Aguirre, G. K., & D'Esposito, M. (1999). Temporal isolation of the neural correlates of spatial mnemonic processing with fMRI. Cognitive Brain Research, 7(3), 255-268.
