NeuroImage xxx (2010) xxx–xxx
Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/ynimg

Partial Least Squares (PLS) methods for neuroimaging: A tutorial and review

Anjali Krishnan a, Lynne J. Williams b, Anthony Randal McIntosh c,d,⁎, Hervé Abdi a,⁎

a School of Behavioral and Brain Sciences, The University of Texas at Dallas, MS: GR4.1, 800 West Campbell Road, Richardson, TX 75080-3021, USA
b The Kunen-Luenfeld Applied Research Unit, The Rotman Research Institute, Baycrest, 3560 Bathurst Street, Toronto, ON, Canada M6A 2E1
c Department of Psychology, Sidney Smith Hall, 4th Floor, University of Toronto, 100 St. George Street, Toronto, ON, Canada M5S 3G3
d The Rotman Research Institute, Baycrest, 3560 Bathurst Street, Toronto, ON, Canada M6A 2E1

Article history: Received 21 January 2010; Revised 1 July 2010; Accepted 19 July 2010; Available online xxxx

Keywords: Partial least squares correlation; Partial least squares regression; Partial least squares path modeling; PLS; Symmetric PLS; Asymmetric PLS; Task PLS; Behavior PLS; Seed PLS; Multi-block PLS; Multi-table PLS; Canonical variate analysis; Co-inertia analysis; Multiple factor analysis; STATIS; Barycentric discriminant analysis; Common factor analysis

Abstract

Partial Least Squares (PLS) methods are particularly suited to the analysis of relationships between measures of brain activity and of behavior or experimental design.
In neuroimaging, PLS refers to two related methods: (1) symmetric PLS, or Partial Least Squares Correlation (PLSC), and (2) asymmetric PLS, or Partial Least Squares Regression (PLSR). The most popular (by far) version of PLS for neuroimaging is PLSC. It exists in several varieties based on the type of data that are related to brain activity: behavior PLSC analyzes the relationship between brain activity and behavioral data, task PLSC analyzes how brain activity relates to pre-defined categories or experimental design, seed PLSC analyzes the pattern of connectivity between brain regions, and multi-block or multi-table PLSC integrates one or more of these varieties in a common analysis. PLSR, in contrast to PLSC, is a predictive technique which, typically, predicts behavior (or design) from brain activity. For both PLS methods, statistical inferences are implemented using cross-validation techniques to identify significant patterns of voxel activation. This paper presents both PLS methods and illustrates them with small numerical examples and typical applications in neuroimaging.

© 2010 Elsevier Inc. All rights reserved.

1. Introduction

Originally developed for econometrics and chemometrics (Wold, 1982), Partial Least Squares (PLS) is a multivariate statistical technique first introduced to functional neuroimaging by McIntosh et al.
(1996) with the goal of extracting commonalities between brain activity and behavior or experimental design. In neuroimaging there are two basic types of PLS methods, which we call Partial Least Squares Correlation (PLSC; McIntosh et al., 1996) and Partial Least Squares Regression (PLSR; Wold, 1982; de Jong, 1993; Wold et al., 2001). PLSC (Tucker, 1958; Bookstein, 1982; Streissguth et al., 1993; Bookstein, 1994; McIntosh et al., 1996) is a correlational technique that analyzes associations between two sets of data (e.g., behavior and brain activity), while PLSR (Wold, 1982; Martens and Naes, 1989; de Jong and Phatak, 1997; Tenenhaus, 1998; Martens and Martens, 2001; Wold et al., 2001; Abdi, 2010) is a regression technique that predicts one set of data from another (e.g., predicts behavior from brain activity). A third, closely related, technique called partial least squares path modeling (see, e.g., Esposito-Vinzi et al., 2010, for a recent comprehensive review) can be seen as a least squares equivalent of structural equation modeling (which is a maximum likelihood technique). Despite PLS path modeling's obvious relevance, this method has not yet been applied to neuroimaging, and therefore we will not include it in this review. For both PLSC and PLSR, statistical inferences are implemented using computational cross-validation methods (e.g., jackknife, bootstrap).

⁎ Corresponding authors. E-mail addresses: herve@utdallas.edu (H. Abdi), rmcintosh@rotman-baycrest.on.ca (A.R. McIntosh).
As a distinct advantage, PLS techniques are tailored to handle the very large data sets which are typical of current neuroimaging research.

In this paper we will present PLSC, PLSR, and their main variants used in neuroimaging. We introduce each technique with a small artificial example in order to describe the main computational steps.

1053-8119/$ – see front matter © 2010 Elsevier Inc. All rights reserved. doi:10.1016/j.neuroimage.2010.07.034
Please cite this article as: Krishnan, A., et al., Partial Least Squares (PLS) methods for neuroimaging: A tutorial and review, NeuroImage (2010), doi:10.1016/j.neuroimage.2010.07.034

For each technique we also present and review major applications


from the neuroimaging literature. A diagram outlining the various PLS methods is shown in Fig. 1.

Fig. 1. The PLS family.

2. Notations

In this section, we review the main notations used in this paper. For convenience, Appendix A also lists our main notations and acronyms (see also Abdi and Williams, 2010c, for more details on matrices).

Data are stored in matrices which are denoted by upper case bold letters (e.g., X). The identity matrix is denoted I. Column vectors are denoted by lower case bold letters (e.g., x). Matrix or vector transposition is denoted by an uppercase superscript T (e.g., X^T). Two bold letters placed next to each other imply matrix or vector multiplication unless otherwise mentioned. The number of rows, columns, or sub-matrices is denoted by an uppercase italic letter (e.g., I) and a given row, column, or sub-matrix is denoted by a lowercase italic letter (e.g., i).

Brain activity is stored in an I by J matrix denoted X, whose generic element is denoted x_{i,j}, and where the rows are observations and the columns are variables. Matrix X is made up of N a priori sub-matrices, with I_n being the number of observations in sub-matrix n. The sum of the number of observations in all of the sub-matrices is the number of rows of X (i.e., I = Σ_n I_n; see Fig. 2a). When dealing with spatio-temporal neuroimaging methods (e.g., EEG, fMRI, NIRS), there are T scans, where the set of scans for all I observations at time t corresponds to an I by J_t matrix denoted X_t. The X_t matrices are concatenated by row to form the larger matrix X (whose total number of columns J is the sum of all the J_t; see Fig. 3).

Fig. 2. Data representation for matrices (a) X and (b) Y.
Note that the I observations of X and Y are composed of N sub-matrices, X_1 … X_n … X_N and Y_1 … Y_n … Y_N, representing the groups or trial types.


Fig. 3. Data matrix for EEG, MEG, fMRI, and NIRS experiments: The data (scans) X_t for each time point t have I rows and J_t columns. Each scan from X_1 to X_T is concatenated by rows so that each row contains all the voxel activity at all time points.

Behavioral or design data are stored in an I by K matrix, Y, whose generic element is denoted y_{i,k}. Each Y matrix is made up of the same N a priori sub-matrices as X (Fig. 2b), with K behavioral measures or contrasts that code for different aspects of the experimental design (e.g., category membership). In general, matrices X and Y are statistically preprocessed in order to make the variables comparable. For example, the mean of each column can be subtracted from all its elements, or each column can be transformed into Z-scores. In some cases a normalization is performed on a sub-matrix basis (e.g., the mean of each column of each sub-matrix is zero and its standard deviation is one).

2.1. The main tool: the singular value decomposition

The main analytical tool for PLS is the singular value decomposition (SVD) of a matrix (see Abdi, 2007; de Leeuw, 2007; Greenacre, 1984; Takane, 2002 for details and tutorials). Recall that the SVD of a given J×K matrix Z decomposes it into three matrices as:

Z = U Δ V^T = Σ_{l=1}^{L} δ_l u_l v_l^T,   (1)

where U is the J by L matrix of the normalized left singular vectors (with L being the rank of Z), V is the K by L matrix of the normalized right singular vectors, and Δ is the L by L diagonal matrix of the L singular values; δ_l, u_l, and v_l are, respectively, the l-th singular value, left singular vector, and right singular vector. Matrices U and V are orthonormal matrices (i.e., U^T U = V^T V = I).
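The properties of the SVD just listed can be verified numerically. The following is a minimal NumPy sketch (the matrix Z is an arbitrary illustration, not taken from the paper's example):

```python
import numpy as np

# An arbitrary J x K matrix (illustration only; not from the paper's example).
Z = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 2.0],
              [1.0, 1.0, 1.0]])

# Economy-size SVD: Z = U diag(delta) V^T.
U, delta, Vt = np.linalg.svd(Z, full_matrices=False)
V = Vt.T

# U and V have orthonormal columns: U^T U = V^T V = I.
assert np.allclose(U.T @ U, np.eye(U.shape[1]))
assert np.allclose(V.T @ V, np.eye(V.shape[1]))

# Z is exactly reconstituted by the sum of its rank-one terms delta_l u_l v_l^T.
Z_rebuilt = sum(d * np.outer(u, v) for d, u, v in zip(delta, U.T, V.T))
assert np.allclose(Z, Z_rebuilt)

# Singular values are the square roots of the eigenvalues of Z^T Z.
eigvals = np.sort(np.linalg.eigvalsh(Z.T @ Z))[::-1]
assert np.allclose(delta**2, eigvals)
```

Truncating the rank-one sum after the first few terms yields the best lower-rank approximation of Z in the least squares sense, which is the key property exploited throughout PLS.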
The SVD is closely related to, and generalizes, the well-known eigen-decomposition: U is also the matrix of the normalized eigenvectors of ZZ^T, V is the matrix of the normalized eigenvectors of Z^T Z, and the singular values are the square roots of the eigenvalues of ZZ^T and Z^T Z (these two matrices have the same eigenvalues). Key property: the SVD provides the best reconstitution (in a least squares sense) of the original matrix by a matrix with a lower rank.

3. Partial Least Squares Correlation

Partial Least Squares Correlation (PLSC) analyzes the relationship between the matrices X and Y. These matrices store measurements collected on the same observations (e.g., scans or participants). The I by J matrix X corresponds to brain activity and the I by K matrix Y corresponds to behavioral or design variables. The relationship between the j-th column of X and the k-th column of Y is measured by the dot (i.e., scalar) product between these two columns. When these two columns are centered, the dot product gives the covariance between these two columns. When, in addition, these two columns are normalized (e.g., are Z-scores, or when the sum of the squared values of each column is equal to one), the dot product expresses the correlation between these two columns. Because covariance and correlation are not directional (i.e., correlation and covariance do not depend upon the order of the variables), the roles of X and Y are symmetric and the analysis focuses on shared information.

3.1. Overview of PLSC

PLSC can be subdivided into four variants: (1) behavior PLSC, (2) task PLSC, (3) seed PLSC, and (4) multi-table or multi-block PLSC (McIntosh and Lobaugh, 2004; McIntosh et al., 1998). The difference between the techniques rests in the Y matrix.
Y is a matrix of behavioral variables in behavior PLSC, a matrix of contrasts or design variables in task PLSC, and a matrix of voxel activity from regions of interest (ROIs) in seed PLSC; in multi-table PLSC there are multiple Y matrices, each consisting of behavioral, design, or ROI variables. Note that for all versions of PLSC, all scans are co-registered onto a common brain space to remove differences in brain volume (e.g., Talairach coordinates; see Talairach and Tournoux, 1988).

3.2. Formal expression of PLSC

Formally, the relationship between the columns of X and Y is stored in a cross-product matrix, denoted R, which is computed as:

R = Y^T X.   (2)

In general, R is a matrix of correlations because X and Y are centered and normalized (e.g., expressed as Z-scores).

The SVD [see Eq. (1)] of R decomposes it into three matrices:

R = U Δ V^T.   (3)

In the PLSC vocabulary the singular vectors U and V are also called saliences (Bookstein, 1994), and in this paper these two terms are synonymous. The L left singular vectors of R (i.e., U) represent the design or behavioral profiles that best characterize R, whereas the L right singular vectors of R (i.e., V) represent the voxels or brain images that best characterize R.

To express the saliences relative to brain activity and behavior (or design), the original matrices X and Y are projected onto their respective saliences.
This creates latent variables (linear combinations of the original variables), which are computed as:

L_X = X V,   (4)

where the matrix of latent variables of X (i.e., the I by L matrix L_X) is called "brain scores," and

L_Y = Y U,   (5)

where the matrix of latent variables of Y (i.e., the I by L matrix L_Y) is called "behavior" or "design scores." A pair of vectors l_{X,l} (the l-th column of L_X) and l_{Y,l} (the l-th column of L_Y) reflects a relationship between brain activity and behavior (Nestor et al., 2002). PLSC searches for latent variables that express the largest amount of information common to both X and Y. Specifically, PLSC computes latent variables with maximal covariance.
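The whole PLSC pipeline of Eqs. (2)-(5) fits in a few lines of NumPy. The following sketch uses random placeholder data with the mini-example's dimensions (9 observations, 12 voxels, 2 behavioral measures); the variable and helper names are ours, not the authors':

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 9, 12, 2                 # observations, voxels, behavioral measures
X = rng.standard_normal((I, J))    # placeholder brain activity
Y = rng.standard_normal((I, K))    # placeholder behavioral data

def center_normalize(A):
    """Center each column and scale it to unit sum of squares."""
    A = A - A.mean(axis=0)
    return A / np.linalg.norm(A, axis=0)

Xn, Yn = center_normalize(X), center_normalize(Y)

R = Yn.T @ Xn                                         # Eq. (2): K x J correlations
U, delta, Vt = np.linalg.svd(R, full_matrices=False)  # Eq. (3): R = U diag(delta) V^T
V = Vt.T

L_X = Xn @ V   # Eq. (4): brain scores
L_Y = Yn @ U   # Eq. (5): behavior (or design) scores

# Paired latent variables have maximal covariance: L_Y^T L_X = diag(delta).
assert np.allclose(L_Y.T @ L_X, np.diag(delta))
```

The final assertion makes the "maximal covariance" property concrete: the covariance of each pair of latent variables equals the corresponding singular value, and different pairs are uncorrelated.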


3.5. Behavior PLSC

Behavior PLSC analyzes the relationship between the behavioral characteristics of groups and their functional brain activity. Matrix X contains voxel activity [Eq. (11)] and matrix Y, in this case, contains various demographic (e.g., age) and/or behavioral data (e.g., neuropsychological tests, reaction times).

For our example, (fictitious) participants underwent behavioral testing using a memory test for word recall. The behavioral measures were the number of words correctly recalled and the average reaction time (in ms). The behavioral data are:

Y_behavior = [Y_behavior,1; Y_behavior,2; Y_behavior,3] =
  [ 15  600 ]
  [ 19  520 ]
  [ 18  545 ]
  [ 22  426 ]
  [ 21  404 ]
  [ 23  411 ]
  [ 29  326 ]
  [ 30  309 ]
  [ 30  303 ]   (12)

where the rows are the same participants as in matrix X and the columns are the participants' number of words recalled and reaction time scores, respectively. Note that both X and Y contain information from the same participants and hence have the same number of rows, but are likely to have a different number of columns. Both X and Y are centered and normalized within each condition n (i.e., each X_n and Y_n is centered and normalized independently, so that the sum of squares of a column in one condition is equal to 1; note also that when all values of a column are equal to their mean, they are all normalized to zero).
This normalization gives the following matrices:

X =
  [ -0.77   0.07   0.00  -0.59   0.71  -0.41   0.41   0.00  -0.79  -0.41   0.23   0.41 ]
  [  0.15  -0.74  -0.71   0.78   0.00   0.82  -0.82   0.71   0.57   0.82   0.57  -0.82 ]
  [  0.62   0.67   0.71  -0.20  -0.71  -0.41   0.41  -0.71   0.23  -0.41  -0.79   0.41 ]
  [  0.71  -0.41   0.27   0.29   0.00  -0.66   0.48  -0.31  -0.31  -0.59   0.23  -0.71 ]
  [  0.00  -0.41   0.53   0.51   0.00   0.75   0.33  -0.50   0.81   0.78  -0.79   0.71 ]
  [ -0.71   0.82  -0.80  -0.81   0.00  -0.09  -0.81   0.81  -0.50  -0.20   0.57   0.00 ]
  [  0.71  -0.41   0.62  -0.82   0.00   0.41  -0.41  -0.71  -0.15  -0.15  -0.51  -0.82 ]
  [  0.00  -0.41   0.15   0.41   0.71   0.41  -0.41   0.00  -0.62   0.77  -0.29   0.41 ]
  [ -0.71   0.82  -0.77   0.41  -0.71  -0.82   0.82   0.71   0.77  -0.62   0.81   0.41 ]   (13)

and

Y_behavior =
  [ -0.79   0.78 ]
  [  0.57  -0.60 ]
  [  0.23  -0.17 ]
  [  0.00   0.78 ]
  [ -0.71  -0.61 ]
  [  0.71  -0.17 ]
  [ -0.82   0.79 ]
  [  0.41  -0.22 ]
  [  0.41  -0.57 ]   (14)

The matrix of correlations for each condition n is then computed as (Fig. 4):

R_behavior,n = Y^T_behavior,n X_n.   (15)

All the condition-wise matrices of correlations are stacked one on top of the other to form the combined matrix of correlations R_behavior, which is the input for the SVD. The R_behavior matrix contains the

Fig. 4. Matrix X and matrix Y for behavior PLSC: The observations are arranged according to N conditions in both matrices and are normalized within condition. The matrices of correlations (R_n) between the condition-wise sub-matrices (X_n and Y_n) are stacked one below the other to form a combined matrix of correlations R, which is then decomposed by the SVD.
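The condition-wise normalization and stacking just described can be sketched as follows. The data here are random placeholders arranged as three groups of three rows, mimicking the mini-example's layout; the helper name is ours:

```python
import numpy as np

rng = np.random.default_rng(1)
N, per, J, K = 3, 3, 12, 2                 # conditions, rows per condition, voxels, measures
X = rng.standard_normal((N * per, J))      # placeholder brain activity
Y = rng.standard_normal((N * per, K))      # placeholder behavioral data

def normalize_within(A):
    """Center columns and scale each to unit sum of squares within one condition."""
    A = A - A.mean(axis=0)
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0                # a constant column normalizes to zero
    return A / norms

R_blocks = []
for n in range(N):
    rows = slice(n * per, (n + 1) * per)
    Xn = normalize_within(X[rows])
    Yn = normalize_within(Y[rows])
    R_blocks.append(Yn.T @ Xn)             # Eq. (15): K x J correlations for condition n

R_behavior = np.vstack(R_blocks)           # (N*K) x J stacked matrix, input for the SVD
U, delta, Vt = np.linalg.svd(R_behavior, full_matrices=False)
```

Because each block is centered and column-normalized within its condition, every entry of R_behavior is a correlation and therefore lies between -1 and 1.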


correlation of each of the J voxels in X with each of the K behavioral measures in Y within each of the N conditions. Therefore R will have N×K rows and J columns. For our example, the matrix of correlations is:

R_behavior = [R_behavior,1; R_behavior,2; R_behavior,3] =
  [  0.84  -0.32  -0.24   0.87  -0.72   0.69  -0.69   0.24   1.00   0.69  -0.04  -0.69 ]
  [ -0.80   0.38   0.31  -0.90   0.67  -0.74   0.74  -0.31  -1.00  -0.74  -0.03   0.74 ]
  [ -0.50   0.87  -0.94  -0.93   0.00  -0.60  -0.81   0.92  -0.92  -0.69   0.96  -0.50 ]
  [  0.67  -0.21   0.02   0.05   0.00  -0.95   0.30  -0.07  -0.65  -0.90   0.56  -0.98 ]
  [ -0.87   0.50  -0.76   1.00   0.00  -0.50   0.50   0.87   0.19   0.19   0.63   1.00 ]
  [  0.96  -0.70   0.90  -0.97   0.25   0.70  -0.70  -0.96  -0.43   0.06  -0.80  -0.97 ]   (16)

where each of the N×K rows gives the k-th behavioral measure for the n-th condition and where the J columns represent the voxel activity.

The SVD [cf. Eq. (1)] of R_behavior is computed as:

R_behavior = U Δ V^T,   (17)

which gives:

U =
  [  0.41  -0.42   0.37  -0.02  -0.14  -0.71 ]
  [ -0.41   0.44  -0.40   0.00  -0.14  -0.68 ]
  [ -0.43   0.25   0.74   0.46   0.01  -0.02 ]
  [ -0.07   0.31   0.40  -0.86   0.08  -0.01 ]
  [ -0.44  -0.47  -0.06  -0.09   0.74  -0.15 ]
  [  0.53   0.51   0.00   0.20   0.64  -0.13 ]   (18)

where U is the N×K row by L column matrix of the saliences for the behavioral measures, with L being the rank of R_behavior;

Δ = diag(3.80, 3.25, 2.46, 1.64, 0.33, 0.08)   (19)

where Δ is the diagonal matrix of singular values; and

V =
  [  0.46   0.09   0.23  -0.34   0.07  -0.42 ]
  [ -0.32  -0.04   0.10   0.24  -0.28  -0.54 ]
  [  0.26   0.25  -0.35  -0.12  -0.03  -0.14 ]
  [  0.04  -0.59  -0.02  -0.47   0.37  -0.18 ]
  [ -0.12   0.22  -0.22   0.04   0.49   0.30 ]
  [  0.39  -0.14  -0.09   0.43   0.03   0.26 ]
  [ -0.22  -0.03  -0.43  -0.49  -0.24   0.24 ]
  [ -0.28  -0.28   0.33   0.13   0.13   0.05 ]
  [  0.25  -0.49  -0.07   0.01  -0.57   0.26 ]
  [  0.24  -0.34  -0.13   0.27   0.35  -0.04 ]
  [ -0.30  -0.09   0.36  -0.15   0.05   0.32 ]
  [ -0.33  -0.24  -0.56   0.21   0.08  -0.31 ]   (20)

where V is the J×L matrix of the saliences for brain activity.

Behavior saliences (i.e., U) indicate
task-dependent differences in the brain-behavior correlation (i.e., the interaction of the experimental conditions with the behavioral measures). Brain saliences (i.e., V) indicate voxel-dependent differences in the brain-behavior correlation.

The correlation between the brain scores and each of the behavioral measures gives a pattern of scores similar to the behavior saliences (U), depicting experimental differences in behavior. Confidence intervals on these correlations can be used to assess the reliability of the brain-behavior relationship (McIntosh and Lobaugh, 2004).

A separate bar plot of each column of U against each behavioral measure shows how the experimental conditions (i.e., groups) interact with the behavioral measures. Fig. 5 shows that the first behavior salience differentiates the word recall performance of the AD group from that of the PD and NC groups, and differentiates the reaction times of the AD and NC groups. The second behavior salience differentiates the word recall performance of the PD group from the other two groups, but does not differentiate the reaction times of the three groups.

From the saliences, we compute the latent variables [see Eqs. (4) and (5)]. Recall that the X latent variables, L_X, are called brain scores and the Y latent variables, L_Y, are called behavior scores. Both L_X and L_Y have I rows and L columns.

Fig. 5. Mini-example, behavior PLSC: Behavior saliences 1 and 2.
(a) The first behavior salience reveals the difference between AD and the other groups for words recalled and the difference between AD and NC for reaction time; (b) the second behavior salience reveals the difference between PD and the other groups for words recalled but no difference in reaction times for all groups.


The latent variables for brain activity (brain scores) are computed as [cf. Eq. (4)]:

L_X = X V =
  [ -1.23   0.90  -0.49   0.13   0.30   0.35 ]
  [  0.90  -1.31   1.21   0.31   0.78   0.90 ]
  [  0.33   0.41  -0.72  -0.44  -1.08  -1.25 ]
  [  0.21   0.64   0.35  -1.41   0.02   0.00 ]
  [  1.05  -0.89  -1.46   0.17  -0.01   0.00 ]
  [ -1.25   0.25   1.10   1.24  -0.01   0.00 ]
  [  1.38   1.24   0.12   0.12  -0.20   0.01 ]
  [  0.34  -0.11  -0.51   0.42   1.36  -0.06 ]
  [ -1.73  -1.13   0.39  -0.55  -1.16   0.05 ]   (21)

The computation of the behavior scores is slightly more complicated because of the block structure of the data. Recall that R_behavior has N×K rows, where each row depicts the k-th behavioral measure for the n-th condition. In our example, the first two rows of R_behavior represent the behavioral measures (i.e., words recalled and reaction time) for AD, the next two rows represent the behavioral measures for PD, and the last two rows represent the behavioral measures for NC. The product of the rows of Y and of U corresponding to each experimental condition, taken separately, gives the latent variables for behavior (behavior scores) per condition. These condition-wise behavior scores are stacked one on top of the other to form the combined behavior scores. So, with the latent variable for the n-th condition computed as:

L_Y,n = Y_n U_n,   (22)

we obtain:

L_Y = [L_Y,1; L_Y,2; L_Y,3] =
  [ -0.64   0.67  -0.61   0.01   0.00   0.03 ]
  [  0.48  -0.50   0.45  -0.01   0.01   0.01 ]
  [  0.16  -0.17   0.15   0.00  -0.01  -0.04 ]
  [ -0.05   0.24   0.31  -0.67   0.06   0.00 ]
  [  0.34  -0.36  -0.76   0.19  -0.06   0.02 ]
  [ -0.29   0.12   0.45   0.47   0.00  -0.02 ]
  [  0.78   0.79   0.05   0.23  -0.10   0.03 ]
  [ -0.30  -0.30  -0.02  -0.08   0.16  -0.04 ]
  [ -0.48  -0.48  -0.02  -0.15  -0.06   0.01 ]   (23)

The latent variables for brain activity (brain scores, L_X) and behavior (behavior scores, L_Y) are not typically illustrated; however, PCA-style plots show the effect of two latent variables at once.
In our example, the PCA-style plot of L_X (Fig. 6a) shows that the first brain score separates AD_1, PD_3, and NC_3 from the other participants, while the second brain score separates AD_2, PD_2, NC_2, and NC_3 from the other participants. The PCA-style plot of L_Y (Fig. 6b) shows that the first behavior score separates NC_1 from NC_2 and NC_3, while the second behavior score separates AD_1, PD_1, and PD_3 from AD_2, AD_3, and PD_2.

Because of the small sample size, the bootstrap and permutation tests were not performed in this analysis; hence, these results represent a fixed effect model (i.e., the interpretation is limited to the given data sets and cannot be generalized to the population).

3.5.1. Applications

Behavior PLSC has been used in several neuroimaging domains to examine the relationship between brain activity and behavior. For example, Price et al. (2004) used behavior PLSC to examine the relationship between voxel activity from PET scans of individuals with Alzheimer's disease and normal controls, and descriptive and cognitive measures of age, white matter volume, and mental status on the Mini-Mental State Examination (MMSE; Folstein et al., 1975). The saliences for Y_behavior showed that tracer retention differences between patients with Alzheimer's disease and normal controls (i.e., brain activity differences between groups coded by contrasts) contributed most to the correlation.
The MMSE scores had the second highest contribution.

Behavior PLSC has also been used (1) to identify grey matter systems correlated with response inhibition in obsessive-compulsive disorder (Menzies et al., 2007), (2) to examine relationships between brain structures and neuropsychological tests in schizophrenia (Nestor et al., 2002), (3) to analyze the relationship between cerebral blood flow (as measured by SPECT) and visuo-spatial tasks in Alzheimer's patients (Tippett and Black, 2008), and (4) to relate brain shape, neuropsychological behavior, and cortical functioning in traumatic brain injury patients (Bookstein et al., 2002; Fujiwara et al., 2008).

Fig. 6. Mini-example, behavior PLSC: Brain and behavior scores. (a) Plot of L_X: latent variables 1 and 2 of X (brain scores); (b) plot of L_Y: latent variables 1 and 2 of Y (behavior scores). The dimensions represent the latent variables; the horizontal (resp. vertical) coordinate of each point is given by its value for the first (resp. second) latent variable.
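Returning to the computations of Section 3.5, the condition-wise behavior scores of Eqs. (22) and (23) can be sketched as follows. The data are random placeholders with the mini-example's dimensions; here Y stands for the normalized behavioral data and U for the behavior saliences returned by the SVD:

```python
import numpy as np

rng = np.random.default_rng(2)
N, per, K, L = 3, 3, 2, 6                      # conditions, rows per condition, measures, rank
Y = rng.standard_normal((N * per, K))          # placeholder normalized behavioral data
U = rng.standard_normal((N * K, L))            # placeholder behavior saliences

L_Y_blocks = []
for n in range(N):
    Y_n = Y[n * per:(n + 1) * per]             # observations of condition n
    U_n = U[n * K:(n + 1) * K]                 # saliences for condition n's measures
    L_Y_blocks.append(Y_n @ U_n)               # Eq. (22): L_Y,n = Y_n U_n

L_Y = np.vstack(L_Y_blocks)                    # combined behavior scores, cf. Eq. (23)
```

Note how the block structure of R_behavior is mirrored here: each condition contributes K rows of U, which are matched with that condition's rows of Y before the products are stacked.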


3.6. Task PLSC

In contrast with behavior PLSC, task PLSC analyzes the relationship between brain activity and experimental design. There are two closely related types of task PLSC: (1) contrast and (2) mean-centered. When differences in scaling are taken into account, the contrast and mean-centered approaches give almost the same results (McIntosh and Lobaugh, 2004). The mean-centered approach is currently the version of task PLSC in use; we present contrast task PLSC essentially for historical reasons and to be exhaustive.

3.6.1. Contrast task PLSC

Contrast task PLSC analyzes the relationship between brain measurements and a set of contrasts which represent specific experimental questions. As with behavior PLSC, when using contrast task PLSC, matrix X [Eq. (11)] contains voxel activity. However, instead of demographic or behavioral measures, matrix Y contains a set of orthonormal contrasts (i.e., these contrasts are pairwise orthogonal and the sum of squares of the coefficients of a given contrast is equal to one) which reflects the experimental question at hand. The method for contrast task PLSC is illustrated in Fig. 7.

The matrix X is obtained from the data described in Eq. (11). These data are then centered and normalized such that the mean of each column is zero and the sum of the squared elements of each column is equal to one.
This gives:

X =
  [  0.31   0.14   0.02  -0.41   0.33  -0.42   0.15   0.20  -0.27  -0.35   0.19  -0.33 ]
  [ -0.07  -0.32  -0.19   0.51   0.23   0.34  -0.42   0.47   0.34  -0.02   0.30  -0.49 ]
  [  0.05   0.48   0.24  -0.14   0.13  -0.42   0.15  -0.08   0.19  -0.35  -0.17  -0.33 ]
  [ -0.19  -0.09   0.24   0.25  -0.46  -0.42   0.50  -0.35  -0.27  -0.35   0.19  -0.17 ]
  [ -0.31  -0.09   0.46   0.38  -0.46   0.21   0.38  -0.48   0.64   0.41  -0.53   0.14 ]
  [ -0.44   0.37  -0.63  -0.41  -0.46  -0.17  -0.54   0.47  -0.42  -0.13   0.42  -0.02 ]
  [  0.55  -0.43   0.24  -0.41   0.23   0.34  -0.19  -0.35  -0.12   0.19  -0.41   0.30 ]
  [  0.42  -0.43   0.02   0.12   0.33   0.34  -0.19  -0.08  -0.27   0.63  -0.29   0.45 ]
  [  0.30   0.37  -0.41   0.12   0.13   0.21   0.15   0.20   0.19  -0.02   0.30   0.45 ]   (24)

Because there are three experimental groups in our example, the maximum number of orthonormal contrasts is two (see, e.g., Abdi et al., 2009b), and this matrix of contrasts, denoted Y_contrast, is:

Y_contrast =
  [ -0.24  -0.41 ]
  [ -0.24  -0.41 ]
  [ -0.24  -0.41 ]
  [ -0.24   0.41 ]
  [ -0.24   0.41 ]
  [ -0.24   0.41 ]
  [  0.47   0.00 ]
  [  0.47   0.00 ]
  [  0.47   0.00 ]   (25)

where the rows represent participants AD_1, AD_2, AD_3, PD_1, PD_2, PD_3, NC_1, NC_2, and NC_3, and the columns represent contrasts ψ_1 and ψ_2. Contrast ψ_1 predicts that the AD and PD participants will differ from the NC participants, and contrast ψ_2 predicts that the AD participants will differ from the PD participants.

The matrix of correlations between X and Y_contrast (i.e., R_contrast) is computed as:

R_contrast = Y^T_contrast X.   (26)

For our example, the cross-product matrix is:

R_contrast =
  [  0.90  -0.35  -0.10  -0.12   0.49   0.63  -0.16  -0.16  -0.14   0.57  -0.28   0.85 ]
  [ -0.25  -0.05   0.00   0.11  -0.85   0.05   0.19  -0.39  -0.12   0.27  -0.10   0.45 ]   (27)

which is decomposed by the SVD as [cf. Eq. (1)]:

R_contrast = U Δ V^T,   (28)

with the left singular vectors

U =
  [ -1.0000   0.0014 ]
  [ -0.0014  -1.0000 ]   (29)

the singular values

Δ = diag(1.67, 1.13)   (30)

and the right singular vectors

V =
  [ -0.54   0.22 ]
  [  0.21   0.04 ]
  [  0.06   0.00 ]
  [  0.07  -0.09 ]
  [ -0.29   0.75 ]
  [ -0.38  -0.05 ]
  [  0.10  -0.17 ]
  [  0.10   0.34 ]
  [  0.09   0.11 ]
  [ -0.34  -0.24 ]
  [  0.17   0.09 ]
  [ -0.51  -0.39 ]   (31)

The brain scores (latent variables for brain activity) are computed as [cf. Eq.
(4)]:

L_X = X V =
  [  0.56   0.48 ]
  [  0.14   0.57 ]
  [  0.48   0.33 ]
  [  0.64  -0.46 ]
  [  0.01  -0.82 ]
  [  0.52  -0.11 ]
  [ -0.94   0.00 ]
  [ -1.08  -0.08 ]
  [ -0.34   0.08 ]   (32)

Fig. 7. Matrix X and matrix Y_contrast for the contrast approach to task PLSC: The matrix of correlations (R) between X and Y_contrast is decomposed by the SVD.
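The contrast approach can be sketched end to end. The two contrast vectors below reproduce the coefficients of Eq. (25) after normalization (e.g., -1/√18 ≈ -0.24, 2/√18 ≈ 0.47); the matrix X, however, is a random placeholder rather than the paper's data:

```python
import numpy as np

# Orthonormal contrasts for three groups of three participants (AD, PD, NC).
psi1 = np.repeat([-1.0, -1.0, 2.0], 3)   # AD and PD vs. NC
psi2 = np.repeat([-1.0, 1.0, 0.0], 3)    # AD vs. PD
Y_contrast = np.column_stack([psi1 / np.linalg.norm(psi1),
                              psi2 / np.linalg.norm(psi2)])

# Pairwise orthogonal, unit sum of squares per contrast.
assert np.allclose(Y_contrast.T @ Y_contrast, np.eye(2))

# Placeholder voxel data, centered and column-normalized as in Eq. (24).
rng = np.random.default_rng(3)
X = rng.standard_normal((9, 12))
X = X - X.mean(axis=0)
X = X / np.linalg.norm(X, axis=0)

R_contrast = Y_contrast.T @ X                          # Eq. (26)
U, delta, Vt = np.linalg.svd(R_contrast, full_matrices=False)  # Eq. (28)
L_X = X @ Vt.T                                         # brain scores, cf. Eq. (32)
L_Y = Y_contrast @ U                                   # design scores, cf. Eq. (33)
```

Because Y_contrast has only two columns, R_contrast has rank at most two, which is why the example yields exactly two latent variables.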


A. Krishnan et al. / NeuroImage xxx (2010) xxx–xxx

and the design scores (latent variables for the design) are computed as [cf., Eq. (5)]:

$$
L_Y = YU = \begin{bmatrix}
0.24 & 0.41 \\ 0.24 & 0.41 \\ 0.24 & 0.41 \\ 0.24 & -0.41 \\ 0.24 & -0.41 \\ 0.24 & -0.41 \\ -0.47 & 0.00 \\ -0.47 & 0.00 \\ -0.47 & 0.00
\end{bmatrix}. \tag{33}
$$

For spatio-temporal data, the matrix X is made up of T matrices, each denoted X_t. The product of each of these X_t matrices with the brain saliences gives a matrix of brain scores for a given time point, called the temporal scores, which depict differences in the time-course of brain activation.

Incidentally, contrast task PLSC is related to Principal Component Analysis (PCA) because, like PCA, contrast task PLSC boils down to computing the SVD of a correlation (or covariance) matrix. Therefore, the plots for contrast task PLSC are similar to PCA plots, except that for the contrast approach there are separate plots for the brain scores (L_X) and for the design scores (L_Y). Also, as in PCA, the number of latent variables is limited to the rank of R_contrast. For our example, the rank of R_contrast is two and, hence, there are only two brain or design latent variables.

Even though, in general, PLSC results are not illustrated with PCA-style graphs, such a representation can be informative because it can help integrate the effects of two latent variables at once. As an illustration, PCA-style plots of the latent variables, L_X and L_Y, for our example are shown in Fig. 8a and b, respectively, and indicate that the three groups are separated in the space of the first two latent variables. Note that L_X is mean-centered and the first brain score is plotted against the second brain score. The first brain score separates NC from AD and PD. The second brain score separates AD from the PD group (Fig. 8a).
Note that because L_Y is a product of a set of contrasts and brain activity, each observation in an experimental condition is weighted equally. Therefore a plot of each latent variable for the design (design score) shows a condition effect (i.e., the scores within a condition fall on the same point in the space). For our example, the first design score separates NC from AD and PD, and the second design score separates AD from PD (Fig. 8b).

3.6.2. Mean-centered task PLSC

Mean-centered task PLSC is used when the I observations are structured into N groups or conditions. When using mean-centered task PLSC, the observations in X are arranged according to these N experimental conditions and Y is a matrix of dummy coding that codes for the experimental groups or conditions. For our example, the matrix X is the data matrix described in Eq. (11) and Y_dummy coding is:

$$
Y_{\text{dummy coding}} = \begin{bmatrix}
1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\
0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\
0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1
\end{bmatrix}, \tag{34}
$$

where the first column represents participants in the AD group, the second column represents participants in the PD group, and the third column represents participants in the NC group.

The average for each condition is then computed and stored in an N×J matrix, denoted M, which is computed as:

$$
M = \operatorname{diag}\left\{\mathbf{1}^{\mathsf{T}} Y_{\text{dummy coding}}\right\}^{-1} Y_{\text{dummy coding}}^{\mathsf{T}} X, \tag{35}
$$

where 1 is a column vector of ones with the same number of rows as matrix Y_dummy coding, and diag{1^T Y_dummy coding} is the diagonal matrix of the sums of the columns of Y_dummy coding. For our example, M is:

$$
M = \begin{bmatrix}
3.67 & 4.67 & 6.00 & 4.00 & 8.00 & 3.00 & 5.33 & 6.00 & 4.33 & 2.00 & 6.33 & 2.67 \\
2.00 & 4.33 & 6.00 & 4.67 & 1.00 & 3.33 & 6.67 & 3.67 & 3.67 & 4.00 & 5.67 & 5.00 \\
8.00 & 2.33 & 5.67 & 3.67 & 8.00 & 6.67 & 5.00 & 4.00 & 3.33 & 6.67 & 4.33 & 7.67
\end{bmatrix}. \tag{36}
$$

Fig. 8. Mini-example—Task PLSC: (a) Plot of L_X, latent variables 1 and 2 of X (brain scores), and (b) plot of L_Y, latent variables 1 and 2 of Y (design scores). The dimensions represent the latent variables; the horizontal (resp. vertical) coordinate of each point is given by its value for the first (resp. second) latent variable.
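The group-mean matrix of Eqs. (34)-(35), and the column-wise mean-centring that is applied to it next, can be sketched as follows. The data are made up and the variable names are ours; only the shapes (9 observations in 3 groups, 12 voxels) follow the example:

```python
import numpy as np

# Sketch of the group-mean computation of mean-centered task PLSC.
rng = np.random.default_rng(1)
X = rng.standard_normal((9, 12))               # 9 observations x 12 voxels

# Dummy coding for 3 groups (AD, PD, NC) of 3 participants each, Eq. (34).
Y_dummy = np.kron(np.eye(3), np.ones((3, 1)))  # 9 x 3 indicator matrix

# M = diag{1'Y}^(-1) Y'X : each row of M is one group's mean, Eq. (35).
group_sizes = Y_dummy.sum(axis=0)              # column sums of Y_dummy
M = (Y_dummy.T @ X) / group_sizes[:, None]

# Column-wise mean-centring: subtract each column's mean over groups,
# giving the matrix that is then submitted to the SVD.
R_mean_centered = M - M.mean(axis=0, keepdims=True)
```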


Each column of matrix M is then mean-centered by subtracting the mean of the column from each value of that column. This column-wise mean-centered matrix is denoted R_mean centered (note that R_mean centered is not a matrix of correlations). Formally, R_mean centered is computed as:

$$
R_{\text{mean centered}} = M - \mathbf{1}\,\frac{1}{N}\,\mathbf{1}^{\mathsf{T}} M, \tag{37}
$$

where 1 is an N by 1 vector of ones. For our example, R_mean centered is:

$$
R_{\text{mean centered}} = \begin{bmatrix}
-0.89 & 0.89 & 0.11 & -0.11 & 2.33 & -1.33 & -0.33 & 1.44 & 0.56 & -2.22 & 0.89 & -2.44 \\
-2.56 & 0.56 & 0.11 & 0.56 & -4.67 & -1.00 & 1.00 & -0.89 & -0.11 & -0.22 & 0.22 & -0.11 \\
3.44 & -1.44 & -0.22 & -0.44 & 2.33 & 2.33 & -0.67 & -0.56 & -0.44 & 2.44 & -1.11 & 2.56
\end{bmatrix}. \tag{38}
$$

The SVD [cf., Eq. (1)] on R_mean centered gives:

$$R_{\text{mean centered}} = U \Delta V^{\mathsf{T}} \tag{39}$$

with left singular vectors

$$U = \begin{bmatrix} 0.20 & 0.79 \\ 0.59 & -0.57 \\ -0.79 & -0.22 \end{bmatrix}, \tag{40}$$

singular values

$$\Delta = \begin{bmatrix} 7.86 & 0 \\ 0 & 5.73 \end{bmatrix}, \tag{41}$$

and right singular vectors

$$
V = \begin{bmatrix}
-0.56 & 0.00 \\ 0.21 & 0.12 \\ 0.03 & 0.01 \\ 0.08 & -0.05 \\ -0.52 & 0.69 \\ -0.34 & -0.18 \\ 0.13 & -0.12 \\ 0.03 & 0.31 \\ 0.05 & 0.11 \\ -0.32 & -0.38 \\ 0.15 & 0.14 \\ -0.33 & -0.43
\end{bmatrix}. \tag{42}
$$

Because mean-centered task PLSC differentiates between experimental groups or conditions, the saliences for the groups/conditions U have a pattern similar to the design latent variables (L_Y) in the contrast approach of task PLSC. The first salience differentiates the PD group from the NC group, and the second salience differentiates the AD group from the NC and PD groups (Fig. 9). Note that in mean-centered task PLSC the group saliences optimally separate the groups. This makes mean-centered task PLSC akin to discriminant analysis. Specifically, mean-centered task PLSC is equivalent to "barycentric discriminant analysis," because these two techniques compute the SVD of the matrix R_mean centered (see, e.g., Abdi and Williams, 2010a; Williams et al., in press, for details).

3.6.3. Applications

3.6.3.1. Contrast task PLSC.
Contrast task PLSC has been used to extract both the spatial and temporal features characteristic of fMRI data (Addis et al., 2004). For example, Keightley et al. (2003b) used contrast task PLSC to study the relationship between attention-demanding cognitive tasks and brain activity associated with emotional processing.

In addition, many PET studies have used contrast task PLSC to study the relationship between memory and different experimental conditions (e.g., Iidaka et al., 2000; Nyberg et al., 1996; Nyberg et al., 2002; Rajah and McIntosh, 2005; Köhler et al., 1998). For example, Nyberg et al. (2002) used contrast task PLSC to study the similarities and differences in brain activity for working memory, semantic memory, and episodic memory. Their design matrix contained dummy coding for different tasks. Contrast task PLSC revealed spatial patterns of brain activity that showed optimal association with the experimental conditions.

3.6.3.2. Mean-centered task PLSC. Mean-centered task PLSC is now used instead of contrast task PLSC. McIntosh and Lobaugh (2004) demonstrated that the two approaches produce identical results save for differences in the singular values. The mean-centered approach has been used in several ERP studies (Bergström et al., 2007; West and Krompinger, 2005; West and Wymbs, 2004) and recent fMRI work (McIntosh et al., 2004; Martinez-Montes et al., 2004).

3.6.3.3. "Non-rotated" PLSC. A recent version of PLSC serves as a variation of the contrast task PLSC and mean-centered task PLSC. This version uses a priori (preferably orthogonal) contrasts but, instead of using matrix X, it uses the matrix R_mean centered [see Eq. (37)], and does not perform the SVD of R_mean centered (hence the term "non-rotated," because the SVD can be interpreted as a rotation of an orthogonal basis). Instead, the matrix obtained by the product of the contrast matrix with the R_mean centered matrix is used directly for inferential analyses. This version acts as a complement to mean-centered task PLSC because it can test specific contrasts and therefore can be used to decompose interaction terms into interpretable dimensions (see, e.g., Protzner and McIntosh, 2007; McIntosh et al., 2004).

3.7. Seed PLSC

Seed PLSC is used to analyze functional connectivity (McIntosh et al., 1997; McIntosh and Gonzalez-Lima, 1991). Although seed PLSC can analyze connectivity over the whole brain, this method has been commonly used to determine the connectivity between one (or a few) region(s) of interest (ROI) and the rest of the brain.

The first step is to determine a seed, which is a representative voxel or set of voxels for a particular ROI. Ideally, a seed is chosen for theoretical reasons, but voxel seeds are also selected using behavior or task PLSC, by univariate multiple regression, or based on experimental hypotheses. One seed is selected for every ROI for each experimental condition. The activity of these seed voxels is stored in Y_seed (McIntosh et al., 1997). Because Y_seed already accounts for seed activity, the seed voxels are removed from X.

In our example (for simplicity), we have chosen the first column and last column of X to be the seeds [cf., Eq. (11)]. Therefore, Y_seed is represented as:

$$
Y_{\text{seed}} = \begin{bmatrix}
2 & 3 \\ 4 & 2 \\ 5 & 3 \\ 3 & 4 \\ 2 & 6 \\ 1 & 5 \\ 9 & 7 \\ 8 & 8 \\ 7 & 8
\end{bmatrix}, \tag{43}
$$

where the rows represent the participants from matrix X and the columns represent the seed voxels.
Because we have used two vectors of X as seeds, X_seed now has I rows and J−2 columns [compare with X in Eq. (11)]:

$$
X_{\text{seed}} = \begin{bmatrix}
5 & 6 & 1 & 9 & 1 & 7 & 6 & 2 & 1 & 7 \\
1 & 5 & 8 & 8 & 7 & 2 & 8 & 6 & 4 & 8 \\
8 & 7 & 3 & 7 & 1 & 7 & 4 & 5 & 1 & 4 \\
3 & 7 & 6 & 1 & 1 & 10 & 2 & 2 & 1 & 7 \\
3 & 8 & 7 & 1 & 6 & 9 & 1 & 8 & 8 & 1 \\
7 & 3 & 1 & 1 & 3 & 1 & 8 & 1 & 3 & 9 \\
0 & 7 & 1 & 8 & 7 & 4 & 2 & 3 & 6 & 2 \\
0 & 6 & 5 & 9 & 7 & 4 & 4 & 2 & 10 & 3 \\
7 & 4 & 5 & 7 & 6 & 7 & 6 & 5 & 4 & 8
\end{bmatrix}. \tag{44}
$$

As with behavior PLSC, both X_seed and Y_seed are centered and normalized within each condition n. This gives the following values for X:

$$
X_{\text{seed}} = \begin{bmatrix}
0.07 & 0.00 & -0.59 & 0.71 & -0.41 & 0.41 & 0.00 & -0.79 & -0.41 & 0.23 \\
-0.74 & -0.71 & 0.78 & 0.00 & 0.82 & -0.82 & 0.71 & 0.57 & 0.82 & 0.57 \\
0.67 & 0.71 & -0.20 & -0.71 & -0.41 & 0.41 & -0.71 & 0.23 & -0.41 & -0.79 \\
-0.41 & 0.27 & 0.29 & 0.00 & -0.66 & 0.48 & -0.31 & -0.31 & -0.59 & 0.23 \\
-0.41 & 0.53 & 0.51 & 0.00 & 0.75 & 0.33 & -0.50 & 0.81 & 0.78 & -0.79 \\
0.82 & -0.80 & -0.81 & 0.00 & -0.09 & -0.81 & 0.81 & -0.50 & -0.20 & 0.57 \\
-0.41 & 0.62 & -0.82 & 0.00 & 0.41 & -0.41 & -0.71 & -0.15 & -0.15 & -0.51 \\
-0.41 & 0.15 & 0.41 & 0.71 & 0.41 & -0.41 & 0.00 & -0.62 & 0.77 & -0.29 \\
0.82 & -0.77 & 0.41 & -0.71 & -0.82 & 0.82 & 0.71 & 0.77 & -0.62 & 0.81
\end{bmatrix} \tag{45}
$$

and Y:

$$
Y_{\text{seed}} = \begin{bmatrix}
-0.77 & 0.41 \\ 0.15 & -0.82 \\ 0.62 & 0.41 \\ 0.71 & -0.71 \\ 0.00 & 0.71 \\ -0.71 & 0.00 \\ 0.71 & -0.82 \\ 0.00 & 0.41 \\ -0.71 & 0.41
\end{bmatrix}. \tag{46}
$$

The matrix of correlations for each condition n is then computed as [cf., Eq. (15)]:

$$R_{\text{seed},n} = Y_{\text{seed},n}^{\mathsf{T}} X_{\text{seed},n}. \tag{47}$$

All the condition-wise matrices of correlations are stacked one on top of the other to form the combined matrix of correlations R_seed, which is the input for the SVD. The matrix R_seed contains the correlation of each of the J−2 voxels in X_seed with each of the K seed voxels in Y_seed within each of the N conditions. Therefore R_seed will have N×K rows and J−2 columns (Fig. 10).

Fig. 9. Mini-example—Mean-centered task PLSC: Plot of saliences 1 and 2 of R_mean centered.
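The condition-wise stacking of Eq. (47) can be sketched as follows, assuming the within-condition normalisation scales each centred column to unit norm (which is what makes the entries of R_seed correlations). The data and names are made up; only the shapes follow the example:

```python
import numpy as np

# Sketch of seed PLSC with made-up data: 9 participants in 3 conditions,
# 10 remaining brain voxels, 2 seed voxels.
rng = np.random.default_rng(2)
X_seed = rng.standard_normal((9, 10))   # brain voxels (seeds removed)
Y_seed = rng.standard_normal((9, 2))    # activity of the 2 seed voxels
conditions = [slice(0, 3), slice(3, 6), slice(6, 9)]

def centre_normalise(A):
    """Centre each column and scale it to unit norm within a condition."""
    A = A - A.mean(axis=0, keepdims=True)
    return A / np.linalg.norm(A, axis=0, keepdims=True)

# Condition-wise correlations R_n = Y_n' X_n, stacked into R_seed.
R_seed = np.vstack([centre_normalise(Y_seed[c]).T @ centre_normalise(X_seed[c])
                    for c in conditions])
U, delta, Vt = np.linalg.svd(R_seed, full_matrices=False)
```

With N = 3 conditions and K = 2 seeds, R_seed has the 6 × 10 shape described in the text, and its entries are bounded by 1 in absolute value.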
For our example, the R_seed matrix of correlations is:

$$
R_{\text{seed}} = \begin{bmatrix} R_{\text{seed},1} \\ R_{\text{seed},2} \\ R_{\text{seed},3} \end{bmatrix}
= \begin{bmatrix}
0.25 & 0.33 & 0.45 & -0.98 & 0.19 & -0.19 & -0.33 & 0.84 & 0.19 & -0.58 \\
0.90 & 0.87 & -0.96 & 0.00 & -1.00 & 1.00 & -0.87 & -0.69 & -1.00 & -0.69 \\
-0.87 & 0.76 & 0.78 & 0.00 & -0.40 & 0.91 & -0.79 & 0.13 & -0.28 & -0.24 \\
0.00 & 0.19 & 0.16 & 0.00 & 0.99 & -0.10 & -0.13 & 0.79 & 0.97 & -0.72 \\
-0.87 & 0.98 & -0.87 & 0.50 & 0.87 & -0.87 & -1.00 & -0.65 & 0.33 & -0.93 \\
0.50 & -0.76 & 1.00 & 0.00 & -0.50 & 0.50 & 0.87 & 0.19 & 0.19 & 0.63
\end{bmatrix}, \tag{48}
$$

where the rows are the seed voxels per condition n and the columns represent the voxel activity.

The remaining steps to compute seed PLSC are identical to those of behavior PLSC. However, there is an important difference in interpretation. The relationship between the seed voxels (i.e., Y_seed) and the voxels in X_seed represents their functional connectivity.

Specifically, for our example the SVD [cf., Eq. (1)] of R_seed is computed as:

$$R_{\text{seed}} = U \Delta V^{\mathsf{T}}, \tag{49}$$

which gives:

$$
U = \begin{bmatrix}
-0.03 & -0.17 & 0.60 & 0.47 & 0.55 & -0.29 \\
-0.42 & 0.76 & 0.08 & 0.40 & -0.26 & -0.08 \\
-0.17 & 0.18 & 0.67 & -0.69 & -0.05 & 0.08 \\
-0.10 & -0.48 & 0.35 & 0.32 & -0.70 & 0.21 \\
-0.70 & -0.34 & -0.21 & -0.19 & -0.04 & -0.56 \\
0.54 & 0.12 & 0.09 & -0.07 & -0.37 & -0.74
\end{bmatrix}, \tag{50}
$$


where U are the saliences for the seed voxels,

$$\Delta = \operatorname{diag}\{3.29,\; 2.88,\; 2.03,\; 1.60,\; 0.90,\; 0.40\}, \tag{51}$$

where Δ is the diagonal matrix of singular values, and

$$
V = \begin{bmatrix}
0.20 & 0.29 & -0.06 & 0.75 & -0.23 & -0.26 \\
-0.49 & 0.08 & 0.28 & -0.05 & 0.02 & -0.14 \\
0.42 & -0.12 & 0.52 & -0.35 & 0.02 & -0.53 \\
-0.10 & 0.00 & -0.34 & -0.35 & -0.61 & 0.01 \\
-0.15 & -0.59 & -0.06 & 0.09 & -0.18 & 0.24 \\
0.10 & 0.47 & 0.38 & -0.14 & -0.55 & 0.35 \\
0.51 & -0.09 & -0.27 & 0.08 & -0.11 & -0.02 \\
0.22 & -0.27 & 0.48 & 0.24 & 0.04 & 0.56 \\
0.07 & -0.49 & 0.06 & 0.07 & -0.42 & -0.27 \\
0.43 & 0.09 & -0.28 & -0.30 & 0.21 & 0.27
\end{bmatrix}, \tag{52}
$$

where V are the saliences for the brain activity. The relationship between the seed voxels (i.e., Y_seed) and the voxels in X_seed represents their functional connectivity. This pattern of connectivity can be illustrated by plotting the saliences for the brain (V) on a glass brain to show how strongly the seed voxels correlate with the rest of the brain.

Fig. 10. Matrix X and matrix Y for seed PLSC: The I observations are arranged according to N conditions in both matrices and are normalized within condition. The matrices of correlations (R_n) between the condition-wise sub-matrices (X_n and Y_n) are stacked one below the other to form a combined matrix of correlations R, which is then decomposed by SVD.

The matrix of seed saliences, U, with N×K rows and L columns (where L is the rank of matrix R_seed), indicates differences in the seed voxels across experimental conditions. A separate bar plot of each column of U against each seed voxel shows how the experimental conditions interact with the seed voxels. Fig. 11a shows the first seed salience, where the first seed voxel is important for the NC group and the second seed voxel differentiates the AD group from the NC group. Fig. 11b shows the second seed salience, where the first seed voxel differentiates the PD group from the other groups and the second seed voxel differentiates the AD group from the PD group.

3.7.1. Applications

Several PET studies have used seed PLSC to explore functional connectivity (e.g., see Della-Maggiore et al., 2000; Grady et al., 2003; Keightley et al., 2003a; Nyberg et al., 2000). For example, Keightley et al. (2003a) used seed PLSC to determine networks of brain regions active during neutral and sad internally generated mood states in individuals with positive and negative affect styles. They first used contrast task PLSC to identify the subgenual cingulate as an ROI, a voxel of which was selected as the seed. The Y matrix consisted of activity in this voxel across all time points. Via seed PLSC, Keightley et al. (2003a) found a divergent subgenual cingulate-mediated network that differentiated participants based on affect style.

3.8. Multi-table PLSC

Multi-table PLSC, also known as multi-block PLSC, extends behavior, contrast, and seed PLSC to analyze three or more matrices simultaneously. This means that for every matrix X of brain activity, there are two or more Y matrices (Fig. 12). The X matrix is designated as causal and the Y matrices are designated as related data sets. Multi-table PLSC analyzes the pattern(s) of correlation between causal and related data sets (Streissguth et al., 1993). That is, multi-table PLSC identifies the distributed patterns of brain activity and measures


related changes in the behavioral measures, contrasts, and/or seed voxel activity across experimental conditions (Caplan et al., 2007).

Fig. 11. Mini-example—Seed PLSC: Plot of saliences 1 and 2 of R_seed.

In our example, X contains brain activity [as given in Eq. (24)], Y_1 contains the contrasts from Eq. (25), and Y_2 contains the seed voxel activity from Eq. (43). The matrix of correlations R_1 is computed between brain activity and the contrasts as Y_1^T X, as described in Eq. (27). This gives

$$
R_1 = \begin{bmatrix}
0.90 & -0.35 & -0.10 & -0.12 & 0.49 & 0.63 & -0.16 & -0.16 & -0.14 & 0.57 & -0.28 & 0.85 \\
-0.25 & -0.05 & 0.00 & 0.11 & -0.85 & 0.05 & 0.19 & -0.39 & -0.12 & 0.27 & -0.10 & 0.45
\end{bmatrix}. \tag{53}
$$

A second matrix of correlations, R_2, is computed between brain activity and seed voxel activity as Y_2^T X, as described in Eq. (48). The resulting R_2 matrix has N×K rows and J columns. Recall that in seed PLSC the seed voxels are removed from X. However, in multi-table PLSC the seed voxels are not removed, so that R_1 and R_2 have the same number of columns (i.e., J_1 = J_2).
For the example, R_2 is:

$$
R_2 = \begin{bmatrix}
1.00 & 0.25 & 0.33 & 0.45 & -0.98 & 0.19 & -0.19 & -0.33 & 0.84 & 0.19 & -0.58 & -0.19 \\
-0.19 & 0.90 & 0.87 & -0.96 & 0.00 & -1.00 & 1.00 & -0.87 & -0.69 & -1.00 & -0.69 & 1.00 \\
1.00 & -0.87 & 0.76 & 0.78 & 0.00 & -0.40 & 0.91 & -0.79 & 0.13 & -0.28 & -0.24 & -0.50 \\
-0.50 & 0.00 & 0.19 & 0.16 & 0.00 & 0.99 & -0.10 & -0.13 & 0.79 & 0.97 & -0.72 & 1.00 \\
1.00 & -0.87 & 0.98 & -0.87 & 0.50 & 0.87 & -0.87 & -1.00 & -0.65 & 0.33 & -0.93 & -0.87 \\
-0.87 & 0.50 & -0.76 & 1.00 & 0.00 & -0.50 & 0.50 & 0.87 & 0.19 & 0.19 & 0.63 & 1.00
\end{bmatrix}. \tag{54}
$$

R_1 and R_2 are concatenated by row to form matrix R_multi:

$$R_{\text{multi}} = \begin{bmatrix} R_1 \\ R_2 \end{bmatrix} \tag{55}$$

$$
= \begin{bmatrix}
0.90 & -0.35 & -0.10 & -0.12 & 0.49 & 0.63 & -0.16 & -0.16 & -0.14 & 0.57 & -0.28 & 0.85 \\
-0.25 & -0.05 & 0.00 & 0.11 & -0.85 & 0.05 & 0.19 & -0.39 & -0.12 & 0.27 & -0.10 & 0.45 \\
1.00 & 0.25 & 0.33 & 0.45 & -0.98 & 0.19 & -0.19 & -0.33 & 0.84 & 0.19 & -0.58 & -0.19 \\
-0.19 & 0.90 & 0.87 & -0.96 & 0.00 & -1.00 & 1.00 & -0.87 & -0.69 & -1.00 & -0.69 & 1.00 \\
1.00 & -0.87 & 0.76 & 0.78 & 0.00 & -0.40 & 0.91 & -0.79 & 0.13 & -0.28 & -0.24 & -0.50 \\
-0.50 & 0.00 & 0.19 & 0.16 & 0.00 & 0.99 & -0.10 & -0.13 & 0.79 & 0.97 & -0.72 & 1.00 \\
1.00 & -0.87 & 0.98 & -0.87 & 0.50 & 0.87 & -0.87 & -1.00 & -0.65 & 0.33 & -0.93 & -0.87 \\
-0.87 & 0.50 & -0.76 & 1.00 & 0.00 & -0.50 & 0.50 & 0.87 & 0.19 & 0.19 & 0.63 & 1.00
\end{bmatrix}. \tag{56}
$$

The SVD of R_multi gives:

$$R_{\text{multi}} = U \Delta V^{\mathsf{T}}, \tag{57}$$

where U is the set of saliences for both the contrasts and the seed voxel activity, and V is the set of saliences for the voxels. Consequently, U has both a task (or design) portion and a seed portion. The task or design portion of U indicates how the pattern of brain activity varies across conditions. The seed portion of U indicates how the pattern of brain activity varies with the seed voxel activity across experimental conditions (Caplan et al., 2007).

Fig. 12. Matrix X and matrices Y_1 to Y_M for multi-table PLSC: The condition-wise matrices of correlations between X and each of the Y_m matrices are stacked one below the other to form a combined matrix of correlations R, which is then decomposed by SVD.
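The stacking-and-decomposition step of Eqs. (55)-(57) is a one-liner in NumPy. The values below are made up (the example's R_1 and R_2 are not retyped) and the variable names are ours; only the 2 × 12 and 6 × 12 shapes follow the example:

```python
import numpy as np

# Sketch of multi-table PLSC: stack the contrast-based and seed-based
# correlation matrices and decompose them jointly.
rng = np.random.default_rng(3)
R1 = rng.uniform(-1, 1, (2, 12))   # contrasts x voxels
R2 = rng.uniform(-1, 1, (6, 12))   # (seeds per condition) x voxels
R_multi = np.vstack([R1, R2])      # row-wise concatenation, cf. Eq. (55)

U, delta, Vt = np.linalg.svd(R_multi, full_matrices=False)
U1, U2 = U[:2], U[2:]              # design portion and seed portion of U
```

Partitioning U by the same row blocks used to build R_multi is what yields the separate design and seed saliences discussed next.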


In our example, the first two rows of U are denoted U_1, and they correspond to the contrast matrix. The last six rows of U are denoted U_2, and they correspond to the 2 seed voxels for each of the 3 experimental groups:

$$
U = \begin{bmatrix} U_1 \\ U_2 \end{bmatrix}
= \begin{bmatrix}
-0.17 & 0.18 & 0.37 & 0.01 & 0.57 & 0.63 & 0.20 & -0.20 \\
0.04 & -0.04 & 0.22 & -0.22 & -0.18 & -0.25 & 0.90 & -0.07 \\
-0.19 & 0.13 & 0.08 & -0.63 & -0.53 & 0.45 & -0.12 & 0.22 \\
-0.01 & -0.90 & 0.41 & 0.01 & -0.03 & 0.09 & -0.11 & 0.07 \\
-0.29 & -0.19 & -0.31 & -0.65 & 0.52 & -0.30 & -0.06 & -0.07 \\
0.01 & 0.32 & 0.73 & -0.17 & 0.02 & -0.46 & -0.34 & -0.11 \\
-0.73 & 0.07 & 0.08 & 0.29 & 0.04 & -0.15 & 0.09 & 0.59 \\
0.57 & 0.07 & 0.07 & -0.15 & 0.32 & 0.03 & 0.05 & 0.73
\end{bmatrix}. \tag{58}
$$

The latent variables for the contrasts are computed as:

$$L_{Y_1} = Y_1 U_1. \tag{59}$$

For our example, this gives

$$
L_{Y_1} = \begin{bmatrix}
0.02 & -0.03 & -0.18 & 0.09 & -0.06 & -0.05 & -0.41 & 0.07 \\
0.02 & -0.03 & -0.18 & 0.09 & -0.06 & -0.05 & -0.41 & 0.07 \\
0.02 & -0.03 & -0.18 & 0.09 & -0.06 & -0.05 & -0.41 & 0.07 \\
0.06 & -0.06 & 0.00 & -0.09 & -0.21 & -0.25 & 0.32 & 0.02 \\
0.06 & -0.06 & 0.00 & -0.09 & -0.21 & -0.25 & 0.32 & 0.02 \\
0.06 & -0.06 & 0.00 & -0.09 & -0.21 & -0.25 & 0.32 & 0.02 \\
-0.08 & 0.08 & 0.17 & 0.00 & 0.27 & 0.30 & 0.09 & -0.09 \\
-0.08 & 0.08 & 0.17 & 0.00 & 0.27 & 0.30 & 0.09 & -0.09 \\
-0.08 & 0.08 & 0.17 & 0.00 & 0.27 & 0.30 & 0.09 & -0.09
\end{bmatrix}. \tag{60}
$$

Note that, as a consequence of contrast coding (which treats all observations of a group similarly), the rows of a given group have the same value. Therefore, the plot of the latent variables for the contrasts shows only the three group values (Fig. 13a), and it shows that, as indicated by the first contrast, the first latent variable separates the control group from the two clinical groups; among the clinical groups, the PD group shows the largest difference from the NC group.

Latent variables could be similarly computed for the seed portion of matrix U (i.e., U_2). However, when dealing with connectivity data, saliences are in general plotted directly, as shown in Fig. 13b.
Inthis figure, the plot of the saliences <strong>for</strong> the first seed [with values of−0.19, −0.29 and −0.73, see Eq. (58)] shows that seed 1 is important<strong>for</strong> all experimental conditions, although to a varying degree. The plot ofthe saliences <strong>for</strong> the second seed (with values of −0.01, 0.01 and 0.57)shows that, in contrast, seed 2 is only relevant <strong>for</strong> the NC group andcould be an indicator of differences between NC and the clinicalpopulations.Matrix V contains the saliences <strong>for</strong> brain activity across experimentalconditions and seed voxels. For our example, V is:23−0:48 0:04 −0:16 −0:32 0:19 0:76 0:02 −0:050:30 −0:23 0:20 0:04 −0:43 0:36 −0:29 0:28−0:37 −0:27 0:12 −0:15 −0:04 −0:29 −0:25 0:280:24 0:27 −0:21 −0:55 0:26 −0:13 −0:01 0:53−0:08 0:01 −0:02 0:42 0:62 0:01 −0:58 0:10−0:24 0:48 0:31 0:12 −0:02 −0:10 0:00 −0:25V =0:18 −0:40 −0:01 −0:36 0:35 −0:21 −0:03 −0:37:0:40 0:27 −0:17 0:18 0:00 0:22 −0:24 0:107 0:11 0:30 0:08 −0:42 −0:23 −0:04 −0:58 −0:44 77775 6 −0:04 0:47 0:30 −0:04 0:10 −0:11 0:20 0:224 0:33 0:10 −0:39 0:13 0:14 0:07 0:24 −0:280:32 −0:12 0:71 −0:09 0:34 0:26 0:17 −0:04ð61ÞEach salience in V shows a distributed pattern of brain activityacross conditions (this pattern can be visualized on a “glass brain”).This gives an indication of the interplay of the experimental conditionand the functional connectivity between the ROIs represented by theseed voxels.3.8.1. ApplicationsIn <strong>neuroimaging</strong>, multi-table <strong>PLS</strong>C is often used when one of thedata sets consists of seed voxel activity. For example, Vallesi et al.(2009) used multi-table <strong>PLS</strong>C in an fMRI experiment to find afunctionally connected network of brain regions related to withholdinga response. With fMRI activity as the causal data set (i.e., X), theauthors used experimental design and seed voxel activity as the tworelated data sets. 
A multi-table PLSC on the combined matrix of correlations showed a distributed network related to the experimental groups and their functional connectivity with the regions represented by the seed voxels (Vallesi et al., 2009).

Fig. 13. Mini-example—Multi-table PLSC: (a) First design score of the task portion of multi-table PLSC for the AD, PD, and NC groups; and (b) first salience of the seed portion of multi-table PLSC for the AD, PD, and NC groups.


In another study, Caplan et al. (2007) used multi-table PLSC to identify the functional networks involved in resolving proactive interference. The causal matrix was brain activity measured by fMRI. The first related matrix was the experimental design variables, and the second related matrix consisted of a combination of behavioral measures (reaction times) and seed voxel activity. The multi-table PLSC generated voxel saliences (i.e., V's) that indicated the degree to which each voxel was related to the experimental design, behavior variables, and seed regions. Multi-table PLSC also generated (1) design latent variables showing changes in brain activity across conditions, and (2) seed saliences characterizing functional connectivity of the brain with the seed voxels (Caplan et al., 2007).

4. Partial Least Squares Regression (PLSR)

So far, we have seen that PLSC correlates brain and behavior (or design) variables by determining brain activity that co-occurs with different experimental conditions. Because PLSC is correlational, it cannot explicitly predict one set of variables from another. By contrast, if we want to predict behavior (or some features of the experimental design) from brain activity, we use Partial Least Squares Regression (PLSR; Wold, 1982; Wold et al., 2001; see also Sidtis et al., 2003, for a discussion). The predictors (i.e., the independent variables) are stored in X and the predicted variables (i.e., the dependent variables) are stored in Y.

PLSR finds latent variables, stored in a matrix T, that model X and simultaneously predict Y (i.e., T plays a role similar to L_X).
Formally, this is expressed as a double decomposition of X and the predicted Ŷ:

$$X = T P^{\mathsf{T}} \quad\text{and}\quad \hat{Y} = T B C^{\mathsf{T}}, \tag{62}$$

where P and C are loadings (or weights) and B is a diagonal matrix. These latent variables are ordered according to the amount of variance of Ŷ that they explain. Rewriting Eq. (62) shows that Ŷ can also be expressed as a regression model:

$$\hat{Y} = T B C^{\mathsf{T}} = X B_{\text{PLS}} \tag{63}$$

with

$$B_{\text{PLS}} = P^{\mathsf{T}+} B C^{\mathsf{T}}, \tag{64}$$

where P^{T+} is the Moore–Penrose pseudo-inverse of P^T (see, e.g., Abdi and Williams, 2010c, for definitions). The matrix B_PLS has J rows and K columns and is equivalent to the regression weights of multiple regression (note that matrix B is diagonal, but matrix B_PLS is in general not diagonal).

In neuroimaging, each latent variable of X describes a unique aspect of the overall variance of brain activity that best predicts behavior or experimental design. Similarly, each latent variable of Y describes a unique aspect of the variance of behavior or experimental design that is best predicted by X (Fig. 14). In our example, X contains brain activity from Eq. (11) and Y contains behavioral measures from Eq. (12).

4.1. Iterative computation of the latent variables in PLSR

In PLSR, the latent variables are computed by iterative applications of the SVD. Each run of the SVD produces orthogonal latent variables for X and Y and corresponding regression weights (see, e.g., Abdi, 2010, for more details and alternative algorithms).

4.1.1. Step one

X and Y are mean-centered and normalized (or transformed into Z-scores) and stored in matrices X_0 and Y_0. The matrix of correlations (or covariances) between X_0 and Y_0 is computed as:

$$R_1 = X_0^{\mathsf{T}} Y_0. \tag{65}$$

The SVD is then performed on R_1 and produces two sets of orthogonal singular vectors, W_1 and C_1, and the corresponding singular values Δ_1 [compare with Eq. (1)]:

$$R_1 = W_1 \Delta_1 C_1^{\mathsf{T}}. \tag{66}$$

The first pair of singular vectors (i.e., the first columns of W_1 and C_1) are denoted w_1 and c_1, and the first singular value (i.e., the first diagonal entry of Δ_1) is denoted δ_1. The singular value represents the maximum covariance between the singular vectors. The first latent variable of X is given by [compare with Eq. (4) defining L_X]:

$$t_1 = X_0 w_1, \tag{67}$$

where t_1 is normalized such that t_1^T t_1 = 1. For our example, matrices X and Y [from Eqs. (11) and (12)] were transformed into Z-scores, and the first set of weights w_1 [obtained from the SVD of R_1, see Eq. (66)] gives:

$$w_1 = \begin{bmatrix} -0.43 & 0.20 & 0.10 & -0.03 & -0.00 & -0.41 & 0.09 & 0.16 & 0.07 & -0.41 & 0.16 & -0.59 \end{bmatrix}^{\mathsf{T}}, \tag{68}$$

Fig. 14. PLSR: The latent variables of X and Y are computed iteratively and stored in T and U, respectively. The regression weights from each iteration are stored in B_PLS and are used to predict Y from X (modified from Wold et al., 2001).


which then gives t_1 equal to:

$$t_1 = \begin{bmatrix} 0.41 & 0.11 & 0.33 & 0.28 & -0.15 & 0.22 & -0.45 & -0.57 & -0.19 \end{bmatrix}^{\mathsf{T}}. \tag{69}$$

The loadings of X_0 on t_1 (i.e., the projection of X_0 onto the space of t_1) are given by:

$$p_1 = X_0^{\mathsf{T}} t_1. \tag{70}$$

The least squares estimate of X from the first latent variable is given by:

$$\hat{X}_1 = t_1 p_1^{\mathsf{T}}. \tag{71}$$

The first latent variable of Y is obtained as

$$u_1 = Y_0 c_1. \tag{72}$$

For our example, u_1 is:

$$u_1 = \begin{bmatrix} 2.16 & 1.12 & 1.41 & 0.12 & 0.11 & -0.10 & -1.43 & -1.67 & -1.71 \end{bmatrix}^{\mathsf{T}}. \tag{73}$$

So far, PLSR works like PLSC, with t_1 and u_1 playing the roles of l_{X,1} and l_{Y,1}, respectively. The step specific to PLSR consists in the prediction of Y from the "X-latent variable" t_1. This is obtained by first reconstituting Y from its latent variable as:

$$\hat{Y}_1 = u_1 c_1^{\mathsf{T}}, \tag{74}$$

and then rewriting Eq. (74) as:

$$\hat{Y}_1 = t_1 b_1 c_1^{\mathsf{T}} \tag{75}$$

with

$$b_1 = t_1^{\mathsf{T}} u_1. \tag{76}$$

The scalar b_1 is the slope of the regression of Ŷ_1 on t_1 (recall that, because Y and X are centered, the regression equation requires only the slope, and so there is no intercept in the equation). Eq. (75) shows that Ŷ_1 is obtained as a linear regression from the latent variable extracted from X_0. The regression weight for the example is b_1 = 3.39.

Matrices X̂_1 and Ŷ_1 are then subtracted from the original X_0 and original Y_0, respectively, to give the deflated X_1 and Y_1:

$$X_1 = X_0 - \hat{X}_1 \tag{77}$$

and

$$Y_1 = Y_0 - \hat{Y}_1. \tag{78}$$

4.1.2. Next step

The first set of latent variables has now been extracted. Matrices X_1 and Y_1 now become the input matrices for the next iteration and play the roles of X_0 and Y_0, respectively [cf., Eq. (65)]. From the SVD of matrix R_2 = X_1^T Y_1, we get w_2, c_2, t_2 and b_2, and the new deflated matrices X_2 and Y_2.

4.1.3. Last step

The iterative process continues until X is completely decomposed into L components (where L is the rank of X).
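The whole iterative procedure of Sections 4.1.1-4.1.3 can be sketched as follows on made-up, standardised data (the variable names are ours). Each pass takes the SVD of the current cross-product, extracts and normalises the X-score t, computes the loadings and regression weight, and deflates both matrices:

```python
import numpy as np

# Sketch of iterative PLSR on synthetic Z-scored data (9 x 12 and 9 x 2).
rng = np.random.default_rng(4)
X = rng.standard_normal((9, 12)); X = (X - X.mean(0)) / X.std(0)
Y = rng.standard_normal((9, 2));  Y = (Y - Y.mean(0)) / Y.std(0)

Xl, Yl = X.copy(), Y.copy()
T, W, C, P, b = [], [], [], [], []
for _ in range(np.linalg.matrix_rank(X)):        # L components
    w, _, ct = np.linalg.svd(Xl.T @ Yl)          # cf. Eqs. (65)-(66)
    w1, c1 = w[:, 0], ct[0]                      # first singular vectors
    t = Xl @ w1                                  # X-score, Eq. (67)
    t /= np.linalg.norm(t)                       # normalise so t't = 1
    p = Xl.T @ t                                 # X-loadings, Eq. (70)
    u = Yl @ c1                                  # Y-score, Eq. (72)
    b1 = t @ u                                   # regression weight, Eq. (76)
    Xl = Xl - np.outer(t, p)                     # deflate X, Eq. (77)
    Yl = Yl - b1 * np.outer(t, c1)               # deflate Y, Eq. (78)
    T.append(t); W.append(w1); C.append(c1); P.append(p); b.append(b1)

T, P, C = np.column_stack(T), np.column_stack(P), np.column_stack(C)
# After L = rank(X) deflations, X is fully decomposed: X = T P'.
```

Each deflation removes a rank-one piece of X, so after L passes the residual is numerically zero and the scores in T are mutually orthogonal, as stated in the text.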
When this is done, the weights (i.e., all the w_l's) for X are stored in the J by L matrix W (whose lth column is w_l). For our example W is:

W =
[ −0.43 −0.04 −0.01 −0.71 −0.04  0.25 −0.23 −0.18
   0.20 −0.36 −0.04  0.09 −0.28 −0.05 −0.67 −0.27
   0.10  0.45 −0.28 −0.03 −0.17  0.22  0.09 −0.05
  −0.03  0.12 −0.34 −0.25  0.37 −0.53  0.06 −0.53
   0.00  0.40  0.50 −0.30 −0.50 −0.32  0.00  0.02
  −0.41  0.14  0.09  0.01  0.32 −0.13 −0.09  0.45
   0.09 −0.04 −0.50 −0.07 −0.49 −0.39  0.19  0.25
   0.16 −0.09  0.47  0.05  0.10 −0.33 −0.09 −0.08
   0.07  0.35 −0.20  0.03  0.19 −0.24 −0.62  0.37
  −0.41  0.20  0.10  0.48 −0.06 −0.22  0.05 −0.36
   0.16 −0.41  0.09 −0.27  0.19 −0.31  0.23  0.24
  −0.59 −0.36 −0.12  0.15 −0.27 −0.15 −0.03  0.12 ]. (79)

The latent variables of X are stored in matrix T. Because T shows how the brain activity relates to each of the observations, T has I rows and L columns. For our example T is:

T =
[  0.41  0.10  0.29  0.06 −0.51 −0.29  0.39  0.35
   0.11  0.33  0.50 −0.11  0.71 −0.07  0.00  0.04
   0.33  0.24  0.00 −0.21 −0.31  0.09 −0.57 −0.50
   0.28 −0.03 −0.55 −0.35  0.19  0.33  0.48 −0.08
  −0.15  0.41 −0.52  0.57  0.09 −0.24 −0.14  0.16
   0.22 −0.65  0.13  0.54  0.13  0.25 −0.10 −0.09
  −0.45  0.09  0.14 −0.12 −0.22  0.61 −0.16  0.45
  −0.57 −0.03  0.17  0.05 −0.15 −0.16  0.40 −0.58
  −0.19 −0.47 −0.17 −0.43  0.07 −0.53 −0.29  0.24 ]. (80)

The weights for Y are stored in C. Because C weights the variables of Y, C has K rows and L columns. For our example C is:

C =
[ −0.71 −0.73 −0.37 −0.97  0.34  0.71 −0.81 −0.93
   0.70  0.69  0.93  0.24 −0.94 −0.71  0.59 −0.36 ]. (81)

The latent variables of Y are stored in matrix U. Because U shows how the behavioral variables relate to each of the observations, U has I rows and L columns.
For our example U is:

U =
[  2.16  0.76  0.60  0.28 −0.30 −0.15  0.07  0.03
   1.12  0.76  0.22 −0.18  0.22 −0.02  0.00  0.00
   1.41  0.28 −0.06 −0.19 −0.02  0.12 −0.10 −0.04
   0.12 −0.81 −0.75 −0.28  0.10  0.04  0.06 −0.01
   0.11  0.61 −0.27  0.38 −0.02 −0.06 −0.01  0.01
  −0.10 −0.86  0.19  0.22  0.12  0.09 −0.02 −0.01
  −1.43  0.10 −0.02 −0.18  0.11  0.19  0.00  0.03
  −1.67  0.25  0.31  0.11 −0.14 −0.08  0.03 −0.04
  −1.71 −1.08 −0.23 −0.16 −0.08 −0.12 −0.03  0.02 ]. (82)

The loadings for X are stored in matrix P. Because P is the projection of X onto the space of the latent variables T and describes the J voxels as explained by the latent variables, P has J rows and L columns. For our example P is:

P =
[ −2.17  0.06  0.51 −1.56 −0.53  0.31 −0.45 −0.18
   1.76 −1.26 −0.47  0.00 −0.67 −0.78 −1.40 −0.27
  −0.34  2.28 −1.24 −0.13 −0.87  0.56  0.22 −0.05
  −0.40  1.15 −0.86 −0.39  2.04 −1.04  0.30 −0.54
  −0.78  0.62  2.01 −1.28 −0.93 −0.68  0.00  0.02
  −2.40  0.29  0.61  0.23  1.12 −0.37 −0.34  0.46
   0.56  0.90 −2.24 −0.79 −0.80 −0.65  0.34  0.26
   0.98 −1.30  1.98  0.10  0.87 −0.79 −0.17 −0.08
  −0.29  1.66 −0.68  0.17  0.98 −1.15 −1.50  0.38
  −2.53  0.44 −0.05  1.00  0.24 −0.42  0.22 −0.37
   1.61 −1.69  0.79 −0.60  1.06 −0.44  0.43  0.25
  −2.39 −1.11 −0.79  0.05 −0.55 −0.35 −0.11  0.12 ]. (83)
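The iterative procedure of Sections 4.1.1–4.1.3 that produces W, T, C, U, P, and the slopes b_l can be sketched in a few lines of NumPy. This is an illustrative implementation of Eqs. (65)–(78), not the authors' MATLAB code; the function name `plsr_nipals` and its interface are our own.

```python
import numpy as np

def plsr_nipals(X, Y, L):
    """Illustrative PLSR: SVD of the cross-product plus deflation, Eqs. (65)-(78)."""
    # Eq. (65): transform X and Y into Z-scores before the first iteration
    Xl = (X - X.mean(axis=0)) / X.std(axis=0)
    Yl = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    W, T, C, U, P, b = [], [], [], [], [], []
    for _ in range(L):
        # Eq. (66): SVD of R_l = X_l^T Y_l gives the weight vectors w_l and c_l
        Wsvd, _, Ct = np.linalg.svd(Xl.T @ Yl, full_matrices=False)
        w, c = Wsvd[:, 0], Ct[0, :]
        t = Xl @ w                       # Eq. (67): latent variable of X
        t /= np.linalg.norm(t)           # normalized so that t^T t = 1
        u = Yl @ c                       # Eq. (72): latent variable of Y
        p = Xl.T @ t                     # Eq. (70): loadings of X on t
        b_l = t @ u                      # Eq. (76): slope of the regression on t
        Xl = Xl - np.outer(t, p)         # Eq. (77): deflate X
        Yl = Yl - b_l * np.outer(t, c)   # Eq. (78): deflate Y
        W.append(w); T.append(t); C.append(c); U.append(u); P.append(p); b.append(b_l)
    return (np.column_stack(W), np.column_stack(T), np.column_stack(C),
            np.column_stack(U), np.column_stack(P), np.array(b))
```

With these outputs, Eq. (64) becomes `B_pls = np.linalg.pinv(P.T) @ np.diag(b) @ C.T`, and the predicted Z-scores of Y are `Yhat = T @ np.diag(b) @ C.T`.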


The regression weights are stored in a diagonal matrix B. The regression weights are used to predict Y from X; therefore, there is one b_l for every pair of t_l and u_l, and so B is an L×L diagonal matrix. For our example, B is:

B = diag(3.39, 1.74, 0.95, 0.61, 0.34, 0.30, 0.14, 0.08). (84)

The predicted Y scores (in the form of Z-scores) are now given by:

Ŷ = TBC^T = XB_PLS, (85)

where B_PLS = P^{T+} BC^T (with P^{T+} the Moore–Penrose pseudo-inverse of P^T). B_PLS has J rows and K columns. For our example, B_PLS is:

B_PLS =
[  0.58 −0.43
   0.03  0.00
  −0.21  0.21
   0.11 −0.08
  −0.26  0.40
   0.17 −0.23
  −0.06  0.02
  −0.18  0.22
  −0.17  0.12
  −0.01 −0.02
   0.11 −0.11
   0.45 −0.49 ]. (86)

The latent variables T and U provide an estimate of Y given by Ŷ. In general, Ŷ is not equal to Y (Abdi, 2010). However, in our example, the behavioral variables were perfectly predicted in the fixed effect model.

4.2. What does PLSR optimize?

Because PLSR is a predictive method, its goal is slightly different from that of PLSC. Specifically, PLSR finds a series of L latent variables t_l such that the covariance between t_1 and Y is maximal and such that t_1 is uncorrelated with t_2, which in turn has maximal covariance with Y, and so on for all L latent variables (see, e.g., de Jong, 1993; Burnham et al., 1996; Tenenhaus, 1998; Tenenhaus and Tenenhaus, in press; Abdi, 2010; Esposito-Vinzi et al., 2010, for proofs and developments). Formally, we seek a set of L linear transformations of X that satisfies [compare with Eq.
(6)]:

t_l = Xw_l such that cov(t_l, Y) = max (87)

(where w_l is the vector of the coefficients of the lth linear transformation, and cov is the covariance computed between t_l and all columns of Y), under the constraints that

t_l^T t_l′ = 0 when l ≠ l′, (88)

and

t_l^T t_l = 1. (89)

4.3. How good is the prediction?

4.3.1. Fixed effect model
A common measure of the quality of prediction is the Residual Estimated Sum of Squares (RESS), which is given by (Abdi, 2010):

RESS = ‖Y − Ŷ‖², (90)

where ‖ ‖² is the square of the norm of a matrix (i.e., the sum of squares of all the elements of this matrix). The smaller the value of RESS, the better the quality of prediction (Abdi, 2010; Abdi and Williams, 2010d).

4.3.2. Random effect model
The performance of PLSR (with respect to inference to the population) is assessed through cross-validation techniques such as the "leave-one-out" procedure (Wold et al., 2001; also called the jackknife, see Abdi and Williams, 2010b). In this procedure, each observation is removed in turn from both X and Y, and a PLSR model is recomputed from the remaining observations. Then B_PLS is used to predict the left-out observation of Y from its X values. The predicted observations are stored in Ỹ. The quality of prediction is measured in a way similar to RESS and is called the Predicted Residual Estimated Sum of Squares (PRESS). Formally, PRESS is obtained as (Abdi, 2010):

PRESS = ‖Y − Ỹ‖². (91)

The smaller PRESS is, the better the prediction.

An obvious problem with PLSR is to find the optimum number of latent variables for best predicting Y. Because neuroimaging data typically have more variables than observations (a problem known as the N ≪ P problem), PLSR will over-fit the data (i.e., a very good fixed effect model will do very poorly with new observations).
If there is over-fitting, the quality of prediction of new observations will first increase and then decrease as more latent variables are used. The number of latent variables at which PRESS starts to increase gives an indication of the optimum number of latent variables to be retained (Abdi, 2010; Abdi and Williams, 2010d).

The next step in the random effect model is to determine the stability of the voxels (i.e., to decide which voxels reliably depict brain-behavior relationships). For this step, standard errors and confidence intervals of the corresponding PLSR parameters are derived directly from the data through resampling techniques such as the bootstrap (Abdi, 2010; Abdi et al., 2009a).

4.4. Plotting the latent variables

The latent variables of X (i.e., T) can be plotted against each other to show how they are related to the observations. Sometimes, the weights of Y (i.e., C) are also plotted along with T in order to see how the latent variables of X relate to the Y variables. In our example, the first latent variable of X separates the NC group from the clinical groups, while the second latent variable separates participants NC 3 and PD 3 from the other participants. The second latent variable also differentiates between the words recalled and the reaction times. This may explain why NC 3 and PD 3 are different from the other participants (Fig. 15a). This plot shows the difference between observations based on the part of brain activity that predicts the behavioral measures.

The weights of X (i.e., W) can be plotted against each other in order to show how the X variables combine to form the latent variables. In neuroimaging, this plot is not very informative because of the large number of voxels.
Instead, the loadings of X (i.e., P) can be projected on a glass brain to show which regions of the brain best predict the behavior or experimental design.
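The RESS and PRESS measures of Eqs. (90) and (91) can be sketched in NumPy as follows. This is an illustration, not the authors' code: the helper `pls_weights` condenses Eqs. (64)–(78) into a fit that returns B_PLS, and `press` runs the leave-one-out loop. For simplicity, the sketch assumes X_0 and Y_0 are already centered and normalized (a fuller implementation would re-standardize within each fold).

```python
import numpy as np

def pls_weights(X0, Y0, L):
    """Condensed PLSR fit (Eqs. (65)-(78)); returns B_PLS of Eq. (64)."""
    Xl, Yl = X0.copy(), Y0.copy()
    C, P, b = [], [], []
    for _ in range(L):
        Wsvd, _, Ct = np.linalg.svd(Xl.T @ Yl, full_matrices=False)
        w, c = Wsvd[:, 0], Ct[0, :]
        t = Xl @ w
        t /= np.linalg.norm(t)
        p = Xl.T @ t
        b_l = t @ (Yl @ c)
        Xl -= np.outer(t, p)
        Yl -= b_l * np.outer(t, c)
        C.append(c); P.append(p); b.append(b_l)
    P, C = np.column_stack(P), np.column_stack(C)
    return np.linalg.pinv(P.T) @ np.diag(b) @ C.T     # Eq. (64)

def ress(Y, Yhat):
    """Eq. (90): residual estimated sum of squares (fixed effect model)."""
    return np.sum((Y - Yhat) ** 2)

def press(X0, Y0, L):
    """Eq. (91): leave-one-out PRESS for a model with L latent variables."""
    I = X0.shape[0]
    Ytilde = np.empty_like(Y0)
    for i in range(I):
        keep = np.arange(I) != i
        B_pls = pls_weights(X0[keep], Y0[keep], L)     # refit without row i
        Ytilde[i] = X0[i] @ B_pls                      # predict the left-out row
    return np.sum((Y0 - Ytilde) ** 2)
```

Scanning `press(X0, Y0, L)` over L = 1, 2, … and retaining the value of L after which PRESS starts to increase implements the model-selection rule described in Section 4.3.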


Fig. 15. Mini-example—PLSR: (a) Latent variables of X (i.e., T)—1 (horizontal) and 2 (vertical); and (b) Latent variables of Y (i.e., U)—1 (horizontal) and 2 (vertical).

The latent variables of Y (i.e., U) can be plotted against each other along with the weights of Y (i.e., C) to see how the latent variables of Y relate to the original Y variables. In our example, the first latent variable of Y separates the NC from the clinical groups, while the second latent variable separates participants NC 3, PD 1 and PD 2 from the other participants. As seen with the plot of the latent variables of X (i.e., T), the second latent variable also separates the words recalled and reaction time (Fig. 15b).

4.5. Applications

Because it is an iterative procedure, computing latent variables with PLSR takes longer than with PLSC. Nevertheless, PLSR has been used successfully in domains such as chemometrics because it can analyze data with strongly correlated, noisy, and numerous independent variables and can also simultaneously model several dependent variables (Wold et al., 2001). The predictive aspect of PLSR makes it a very useful tool to model relationships between large data matrices of brain activity and behavior and to subsequently predict behavior from brain activity.

4.5.1. Shape modeling
One application of PLSR in neuroimaging is the model-based segmentation of sub-cortical brain structures by Rao et al. (2008). Here the goal is to use brain structures that are easy to segment in order to facilitate the segmentation of brain structures that are hard to segment. Statistical shape modeling, as this procedure is called, uses PLSR to analyze the shapes of structures in the brain and to describe variations between groups of participants, such as normal controls and clinical populations.

4.5.2.
Prediction
PLSR has also been used to predict behavior from neuroimaging data. For example, Giessing et al. (2007) used PLSR to predict behavioral effects of nicotine from neural activity. The predictors were the contrast fMRI images between scans under nicotine and scans under placebo for each participant. The dependent variable was the residual term from a linear regression model that predicted the difference in reaction time to validly and invalidly cued targets in a visual discrimination task under the influence of nicotine, from the same difference in reaction time under placebo. PLSR showed that brain activity predicted reaction times of participants during the placebo condition.

PLSR has also been used to predict the long-term functional outcome of traumatic diffuse axonal injury using Diffusion Tensor Tractography (Wang et al., 2008). The authors used diffusion tensor imaging measurements to predict the outcome of brain injury assessed after six months with the Glasgow Outcome Scale–Extended. PLSR proved to be a reliable quantitative method to predict long-term outcomes of diffuse axonal injury.

4.5.3. Classification
PLSR has also been applied as a Discriminant Analysis (PLS-DA) tool for classification experiments. To do this, PLS-DA uses a dummy code as Y [see, e.g., Eq. (34)] because the main idea is to classify observations into different groups. PLS-DA has been used to classify different types of dementia (Gottfries et al., 1995). However, the predictors in this case were not brain images but descriptor variables obtained at the time of examination (e.g., albumin ratio, confusional symptoms, gender, vascular disease). In another study, Lehmann et al. (2007) used PLS-DA to classify various stages of Alzheimer's disease using EEG.
The authors compared different types of classification algorithms and found that PLS-DA and machine learning techniques showed analogous performance.

5. Software

PLS methods necessitate sophisticated computations and therefore their application depends on the availability of software. For neuroimaging, a special toolbox written in MATLAB (by McIntosh, Chau, Lobaugh, and Chen) dedicated to PLSC and a tutorial are freely available from www.rotman-baycrest.on.ca:8080. These programs constitute the standard for PLSC for neuroimaging.

For PLSR there are several available choices. Interested readers can download a set of MATLAB programs (with tutorial) from the senior author's home page (www.utdallas.edu/~herve). In addition, a set of MATLAB scripts implementing the examples used in this paper will also be made available from this home page. Also, a public domain set of MATLAB programs is available from the home page of the N-Way project (www.models.kvl.dk/source/nwaytoolbox/) along with tutorials and examples. The Statistics Toolbox from MATLAB includes a function to perform PLSR. The public domain program R implements PLSR through the package pls (Mevik and Wehrens, 2007). For neuroimaging, SPM has recently (2002) integrated a PLS module (as part of the MM toolbox written by Ferath Kherif). The


general purpose statistical packages SAS, SPSS, and XLSTAT (which has, by far, the most extensive implementation of PLS methods) can also be used to perform PLSR. In chemistry and sensory evaluation, two main programs are used: the first, called SIMCA-P, was developed originally by Wold (who also pioneered PLSR); the second, called the UNSCRAMBLER, was first developed by Martens, another pioneer in the field. Finally, a commercial MATLAB toolbox has also been developed by EIGENRESEARCH.

6. Related methods

A complete review of the connections between PLS and other statistical methods is clearly out of the scope of an introductory paper (see, however, Burnham et al., 1996; Tenenhaus, 1998; Tenenhaus and Tenenhaus, in press; Esposito-Vinzi et al., 2010, for an overview), but some directions are worth mentioning. PLSC uses the SVD in order to analyze the information common to two or more tables, and this makes it closely related to several other SVD (or eigen-decomposition) techniques with similar goals. The closest technique is obviously inter-battery analysis (Tucker, 1958), which uses the same SVD as PLSC but on non-structured matrices. Canonical correlation analysis (also called simply canonical analysis, or canonical variate analysis; see Gittins, 1985; Mardia et al., 1979) is also a related technique, one that seeks latent variables with largest correlation instead of PLSC's criterion of largest covariance.
Under the assumption of normality, analytical statistical tests are available for canonical correlation analysis, but cross-validation procedures analogous to those of PLSC could also be used.

In addition, several multi-way techniques encompass data sets with two tables as a particular case. The oldest and best known technique is multiple factor analysis, which integrates different tables into a common PCA by normalizing each table with its first singular value (Abdi and Valentin, 2007a; Escofier and Pagès, 1990). A more recent set of techniques is the STATIS family, which uses a more sophisticated normalizing scheme whose goal is to extract the common part of the data (see Abdi and Valentin, 2007b, for an introduction). This approach has been used to identify patterns in fMRI data (e.g., Abdi et al., 2009a; Kherif et al., 2003; Shinkareva et al., 2006; Shinkareva et al., 2008). Closely related techniques (which have not been used so far for neuroimaging) comprise common component analysis (Mazerolles et al., 2006), which seeks a set of factors common to a set of data tables, and co-inertia analysis, which can be seen as a generalization of Tucker's (1958) inter-battery analysis (see, e.g., Chessel and Hanafi, 1996; Dray et al., 2003; Thioulouse et al., 2003, for recent developments). Also, mean-centered task PLSC is obviously related to PLS-DA because their goals are very similar. Mean-centered task PLSC is also closely related to barycentric discriminant analysis (Abdi and Williams, 2010a) because, as previously mentioned, these two techniques compute the SVD of the same matrix.

PLSR is strongly related to regression-like techniques which have been developed to cope with the multi-colinearity problem.
These include principal component regression, ridge regression, redundancy analysis (also known as PCA on instrumental variables; Rao, 1964; van den Wollenberg, 1977; Tyler, 1982), and continuum regression (Stone and Brooks, 1990), which provides a general framework for these techniques.

7. Conclusion

Partial Least Squares (PLS) methods analyze data from multiple modalities collected on the same observations. We have reviewed two particular PLS methods: Partial Least Squares Correlation, or PLSC, and Partial Least Squares Regression, or PLSR. Both PLSC and PLSR can be used to study brain activity, behavior, and experimental design. PLSC is relational and analyzes the shared information between two or more sets of variables. In contrast, PLSR is directional and predicts a set of dependent variables from a set of independent variables or predictors. PLSC is more commonly used in neuroimaging than PLSR.

The two methods also differ in the type of data matrices that are analyzed. Typically, the data matrices analyzed by PLSC have a complex and specific structure, whereas the data matrices analyzed by PLSR tend to be simple matrices. PLSC is also very versatile and has been adapted to several different situations, such as multi-block analysis. PLSR, being an iterative process, requires long computation times, which can be prohibitive for some applications with very large data sets.

The relationship between PLSC and PLSR is also explored in Burnham et al.
(1996), and, recently, Gidskehaug et al. (2004) proposed to integrate these two approaches into a new predictive approach called BRIDGE-PLS. In practice, the two techniques are likely to give similar conclusions because the criteria they optimize are quite similar.

Acknowledgments

We would like to thank two anonymous reviewers, Betty Edelman, Cheryl Grady, and Ashwati Krishnan for help and comments on previous versions of this paper.

Appendix A. List of notations

Acronyms
PCA: Principal components analysis
PLSC: Partial least squares correlation
PLSR: Partial least squares regression
PLS-DA: Partial least squares discriminant analysis
PLS-PM: Partial least squares path modeling
SVD: Singular value decomposition

General notations
‖ ‖: norm of a matrix
1: I×1 vector of ones
I: identity matrix
I: the number of observations, or rows of X and Y
I_n: the number of observations of X_n
J: the number of columns of X
J_t: the number of columns of X_t
K: the number of columns of Y
N: the number of a priori sub-matrices of X (or Y)
T: the number of scans in spatio-temporal neuroimaging methods, such as EEG, fMRI, and NIRS
T (superscript): transpose
X: the matrix of brain activity (or independent variables)
X_n: sub-matrix of X (Eq. (11))
X_t: an I×J_t matrix representing a single scan in spatio-temporal neuroimaging
Y: the matrix of behavioral or design variables (or dependent variables)
Y_n: sub-matrix of Y (Eq. (12))

Singular Value Decomposition (SVD)
Δ: diagonal matrix of the singular values (Eq. (1)) (singular values are akin to standard deviations)
δ_l: lth singular value
U: left singular vectors (or saliences) of Z (Eq. (1)); they represent the design or behavioral profiles that best characterize R
u_l: lth left singular vector
V: right singular vectors (or saliences) of Z.
They represent the images that best characterize R
v_l: lth right singular vector
Z = UΔV^T = Σ_l δ_l u_l v_l^T: singular value decomposition of Z (Eq. (1))


Partial Least Squares Correlation (PLSC)
L_X: brain scores/latent variables in PLSC for brain activity; L_X = XV (Eq. (4))
l_X,l: the lth column of L_X
l_X,l^T l_Y,l: scalar product of the latent variables l_X,l and l_Y,l (equal to the corresponding singular value δ_l); expresses the covariance between these latent variables (Eqs. (6) and (9))
L_Y: behavior or design scores/latent variables in PLSC for behavior or design variables; L_Y = YU (Eq. (5))
l_Y,l: the lth column of L_Y
M: matrix of means of each condition in mean-centered PLSC (Eq. (35))
R: matrix of correlations between the columns of X and Y; R = Y^T X = UΔV^T (Eqs. (2) and (3))
R_behavior: matrix of correlations between X and Y_behavior (Eq. (15))
R_contrast: matrix of correlations between X and Y_contrast (Eq. (26))
R_mean-centered: matrix of the deviations of the groups to their grand mean; R_mean-centered = M − (1/N) 1 1^T M (Eq. (37))
R_multi: matrix of correlations between X and multiple Y matrices in multi-table PLSC (Eq. (56))
R_seed: matrix of correlations between X_seed and Y_seed
X_seed: X matrix with the vectors of Y_seed removed, for seed PLSC (Eq. (44))
Y_behavior: Y matrix of demographic and/or behavioral data (Eq. (12))
Y_contrast: Y matrix of orthonormal contrasts reflecting the experimental hypotheses (Eq. (25))
Y_dummy coding: Y matrix of dummy coding that codes for experimental groups or conditions (Eq. (34))
Y_seed: seed matrix taken from X for seed PLSC (Eq. (43))

Partial Least Squares Regression (PLSR)
B: diagonal matrix of slopes of the predictions of Y from T (Eqs. (62) and (63))
B_PLS: matrix of coefficients to predict Y from X (Eq.
(64)); B_PLS = P^{T+} BC^T, equivalent to the regression weights of multiple regression
C: factor loadings of Y (Eqs. (62) and (63))
C_l: matrix of right singular vectors of R_l (Eq. (66))
c_l: lth weight vector of Y (Eq. (67))
Δ_l: diagonal matrix of singular values (Eq. (66))
P: factor loadings of X (Eq. (62))
P^+: Moore–Penrose pseudo-inverse of P (Eq. (64))
p_l: lth factor loading of X on t; p_l = X_{l−1}^T t_l (Eq. (70))
R_1: matrix of correlations of X_0 and Y_0, on which the SVD is performed; R_1 = X_0^T Y_0 = W_1 Δ_1 C_1^T (Eqs. (65) and (66))
T: matrix of latent variables that model X and simultaneously predict Y; T^T T = I (Eqs. (62) and (63))
t_l: lth latent variable of X_{l−1}; t_l = X_{l−1} w_l (Eq. (67))
u_l: lth latent variable of Y_{l−1}; u_l = Y_{l−1} c_l (Eq. (72))
W_l: matrix of left singular vectors of R_l (Eq. (66))
w_l: lth weight vector of X
X: matrix of brain activity; X = TP^T (Eq. (62))
X_0: matrix of mean-centered and normalized X values used in the first iteration (Eq. (65))
X_l: matrix of deflated X values for the lth iteration
X̂_l: matrix of predicted X values for the lth iteration
Y: matrix of behavior or design variables (Eq. (62))
Ŷ_l: matrix of predicted Y values for the lth iteration (Eq. (63))
Y_0: matrix of centered and normalized Y values used in the first iteration (Eq. (65))
Y_l: matrix of deflated Y values for the lth iteration (Eq. (65))

Quality of the PLSR model
RESS: residual estimated sum of squares; a measure of the quality of the prediction in a fixed effect model; the smaller the value of RESS, the better the quality of prediction (Eq. (90))
PRESS: predicted residual estimated sum of squares; a measure of the quality of the prediction in a random effect model; the smaller the value of PRESS, the better the quality of prediction (Eq. (91))
Ỹ: matrix of predicted observations using the jackknife or "leave-one-out" procedure

References

Abdi, H., 2007.
Singular value decomposition (SVD) and generalized singular value decomposition (GSVD). In: Salkind, N. (Ed.), Encyclopedia of Measurement and Statistics. Sage, Thousand Oaks (CA), pp. 907–912.
Abdi, H., 2010. Partial least square regression, projection on latent structure regression, PLS-Regression. Wiley Interdiscip. Rev. Comput. Stat. 2, 97–106.
Abdi, H., Valentin, D., 2007a. Multiple factor analysis (MFA). In: Salkind, N. (Ed.), Encyclopedia of Measurement and Statistics. Sage, Thousand Oaks (CA), pp. 657–663.
Abdi, H., Valentin, D., 2007b. STATIS. In: Salkind, N. (Ed.), Encyclopedia of Measurement and Statistics. Sage, Thousand Oaks (CA), pp. 955–962.
Abdi, H., Williams, L.J., 2010a. Barycentric discriminant analysis. In: Salkind, N., Dougherty, D., Frey, B. (Eds.), Encyclopedia of Research Design. Sage, Thousand Oaks (CA), pp. 64–75.
Abdi, H., Williams, L.J., 2010b. The jackknife. In: Salkind, N., Dougherty, D., Frey, B. (Eds.), Encyclopedia of Research Design. Sage, Thousand Oaks (CA), pp. 655–660.
Abdi, H., Williams, L.J., 2010c. Matrix algebra. In: Salkind, N., Dougherty, D., Frey, B. (Eds.), Encyclopedia of Research Design. Sage, Thousand Oaks (CA), pp. 761–776.
Abdi, H., Williams, L.J., 2010d. Principal components analysis. Wiley Interdiscip. Rev. Comput. Stat. 2, 433–450.
Abdi, H., Valentin, D., O'Toole, A.J., Edelman, B., 2005. DISTATIS: the analysis of multiple distance matrices. Proceedings of the IEEE Computer Society: International Conference on Computer Vision and Pattern Recognition, pp. 43–47.
Abdi, H., Dunlop, J.P., Williams, L.J., 2009a. How to compute reliability estimates and display confidence and tolerance intervals for pattern classifiers using the Bootstrap and 3-way multidimensional scaling (DISTATIS). Neuroimage 45, 89–95.
Abdi, H., Edelman, B., Valentin, D., Dowling, W.J., 2009b. Experimental design and analysis for psychology.
Oxford University Press, Oxford.
Addis, D.R., McIntosh, A.R., Moscovitch, M., Crawley, A.P., McAndrews, M.P., 2004. Characterizing spatial and temporal features of autobiographical memory retrieval networks: a partial least squares approach. Neuroimage 23, 1460–1471.
Bergström, Z.M., Velmans, M., de Fockert, J., Richardson-Klavehn, A., 2007. ERP evidence for successful voluntary avoidance of conscious recollection. Brain Res. 1151, 119–133.
Bookstein, F., 1982. The geometric meaning of soft modeling with some generalizations. In: Jöreskog, K., Wold, H. (Eds.), System under indirect observation, Volume 2. North-Holland, Amsterdam.
Bookstein, F.L., 1994. Partial least squares: a dose–response model for measurement in the behavioral and brain sciences. Psycoloquy 5.
Bookstein, F.L., Streissguth, A.P., Sampson, P.D., Conner, P.D., Barr, H.M., 2002. Corpus callosum shape and neuropsychological deficits in adult males with heavy fetal alcohol exposure. Neuroimage 15, 233–251.
Burnham, A., Viveros, R., MacGregor, J., 1996. Frameworks for latent variable multivariate regression. J. Chemometr. 10, 31–45.
Caplan, J.B., McIntosh, A.R., De Rosa, E., 2007. Two distinct functional networks for successful resolution of proactive interference. Cereb. Cortex 17, 1650–1663.
Chessel, D., Hanafi, M., 1996. Analyse de la co-inertie de K nuages de points. Rev. Stat. Appl. 44, 35–60.
de Jong, S., 1993. SIMPLS: an alternative approach to partial least squares regression. Chemom. Intell. Lab. Syst. 18, 251–263.
de Jong, S., Phatak, A., 1997. Partial least squares regression. Proceedings of the Second International Workshop on Recent Advances in Total Least Squares Techniques and Errors-in-Variables Modeling. Society for Industrial and Applied Mathematics, pp. 25–36.
de Leeuw, J., 2007. Derivatives of generalized eigen-systems with applications.
Department of Statistics Papers, pp. 1–28.
Della-Maggiore, V., Sekuler, A.B., Grady, C.L., Bennett, P.J., Sekuler, R., McIntosh, A.R., 2000. Corticolimbic interactions associated with performance on a short-term memory task are modified by age. J. Neurosci. 20, 8410–8416.
Dray, S., Chessel, D., Thioulouse, J., 2003. Co-inertia analysis and the linking of ecological data tables. Ecology 84, 3078–3089.
Efron, B., Tibshirani, R., 1986. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat. Sci. 1, 54–77.
Escofier, B., Pagès, J., 1990. Multiple factor analysis. Comput. Stat. Data Anal. 18, 120–140.
Esposito-Vinzi, V., Wynne, W., Chin, W.W., Henseler, J., Wang, H. (Eds.), 2010. Handbook of Partial Least Squares: Concepts, Methods and Applications. Springer Verlag, New York.
Folstein, M.F., Folstein, S.E., McHugh, P.R., 1975. "Mini-Mental State". A practical method for grading the cognitive state of patients for the clinician. J. Psychiatr. Res. 12, 189–198.
Fujiwara, E., Schwartz, M.L., Gao, F., Black, S.E., Levine, B., 2008. Ventral frontal cortex functions and quantified MRI in traumatic brain injury. Neuropsychologia 46, 461–474.
Gidskehaug, L., Stødkilde-Jørgensen, H., Martens, M., Martens, H., 2004. Bridge-PLS regression: two-block bilinear regression without deflation. J. Chemometr. 18, 208–215.
Giessing, C., Fink, G., Rosler, F., Thiel, C., 2007. fMRI data predict individual differences of behavioral effects of nicotine: a partial least squares analysis. J. Cogn. Neurosci. 19, 658–670.
Gittins, R., 1985. Canonical Analysis. Springer-Verlag, New York.
Gottfries, J., Blennow, K., Wallin, A., Gottfries, C.G., 1995. Diagnosis of dementias using partial least squares discriminant analysis.
Dementia 6, 83–88.Grady, C.L., McIntosh, A.R., Beig, S., Keightley, M.L., Burian, H., Black, S.E., 2003. Evidencefrom functional <strong>neuroimaging</strong> of a compensatory prefrontal network in Alzheimer'sdisease. J. Neurosci. 23, 986–993.Greenacre, M., 1984. Theory and applications of correspondence analysis. AcademicPress, London.Iidaka, T., Anderson, N.D., Kapur, S., Cabeza, R., Craik, F.I.M., 2000. The effect of dividedattention on encoding and retrieval in episodic memory revealed by positronemission tomography. J. Cogn. Neurosci. 12, 267–280.Please cite this article as: Krishnan, A., et al., <strong>Partial</strong> <strong>Least</strong> <strong>Squares</strong> (<strong>PLS</strong>) <strong>methods</strong> <strong>for</strong> <strong>neuroimaging</strong>: A <strong>tutorial</strong> and review, NeuroImage(2010), doi:10.1016/j.neuroimage.2010.07.034


A. Krishnan et al. / NeuroImage xxx (2010) xxx–xxx21Keightley, M.L., Seminowicz, D.A., Bagby, R.M., Costa, P.T., Fossati, P., Mayberg, H.S., 2003a.Personality influences limbic–cortical interactions during sad mood induction.Neuroimage 20, 2031–2039.Keightley, M.L., Winocur, G., Graham, S.J., Mayberg, H.S., Hevenor, S.J., Grady, C.L.,2003b. An fMRI study investigating cognitive modulation of brain regionsassociated with emotional processing of visual stimuli. Neuropsychologia 41,585–596.Kherif, F., Poline, J.-B., Mériaux, S., Benali, H., Flandin, G., Brett, M., 2003. Group analysisin functional <strong>neuroimaging</strong>: selecting subjects using similarity measures. Neuroimage20, 2197–2208.Köhler, S., Moscovitch, M., Wincour, G., Houle, S., McIntosh, A.R., 1998. Networks ofdomain-specific and general regions involved in episodic memory <strong>for</strong> spatiallocation and object identity. Neuropsychologia 36, 129–142.Lehmann,C.,Koenig,T.,Jelic,V.,Prichep,L.,John,R.E.,Wahlund,L.-O.,Dodge,Y.,Dierks, T., 2007. Application and comparison of classification algorithms <strong>for</strong>recognition of Alzheimer's disease in electrical brain activity. J. Neurosci. Meth.161, 342–350.Mardia, K.V., Kent, J.T., Bibby, J.M., 1979. Multivariate analysis. Academic Press, London.Martens, H., Martens, M., 2001. Multivariate analysis of quality: an introduction. Wiley,London.Martens, H., Naes, T., 1989. Multivariate calibration. Wiley, London.Martinez-Montes,E.,Valdés-Sosa,P.A.,Miwakeichi,F.,Goldman,R.I.,Cohen,M.S.,2004. Concurrent EEG/fMRI analysis by partial least squares. Neuroimage 22,1023–1034.Mazerolles, G., Hanafi, M., Dufour, E., Bertrand, D., Qannari, E., 2006. Commoncomponents and specific weights analysis: a chemometric method <strong>for</strong> dealingwith complexity of food products. Chemometr. Intell. Lab. Syst. 81, 41–49.McIntosh, A., Gonzalez-Lima, F., 1991. 
Structural modeling of functional neuralpathways mapped with 2-deoxyglucose: effects of acoustic startle habituation onthe auditory system. Brain Res. 547, 295–302.McIntosh, A.R., Lobaugh, N.J., 2004. <strong>Partial</strong> least squares analysis of <strong>neuroimaging</strong> data:applications and advances. Neuroimage 23, S250–S263.McIntosh, A.R., Bookstein, F., Haxby, J., Grady, C., 1996. Spatial pattern analysis offunctional brain images using partial least squares. Neuroimage 3, 143–157.McIntosh, A., Nyberg, L., Bookstein, F.L., Tulving, E., 1997. Differential functionalconnectivity of prefrontal and medial temporal cortices during episodic memoryretrieval. Hum. Brain Mapp. 5, 323–327.McIntosh, A.R., Lobaugh, N.J., Cabeza, R., Bookstein, F.L., Houle, S., 1998. Convergence ofneural systems processing stimulus associations and coordinating motor responses.Cereb. Cortex 8, 648–659.McIntosh, A.R., Chau, W., Protzner, A., 2004. Spatiotemporal analysis of event-relatedfMRI data using partial least squares. Neuroimage 23, 764–775.Mentis, M.J., Dhawan, V., Feigin, A., Delalot, D., Zgaljardic, D., Edwards, C., Eidelberg, D.,2003. Early stage Parkinson's disease patients and normal volunteers: comparativemechanisms of sequence learning. Hum. Brain Mapp. 20, 246–258.Menzies, L., Achard, S., Chamberlain, S.R., Fineberg, N., Chen, C.-H., delCampo, N.,Sahakian, B.J., Robbins, T.W., Bullmore, E., 2007. Neurocognitive endophenotypes ofobsessive–compulsive disorder. Brain 130, 3223–3236.Mevik, B.-H., Wehrens, R., 2007. The <strong>PLS</strong> package: principal component and partial leastsquares regression in R. J. Stat. Softw. 18, 1–24.Nestor, P.G., O'Donnell, B.F., Mccarley, R.W., Niznikeiwicz, M., Barnard, J., Shen, Z.J.,Bookstein, F.L., Shenton, M.E., 2002. A new statistical method <strong>for</strong> testing hypothesesof neuropsychological/MRI relationships in schizophrenia: partial least squaresanalysis. Schizophr. Res. 
53, 57–66.Nyberg, L., McIntosh, A.R., Cabeza, R., Habib, R., Houle, S., Tulving, E., 1996. General andspecific brain regions involved in encoding and retrieval of events: what, whereand when. Proc. Natl. Acad. Sci. USA 93, 11280–11285.Nyberg, L., Persson, J., Habib, R., Tulving, E., McIntosh, A.R., Cabeza, R., Houle, S., 2000.Large scale neurocognitive networks underlying episodic memory. J. Cogn.Neurosci. 12, 163–173.Nyberg, L., Forkstam, C., Petersson, K.M., Cabeza, R., Ingvar, M., 2002. Brain imaging ofhuman memory systems: between-systems similarities and within-systemdifferences. Brain Res. Interact. 13, 281–292.Price, J., Ziolko, S., Weissfeld, L., Klunk, W., Lu, X., Hoge, J., Meltzer, C., Davis, S., Lopresti,B., Holt, D., DeKosky, S., Mathis, C., 2004. Quantitative and statistical analyses of PETimaging studies of amyloid deposition in humans. Nuclear Science SymposiumConference Record: 2004 IEEE, 5, pp. 3161–3164.Protzner, A.B., McIntosh, A.R., 2007. The interplay of stimulus modality and responselatency in neural network organization <strong>for</strong> simple working memory tasks. J. Neurosci.27, 3187–3197.Rajah, M.N., McIntosh, A.R., 2005. Overlap in the functional neural systems involved insemantic and episodic memory retrieval. J. Cogn. Neurosci. 17, 470–482.Rao, C., 1964. The use and interpretation of principal component analysis in appliedresearch. Sankhya 26, 329–359.Rao, A., Aljabar, P., Rueckert, D., 2008. Hierarchical statistical shape analysis andprediction of sub-cortical brain structures. Med. Image Anal. 12, 55–68.Shinkareva, S., Ombao, H., Sutton, B., Mohanty, A., Miller, G., 2006. Classification of functionalbrain images with a spatio-temporal dissimilarity map. Neuroimage 33, 63–71.Shinkareva, S.V., Mason, R.A., Malave, V.L., Wang, W., Mitchell, T.M., Just, M.A., 2008.Using fMRI brain activation to identify cognitive states associated with perceptionof tools and dwellings. PLoS ONE 3.Sidtis, J.J., Strother, S.C., Rottenberg, D.A., 2003. 
Predicting per<strong>for</strong>mance from functionalimaging data: <strong>methods</strong> matter. Neuroimage 20, 615–624.Stone, M., Brooks, R.J., 1990. Continuum regression: cross-validated sequentiallyconstructed prediction embracing ordinary least squares, partial least squares andprincipal components regression. J. R. Stat. Soc. B Methodol. 52, 237–269.Streissguth, A., Bookstein, F., Sampson, P., Barr, H., 1993. Methods of latent variablemodeling by partial least squares. The Enduring Effects of Prenatal AlcoholExposure on Child Development. University of Michigan Press.Takane, Y., 2002. Relationships among various kinds of eigenvalue and singular valuedecompositions. In: Yanai, H., Okada, A., Shigemasu, K., Kano, Y., Meulman, J. (Eds.),New developments in psychometrics. Springer Verlag, Tokyo, pp. 45–46.Talairach, J., Tournoux, P., 1988. Co-planar Stereotaxic Atlas of the Human Brain. Thieme.Tenenhaus, M., 1998. La regression <strong>PLS</strong>. Technip, Paris.Tenenhaus, M., Tenenhaus, A., in press, Regularized generalized canonical correlationanalysis. Psychometrika.Thioulouse, J., Simier, M., Chessel, D., 2003. Simultaneous analysis of a sequence ofpaired ecological tables. Ecology 20, 2197–2208.Tippett, W., Black, S., 2008. Regional cerebral blood flow correlates of visuospatial tasksin Alzheimer's disease. J. Int. Neuropsychol. Soc. 14, 1034–1045.Tucker, L., 1958. An inter-battery method of factor analysis. Psychometrika 23, 111–136.Tyler, D.E., 1982. On the optimality of the simultaneous redundancy trans<strong>for</strong>mations.Psychometrika 47, 77–86.Vallesi, A., McIntosh, A.R., Alexander, M.P., Stuss, D.T., 2009. fMRI evidence of afunctional network setting the criteria <strong>for</strong> withholding a response. Neuroimage 45,537–548.van den Wollenberg, A., 1977. 
Redundancy analysis: an alternative to canonical correlation.Psychometrika 42, 207–219.Wang, J.Y., Bakhadirov, K., Devous Sr., M.D., Abdi, H., McColl, R., Moore, C., de la Plata, C.D.M.,Ding,K.,Whittemore,A.,Babcock,E.,Rickbeil,T.,Dobervich,J.,Kroll,D.,Dao,B.,Mohindra, N., Madden, C.J., Diaz-Arrastia, R., 2008. Diffusion tensor tractography oftraumatic diffuse axonal injury. Arch. Neurol. 65, 619–626.West, R., Krompinger, J., 2005. Neural correlates of prospective and retrospectivememory. Neuropsychologia 43, 418–433.West, R., Wymbs, N., 2004. Is detecting prospective cues the same as selecting targets?An ERP study. Cogn. Affect. Behav. Neurosci. 4, 354–363.Williams, L.J., Abdi, H., French, R., Orange, J.B., 2010. A <strong>tutorial</strong> on Multi-BlockDiscriminant Correspondence Analysis (MUDICA): A new method <strong>for</strong> analyzingdiscourse data from clinical populations. Journal of Speech Language and HearingResearch 53, 1372–1393.Wold, H., 1982. Soft modelling, the basic design and some extensions. In: Wold,H., Jöreskog, K.-G. (Eds.), Systems Under Indirect Observation: Causality-Structure-Prediction. Part II. North-Holland Publishing Company, Amsterdam,pp. 1–54.Wold, S., Sjöström, M., Eriksson, L., 2001. <strong>PLS</strong>-regression: a basic tool of chemometrics.Chemom. Intell. Lab. Syst. 58, 109–130.Please cite this article as: Krishnan, A., et al., <strong>Partial</strong> <strong>Least</strong> <strong>Squares</strong> (<strong>PLS</strong>) <strong>methods</strong> <strong>for</strong> <strong>neuroimaging</strong>: A <strong>tutorial</strong> and review, NeuroImage(2010), doi:10.1016/j.neuroimage.2010.07.034
