
Package 'robCompositions'


Package ‘robCompositions’
November 26, 2013

Type Package
Title Robust Estimation for Compositional Data.
Version 1.6.4
Date 2013-11-26
Depends R (>= 2.10), utils, robustbase, rrcov, car (>= 2.0-0), MASS, pls
Author Matthias Templ, Karel Hron, Peter Filzmoser
Maintainer Matthias Templ
Description The package includes methods for imputation of compositional data including robust methods, methods to impute rounded zeros, (robust) outlier detection for compositional data, (robust) principal component analysis for compositional data, (robust) factor analysis for compositional data, (robust) discriminant analysis for compositional data (Fisher rule), robust regression with compositional predictors and (robust) Anderson-Darling normality tests for compositional data as well as popular log-ratio transformations (alr, clr, ilr, and their inverse transformations). In addition, visualisation and diagnostic tools are implemented as well as high and low-level plot functions for the ternary diagram.
License GPL-2
LazyLoad yes
NeedsCompilation yes
Repository CRAN
Date/Publication 2013-11-26 12:55:44


R topics documented:

robCompositions-package
addLR
addLRinv
aDist
adjust
adtest
adtestWrapper
arcticLake
cenLR
cenLRinv
coffee
constSum
daFisher
expenditures
expendituresEU
gm
haplogroups
impAll
impCoda
impKNNa
impRZalr
impRZilr
isomLR
isomLRinv
lmCoDaX
machineOperators
missPatterns
orthbasis
outCoDa
pcaCoDa
pfa
phd
plot.imp
plot.outCoDa
plot.pcaCoDa
print.adtestWrapper
print.daFisher
print.imp
print.outCoDa
print.pcaCoDa
robVariation
skyeLavas
summary.adtestWrapper
summary.imp
ternaryDiag
ternaryDiagAbline
ternaryDiagEllipse
ternaryDiagPoints
Index

robCompositions-package    Robust Estimation for Compositional Data.

Description

The package contains methods for imputation of compositional data including robust methods, (robust) outlier detection for compositional data, (robust) principal component analysis for compositional data, (robust) factor analysis for compositional data, (robust) discriminant analysis (Fisher rule) and (robust) Anderson-Darling normality tests for compositional data as well as popular log-ratio transformations (alr, clr, ilr, and their inverse transformations).

Details

Package: robCompositions
Type: Package
Version: 1.3.3
Date: 2009-11-28
License: GPL 2
LazyLoad: yes

Author(s)

Matthias Templ, Peter Filzmoser, Karel Hron
Maintainer: Matthias Templ

References

Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall Ltd., London (UK). 416p.

Filzmoser, P., and Hron, K. (2008) Outlier detection for compositional data using robust methods. Math. Geosciences, 40 233-248.

Filzmoser, P., Hron, K., Reimann, C. (2009) Principal Component Analysis for Compositional Data with Outliers. Environmetrics, 20 (6), 621–632.

P. Filzmoser, K. Hron, C. Reimann, R. Garrett (2009): Robust Factor Analysis for Compositional Data. Computers and Geosciences, 35 (9), 1854–1861.


Hron, K. and Templ, M. and Filzmoser, P. (2010) Imputation of missing values for compositional data using classical and robust methods. Computational Statistics and Data Analysis, 54 (12), 3095–3107.

C. Reimann, P. Filzmoser, R.G. Garrett, and R. Dutter (2008): Statistical Data Analysis Explained. Applied Environmental Statistics with R. John Wiley and Sons, Chichester, 2008.

Examples

## k nearest neighbor imputation
data(expenditures)
expenditures[1,3]
expenditures[1,3]


addLR 5y


Author(s)

Matthias Templ

References

Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall Ltd., London (UK). 416p.

See Also

addLRinv, isomLR, alr

Examples

data(arcticLake)
x


ivar            index of the ratioing part. If the object is of class “alr” the column names are chosen from therein. If not and ivar is not provided by the user, it is assumed that the ratioing part was the last column of the data in the simplex.
useClassInfo    if FALSE, the class information of object x is not used.

Details

The function can also preserve absolute values when class information is provided. Otherwise only the relative information is preserved.

Value

the transformed data matrix

Author(s)

Matthias Templ

References

Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall Ltd., London (UK). 416p.

See Also

isomLRinv, cenLRinv, cenLR, addLR, ilrInv

Examples

data(arcticLake)
x


aDist    Aitchison distance

Description

Computes the Aitchison distance between two observations or between two data sets.

Usage

aDist(x, y)

Arguments

x    a vector, matrix or data.frame
y    a vector, matrix or data.frame with equal dimension as x

Details

This distance measure accounts for the relative scale property of the Aitchison distance. It measures the distance between two compositions if x and y are vectors, and evaluates the sum of the distances between x and y for each row of x and y if x and y are matrices or data frames.

It is not designed to be applied to a single matrix, such as function ‘acomp()’ in package ‘compositions’, but to compare different matrices.

The underlying code is written in C and allows fast computation also for large data sets.

Value

The Aitchison distance between two compositions or between two data sets.

Author(s)

Matthias Templ

References

Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman and Hall Ltd., London (UK). 416p.

Aitchison, J. and Barcelo-Vidal, C. and Martin-Fernandez, J.A. and Pawlowsky-Glahn, V. (2000) Logratio analysis and compositional distance. Mathematical Geology, 32, 271-275.

Hron, K. and Templ, M. and Filzmoser, P. (2010) Imputation of missing values for compositional data using classical and robust methods. Computational Statistics and Data Analysis, vol 54 (12), pages 3095-3107.

See Also

isomLR
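The distance described for aDist can be sketched in base R for the two-vector case. This is only an illustration under stated assumptions, not the package's C implementation: the helper names aitchison_logratio and aitchison_clr are hypothetical, and both inputs are assumed to be vectors of strictly positive parts of equal length. The sketch shows the pairwise log-ratio definition next to the equivalent form based on clr coordinates.

```r
# Sketch of the Aitchison distance between two compositions.
# Helper names are illustrative assumptions, not functions of robCompositions.

# Definition via all pairwise log-ratios log(x_i / x_j)
aitchison_logratio <- function(x, y) {
  D <- length(x)
  lx <- outer(log(x), log(x), "-")  # matrix of log(x_i) - log(x_j)
  ly <- outer(log(y), log(y), "-")
  sqrt(sum((lx - ly)^2) / (2 * D))
}

# Equivalent form: Euclidean distance between clr coordinates
aitchison_clr <- function(x, y) {
  cx <- log(x) - mean(log(x))
  cy <- log(y) - mean(log(y))
  sqrt(sum((cx - cy)^2))
}

x <- c(0.2, 0.3, 0.5)
y <- c(0.1, 0.4, 0.5)
d1 <- aitchison_logratio(x, y)
d2 <- aitchison_clr(x, y)

# Rescaling either composition leaves the distance unchanged,
# because only ratios between parts enter the computation.
d_scaled <- aitchison_clr(10 * x, 3 * y)
```

Both helpers should agree up to floating-point error, and the rescaled call illustrates why only relative information matters for compositions.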


adjust 9Examplesdata(expenditures)x


See Also

impCoda

Examples

data(expenditures)
x


Value

statistic    The result of the corresponding test statistic
method       The chosen method (univariate, angle or radius)
p.value      p-value

Note

These functions are used by adtestWrapper.

Author(s)

Karel Hron, Matthias Templ

References

Anderson, T.W. and Darling, D.A. (1952) Asymptotic theory of certain goodness-of-fit criteria based on stochastic processes. Annals of Mathematical Statistics, 23 193-212.

See Also

adtestWrapper

Examples

adtest(rnorm(100))
data(machineOperators)
x


Arguments

x            compositional data of class data.frame or matrix
alpha        significance level
R            Number of Monte Carlo simulations in order to provide p-values.
robustEst    logical

Details

First, the data is transformed using the ‘ilr’-transformation. After applying this transformation

- all (D-1)-dimensional marginal, univariate distributions are tested using the univariate Anderson-Darling test for normality.
- all 0.5 (D-1)(D-2)-dimensional bivariate angle distributions are tested using the Anderson-Darling angle test for normality.
- the (D-1)-dimensional radius distribution is tested using the Anderson-Darling radius test for normality.

Value

res      a list including each test result
check    information about the rejection of the null hypothesis
alpha    the underlying significance level
info     further information which is used by the print and summary method.
est      “standard” for standard estimation and “robust” for robust estimation

Author(s)

Matthias Templ and Karel Hron

References

Anderson, T.W. and Darling, D.A. (1952) Asymptotic theory of certain goodness-of-fit criteria based on stochastic processes. Annals of Mathematical Statistics, 23 193-212.

Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall Ltd., London (UK). 416p.

See Also

adtest, isomLR

Examples

data(machineOperators)
a


arcticLake    Arctic lake sediment data

Description

Sand, silt, clay compositions of 39 sediment samples at different water depths in an Arctic lake. This data set can be found on page 359 of the Aitchison book (see reference).

Usage

data(arcticLake)

Format

A data frame with 39 observations on the following 3 variables.

sand    numeric vector of percentages of sand
silt    numeric vector of percentages of silt
clay    numeric vector of percentages of clay

Details

The rows sum up to 100, except for rounding errors. The full data set including the water depth can be found in package compositions, for example.

Source

Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall Ltd., London (UK). 416p.

Examples

data(arcticLake)

cenLR    Centred log-ratio transformation

Description

The cenLR transformation moves D-part compositional data from the simplex into a D-dimensional real space.

Usage

cenLR(x)


Arguments

x    multivariate data ideally of class data.frame or matrix

Details

Each composition is divided by the geometric mean of its parts before the logarithm is taken.

Value

The transformed data, including

x.clr    clr transformed data
gm       the geometric means of the original composition.

Note

The resulting transformed data set is singular by definition.

Author(s)

Matthias Templ

References

Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall Ltd., London (UK). 416p.

See Also

cenLRinv, addLR, isomLR, addLRinv, isomLRinv

Examples

data(expenditures)
eclr
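The transformation described in the cenLR Details can be sketched in a few lines of base R. This is a hedged illustration rather than the package source: the function name clr_sketch and the toy matrix comp are assumptions, and the returned list only mimics the two documented elements x.clr and gm. It also makes the singularity mentioned in the Note visible, since each transformed row sums to zero.

```r
# Minimal clr sketch: divide each row by its geometric mean, then take logs.
# clr_sketch is a hypothetical helper, not the package's cenLR().
clr_sketch <- function(x) {
  x <- as.matrix(x)
  g <- exp(rowMeans(log(x)))        # row-wise geometric means
  # x / g recycles g down the columns, so row i is divided by g[i]
  list(x.clr = log(x / g), gm = g)
}

comp <- rbind(c(10, 30, 60),
              c(25, 25, 50))
res <- clr_sketch(comp)

# Each clr-transformed row sums to zero, which is why the transformed
# data set is singular (its covariance matrix has rank at most D - 1).
row_totals <- rowSums(res$x.clr)
```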


cenLRinv    Inverse centred log-ratio transformation

Description

Applies the inverse centred log-ratio transformation.

Usage

cenLRinv(x, useClassInfo = TRUE)

Arguments

x               an object of class “clr”, “data.frame” or “matrix”
useClassInfo    if the object is of class “clr”, the useClassInfo is used to determine if the class information should be used. If yes, also absolute values may be preserved.

Value

the transformed data set.

Author(s)

Matthias Templ

References

Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall Ltd., London (UK). 416p.

See Also

cenLR, addLR, isomLR, addLRinv, isomLRinv

Examples

data(expenditures)
eclr
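A base R sketch of the inverse transformation may help to see what "absolute values may be preserved" means. The function clr_inv_sketch and its gm argument are hypothetical and do not reproduce the package's class handling: without geometric means, the result can only be recovered up to closure (rows summing to 1); with the stored geometric means, the original absolute values come back.

```r
# Inverse clr sketch. clr_inv_sketch and its 'gm' argument are assumptions
# made for illustration, not the signature of cenLRinv().
clr_inv_sketch <- function(z, gm = NULL) {
  e <- exp(as.matrix(z))
  if (is.null(gm)) {
    e / rowSums(e)   # only relative information: close rows to sum 1
  } else {
    e * gm           # row i is multiplied by gm[i]: absolute values restored
  }
}

comp <- rbind(c(10, 30, 60),
              c(20, 20, 40))
g <- exp(rowMeans(log(comp)))   # geometric means, as kept by a clr object
z <- log(comp / g)              # clr coordinates

closed   <- clr_inv_sketch(z)          # compositions closed to 1
absolute <- clr_inv_sketch(z, gm = g)  # original values
```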


coffee    Coffee data

Description

27 commercially available coffee samples of different origins.

Usage

data(coffee)

Format

A data frame with 27 observations on the following 4 variables.

Metpyr    Hydroxy-2-propanone
5-Met     methylpyrazine
furfu     methylfurfural
sort      a character vector

Details

In the original data set, 15 volatile compounds (descriptors of coffee aroma) were selected for a statistical analysis. We selected only three compounds (compositional parts) Hydroxy-2-propanone, methylpyrazine and methylfurfural to allow for a visualization in a ternary diagram.

References

M. Korhonová, K. Hron, D. Klimcíková, L. Muller, P. Bednár, and P. Barták (2009) Coffee aroma - statistical analysis of compositional data. Talanta, 80(2): 710–715.

Examples

data(coffee)

constSum    Constant sum

Description

Closes compositions to sum up to a given constant (default 1), by dividing each part of a composition by its row sum.

Usage

constSum(x, const=1, na.rm=TRUE)


Arguments

x        multivariate data ideally of class data.frame or matrix
const    constant, the default equals 1.
na.rm    removing missing values.

Value

The data for which the row sums are equal to const.

Author(s)

Matthias Templ

See Also

clo

Examples

data(expenditures)
constSum(expenditures)
constSum(expenditures, 100)

daFisher    Discriminant analysis by Fisher Rule.

Description

Discriminant analysis by Fisher's rule.

Usage

daFisher(x, grp, coda = TRUE, method = "classical", plotScore=FALSE)

Arguments

x            a matrix or data frame containing the explanatory variables (training set)
grp          grouping variable: a factor specifying the class for each observation.
coda         TRUE, when the underlying data are compositions.
method       “classical” or “robust”
plotScore    TRUE, if the scores should be plotted automatically.


Details

The Fisher rule leads only to linear boundaries. However, this method allows for dimension reduction and thus for a better visualization of the separation boundaries. For the Fisher discriminant rule (Fisher, 1938; Rao, 1948) the assumption of normal distribution of the groups is not explicitly required, although the method loses its optimality in case of deviations from normality.

The classical Fisher discriminant rule is invariant to ilr and clr transformations. The robust rule is invariant to ilr transformations if affine equivariant robust estimators of location and covariance are taken.

Robustification is done (method “robust”) by estimating the columnwise means and the covariance by the Minimum Covariance Determinant (MCD) estimator.

Value

an object of class “daFisher” including the following elements

B           Between variance of the groups
W           Within variance of the groups
loadings    loadings
coda        coda

Author(s)

The code was written by Peter Filzmoser. Minor modifications by Matthias Templ.

References

Filzmoser, P. and Hron, K. and Templ, M. (2009) Discriminant analysis for compositional data and robust parameter estimation. Research Report SM-2009-3, Vienna University of Technology, 27 pages.

Fisher, R. A. (1938) The statistical utilization of multiple measurements. Annals of Eugenics, 8:376-386.

Rao, C.R. (1948) The utilization of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society, Series B, 10:159-203.

See Also

Linda

Examples

require(MASS)
x1


expenditures 19d1


expendituresEU    Mean consumption expenditures data.

Description

Mean consumption expenditure of households at EU-level. The final consumption expenditure of households encompasses all domestic costs (by residents and non-residents) for individual needs.

Usage

data(expendituresEU)

Format

A data frame with 27 observations on the following 12 variables.

Food              a numeric vector
Alcohol           a numeric vector
Clothing          a numeric vector
Housing           a numeric vector
Furnishings       a numeric vector
Health            a numeric vector
Transport         a numeric vector
Communications    a numeric vector
Recreation        a numeric vector
Education         a numeric vector
Restaurants       a numeric vector
Other             a numeric vector

Source

Eurostat: http://epp.eurostat.ec.europa.eu/statistics_explained/images/c/c2/Mean_consumption_expenditure_of_households,_2005(PPS).PNG

References

Eurostat provides a website with the data: http://epp.eurostat.ec.europa.eu/statistics_explained/index.php/Household_consumption_expenditure

Examples

data(expendituresEU)


gm    geometric mean

Description

This function calculates the geometric mean.

Usage

gm(x)

Arguments

x    A numeric vector.

Details

Calculates the geometric mean of all positive entries of a vector.

Value

The geometric mean.

Author(s)

Matthias Templ

See Also

geometricmean

Examples

gm(runif(100))

haplogroups    Haplogroups data.

Description

Distribution of European Y-chromosome DNA (Y-DNA) haplogroups by region in percentage.

Usage

data(haplogroups)


Format

A data frame with 38 observations on the following 12 variables.

I1       pre-Germanic (Nordic)
I2b      pre-Celto-Germanic
I2a1     Sardinian, Basque
I2a2     Dinaric, Danubian
N1c1     Uralo-Finnic, Baltic, Siberian
R1a      Balto-Slavic, Mycenaean Greek, Macedonia
R1b      Italic, Celtic, Germanic; Hittite, Armenian
G2a      Caucasian, Greco-Anatolian
E1b1b    North and Eastern Africa, Near Eastern, Balkanic
J2       Mesopotamian, Minoan Greek, Phoenician
J1       Semitic (Arabic, Jewish)
T        Near-Eastern, Egyptian, Ethiopian, Arabic

Details

Human Y-chromosome DNA can be divided into genealogical groups sharing a common ancestor, called haplogroups.

Source

Eupedia: http://www.eupedia.com/europe/european_y-dna_haplogroups.shtml

Examples

data(haplogroups)

impAll    Replacement of rounded zeros and missing values.

Description

Parametric replacement of rounded zeros and missing values for compositional data using classical and robust methods based on ilr-transformations with special choice of balances. Values under the detection limit should be saved with the negative value of the detection limit (per variable). Missing values should be coded as NA.

Usage

impAll(x)


Arguments

x    data frame

Details

This is a wrapper function that calls impRZilr() for the replacement of zeros and impCoda for the imputation of missing values sequentially. The detection limit is automatically derived from negative numbers in the data set.

Value

The imputed data set.

Note

This function is mainly used by the compositionsGUI.

Author(s)

Jiri Eichler

References

Hron, K. and Templ, M. and Filzmoser, P. (2010) Imputation of missing values for compositional data using classical and robust methods, Computational Statistics and Data Analysis, vol 54 (12), pages 3095-3107.

Martin-Fernandez, J.A. and Hron, K. and Templ, M. and Filzmoser, P. and Palarea-Albaladejo, J. (2012) Model-based replacement of rounded zeros in compositional data: Classical and robust approaches, Computational Statistics and Data Analysis, 56 (2012), pages 2688-2704.

See Also

impCoda, impRZilr

Examples

## see the compositionsGUI

impCoda    Imputation of missing values in compositional data

Description

This function offers different methods for the imputation of missing values in compositional data. Missing values are initialized with proper values. Then iterative algorithms try to find better estimations for the former missing values.


Usage

impCoda(x, maxit = 10, eps = 0.5, method = "ltsReg", closed = FALSE,
  init = "KNN", k = 5, dl = rep(0.05, ncol(x)),
  noise=0.1, bruteforce=FALSE)

Arguments

x             data frame or matrix
maxit         maximum number of iterations
eps           convergence criteria
method        imputation method
closed        imputation of transformed data (using ilr transformation) or in the original space (closed equals TRUE)
init          method for initializing missing values
k             number of nearest neighbors (if init == “KNN”)
dl            detection limit(s), only important for the imputation of rounded zeros
noise         amount of random noise added to predictors after convergence
bruteforce    if TRUE, imputations over dl are set to dl. If FALSE, truncated (Tobit) regression is applied.

Details

eps: The algorithm is finished as soon as the imputed values stabilize, i.e. until the sum of Aitchison distances from the present and previous iteration changes only marginally (eps).

method: Several different methods can be chosen, such as

‘ltsReg’: least trimmed squares regression is used within the iterative procedure.
‘lm’: least squares regression is used within the iterative procedure.
‘classical’: principal component analysis is used within the iterative procedure.
‘ltsReg2’: least trimmed squares regression is used within the iterative procedure. The imputed values are perturbed in the direction of the predictor by values drawn from a normal distribution with mean and standard deviation related to the corresponding residuals and multiplied by noise.

method ‘roundedZero’ is experimental. It imputes rounded zeros within our iterative framework.

Value

xOrig       Original data frame or matrix
xImp        Imputed data
criteria    Sum of the Aitchison distances from the present and previous iteration
iter        Number of iterations
maxit       Maximum number of iterations
w           Amount of imputed values
wind        Index of the missing values in the data


Author(s)

Matthias Templ, Karel Hron

References

Hron, K. and Templ, M. and Filzmoser, P. (2010) Imputation of missing values for compositional data using classical and robust methods. Computational Statistics and Data Analysis, vol 54 (12), pages 3095-3107.

See Also

impKNNa, isomLR

Examples

data(expenditures)
x


normknn    An adjustment of the imputed values is performed if TRUE
das        deprecated. If TRUE, the definition of the Aitchison distance, based on simple logratios of the compositional part, is used (Aitchison, 2000) to calculate distances between observations. If FALSE, a version using the clr transformation is used.
adj        either ‘median’ (default) or ‘sum’ can be chosen for the adjustment of the nearest neighbors, see Hron et al., 2010.

Details

The Aitchison metric should be chosen when dealing with compositional data, the Euclidean metric otherwise.

If primitive == FALSE, a sequential search for the k-nearest neighbors is applied for every missing value where all information corresponding to the non-missing cells plus the information in the variable to be imputed plus some additional information is available. If primitive == TRUE, a search of the k-nearest neighbors among observations is applied where in addition to the variable to be imputed any further cells are non-missing.

If normknn is TRUE (preferred option) the imputed cells from a nearest neighbor method are adjusted with special adjustment factors (more details can be found online (see the references)).

Value

xOrig     Original data frame or matrix
xImp      Imputed data
w         Amount of imputed values
wind      Index of the missing values in the data
metric    Metric used

Author(s)

Matthias Templ

References

Aitchison, J. and Barcelo-Vidal, C. and Martin-Fernandez, J.A. and Pawlowsky-Glahn, V. (2000) Logratio analysis and compositional distance, Mathematical Geology 32(3):271-275.

Hron, K. and Templ, M. and Filzmoser, P. (2010) Imputation of missing values for compositional data using classical and robust methods. Computational Statistics and Data Analysis, vol 54 (12), pages 3095-3107.

See Also

impCoda


impRZalr 27Examplesdata(expenditures)x


Author(s)

Matthias Templ and Karel Hron

See Also

impRZilr

Examples

data(arcticLake)
x


R          number of bootstrap samples for the determination of pls components. Only important for method “pls”.
verbose    additional print output during calculations.

Details

Statistical analysis of compositional data including zeros runs into problems, because log-ratios cannot be applied. Usually, rounded zeros are considered as missing-not-at-random values.

The algorithm iteratively imputes parts with rounded zeros, where in each step (1) a specific ilr transformation is applied, (2) tobit regression is applied, (3) the rounded zeros are replaced by the expected values, and (4) the corresponding inverse ilr transformation is applied. After all parts are imputed, the algorithm starts again until the imputations do not change.

Value

xOrig    Original data frame or matrix
xImp     Imputed data
wind     Index of the missing values / values below detection limit in the data
iter     Number of iterations
eps      eps

Author(s)

Matthias Templ and Peter Filzmoser

See Also

impRZalr

Examples

data(arcticLake)
x


isomLR    Isometric log-ratio transformation

Description

An isometric log-ratio transformation with a special choice of the balances according to Hron et al. (2010).

Usage

isomLR(x)

Arguments

x    object of class data.frame or matrix with positive entries

Details

The isomLR transformation moves D-part compositional data from the simplex into a (D-1)-dimensional real space isometrically. With this choice of the balances, all the relative information of the part x1 with respect to the remaining parts is separated into the first coordinate. This is useful for estimating missing values in x1 by regression on the remaining variables.

Value

The isomLR transformed data.

Author(s)

Karel Hron, Matthias Templ

References

Egozcue J.J., V. Pawlowsky-Glahn, G. Mateu-Figueras and C. Barceló-Vidal (2003) Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3) 279-300.

Hron, K. and Templ, M. and Filzmoser, P. (2010) Imputation of missing values for compositional data using classical and robust methods. Computational Statistics and Data Analysis, vol 54 (12), pages 3095-3107.

See Also

isomLRinv, ilr
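The balances referred to in the isomLR Details can be sketched as follows, assuming the coordinates take the form z_i = sqrt((D-i)/(D-i+1)) * log(x_i / gm(x_{i+1}, ..., x_D)) given in Hron et al. (2010). The function name ilr_sketch is hypothetical, and the sign convention is an assumption that may differ from the package's isomLR. The isometry does not depend on that choice: Euclidean distances between the coordinates equal Aitchison distances between the compositions.

```r
# ilr sketch with balances that isolate x_1 in the first coordinate.
# ilr_sketch is an illustrative helper; its sign convention is an assumption.
ilr_sketch <- function(x) {
  x <- as.matrix(x)
  D <- ncol(x)
  z <- matrix(NA_real_, nrow(x), D - 1)
  for (i in seq_len(D - 1)) {
    rest <- x[, (i + 1):D, drop = FALSE]
    gm_rest <- exp(rowMeans(log(rest)))   # geometric mean of remaining parts
    z[, i] <- sqrt((D - i) / (D - i + 1)) * log(x[, i] / gm_rest)
  }
  z
}

comp <- rbind(c(0.2, 0.3, 0.5),
              c(0.1, 0.4, 0.5))
z <- ilr_sketch(comp)

# Isometry check: Euclidean distance in ilr space ...
d_ilr <- sqrt(sum((z[1, ] - z[2, ])^2))
# ... against the Aitchison distance computed from clr coordinates
clr1 <- log(comp[1, ]) - mean(log(comp[1, ]))
clr2 <- log(comp[2, ]) - mean(log(comp[2, ]))
d_ait <- sqrt(sum((clr1 - clr2)^2))
```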


isomLRinv 31Examplesrequire(MASS)Sigma


32 lmCoDaXExamplesrequire(MASS)Sigma


See Also

lm

Examples

## How the total household expenditures in EU Member
## States depend on relative contributions of
## single household expenditures:
data(expendituresEU)
y


missPatterns    missing or zero pattern structure.

Description

Analysis of the missing or zero patterns structure of a data set.

Usage

missPatterns(x)
zeroPatterns(x)

Arguments

x    a data frame or matrix.

Details

Here, one pattern defines those observations that have the same structure regarding their missingness or zeros. For all patterns a summary is calculated.

Value

groups         List of the different patterns and the observation numbers for each pattern
cn             the names of the patterns coded as vectors of 0-1’s
tabcomb        the pattern structure - all combinations of zeros or missings in the variables
tabcombPlus    the pattern structure - all combinations of zeros or missings in the variables including the size of those combinations/patterns, i.e. the number of observations that belong to each pattern.
rsum           the number of zeros or missing values in each row of the data set

Author(s)

Matthias Templ. The code is based on a previous version from Andreas Alfons and Matthias Templ from package VIM

See Also

aggr

Examples

data(expenditures)
## set NAs artificial:
expenditures[expenditures < 300]
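The notion of a pattern used by missPatterns can be illustrated with base R alone. The sketch below is an assumption-laden stand-in, not the package code: the toy data frame and the variable names pattern, groups, tab and rsum are made up for illustration, and only missing values (not zeros) are handled. Each row is coded as a 0-1 string where 1 marks a missing cell, and rows sharing a string belong to the same pattern.

```r
# Toy data with artificial missing values (illustrative only)
x <- data.frame(a = c(1, NA, 3, NA),
                b = c(NA, 2, 3, 2),
                c = c(1, 2, 3, 4))

ind <- is.na(x) * 1L                            # 1 marks a missing cell
pattern <- apply(ind, 1, paste, collapse = "")  # one 0-1 code per row

groups <- split(seq_len(nrow(x)), pattern)  # observation numbers per pattern
tab <- table(pattern)                       # size of each pattern
rsum <- rowSums(ind)                        # number of missing cells per row
```

A zero-pattern analogue would replace is.na(x) by x == 0, again as an assumption about how such a summary could be built.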


orthbasis    Orthonormal basis

Description

Orthonormal basis from cenLR transformed data to isomLR transformed data.

Usage

orthbasis(D)

Arguments

D    number of parts (variables)

Details

For the chosen balances for “isomLR”, this is the orthonormal basis that transfers the data from centered logratio to isometric logratio.

Value

the orthonormal basis.

Author(s)

Karel Hron, Matthias Templ

See Also

isomLR, cenLR

Examples

data(expenditures)
V
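One way to picture such a basis is the following base R sketch. It assumes balances of the form used in Hron et al. (2010), where column i contrasts part i with the geometric mean of the parts after it; orthbasis_sketch is a hypothetical name, and the layout and signs of the matrix returned by the package's orthbasis may differ. Under these assumptions, multiplying clr coordinates by V gives the ilr coordinates.

```r
# Sketch of a D x (D-1) orthonormal basis linking clr and ilr coordinates.
# orthbasis_sketch is illustrative; signs/layout are assumptions.
orthbasis_sketch <- function(D) {
  V <- matrix(0, nrow = D, ncol = D - 1)
  for (i in seq_len(D - 1)) {
    V[i, i] <- 1                      # part i ...
    V[(i + 1):D, i] <- -1 / (D - i)   # ... against the mean of later parts
    V[, i] <- V[, i] * sqrt((D - i) / (D - i + 1))  # scale to unit length
  }
  V
}

V <- orthbasis_sketch(4)
gram <- crossprod(V)   # t(V) %*% V, expected to be the identity

# Mapping one composition from clr to ilr coordinates
x <- c(0.1, 0.2, 0.3, 0.4)
clr_x <- log(x) - mean(log(x))
ilr_x <- as.vector(clr_x %*% V)
```

Because every column of V sums to zero, the mean log term of the clr coordinates cancels, so the first ilr coordinate depends only on log-ratios of x1 to the other parts.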


outCoDa    Outlier detection for compositional data

Description

Outlier detection for compositional data using standard and robust statistical methods.

Usage

outCoDa(x, quantile = 0.975, method = "robust", h = 1/2)

Arguments

x           compositional data
quantile    quantile, corresponding to a significance level, is used as a cut-off value for outlier identification: observations with larger (squared) robust Mahalanobis distance are considered as potential outliers.
method      either “robust” (default) or “standard”
h           the size of the subsets for the robust covariance estimation according to the MCD estimator for which the determinant is minimized (the default is (n+p+1)/2).

Details

The outlier detection procedure is based on (robust) Mahalanobis distances after an isometric logratio transformation of the data. Observations with squared Mahalanobis distance greater than or equal to a certain quantile of the Chi-squared distribution are marked as outliers.

If method “robust” is chosen, the outlier detection is based on the homogeneous majority of the compositional data set. If method “standard” is used, standard measures of location and scatter are applied during the outlier detection procedure.

Value

mahalDist       resulting Mahalanobis distance
limit           quantile of the Chi-squared distribution
outlierIndex    logical vector indicating outliers and non-outliers
method          method used

Note

It is highly recommended to use the robust version of the procedure.

Author(s)

Matthias Templ, Karel Hron


References

Egozcue J.J., V. Pawlowsky-Glahn, G. Mateu-Figueras and C. Barceló-Vidal (2003) Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3) 279-300.

Filzmoser, P., and Hron, K. (2008) Outlier detection for compositional data using robust methods. Math. Geosciences, 40 233-248.

Rousseeuw, P.J., Van Driessen, K. (1999) A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41 212-223.

See Also

isomLR

Examples

data(expenditures)
oD
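The outCoDa procedure (ilr transformation, Mahalanobis distances, Chi-squared cut-off) can be sketched with base R for the “standard” variant. Everything here is an illustrative assumption: the data are simulated, the ilr coordinates use the balance form assumed from Hron et al. (2010), and the variable names d2, limit and is_outlier are made up. A robust variant would replace colMeans() and cov() by MCD-type estimates of location and scatter, which this sketch does not attempt.

```r
set.seed(123)
# Simulated 3-part compositions (illustrative data only)
parts <- matrix(exp(rnorm(300)), ncol = 3)
comp <- parts / rowSums(parts)
D <- ncol(comp)

# ilr coordinates (assumed balance form; the sign convention is irrelevant here)
z <- sapply(seq_len(D - 1), function(i) {
  rest <- comp[, (i + 1):D, drop = FALSE]
  sqrt((D - i) / (D - i + 1)) * log(comp[, i] / exp(rowMeans(log(rest))))
})

# Squared Mahalanobis distances with classical location and scatter
d2 <- mahalanobis(z, center = colMeans(z), cov = cov(z))

# Cut-off: quantile of the Chi-squared distribution with D - 1 degrees of freedom
limit <- qchisq(0.975, df = D - 1)
is_outlier <- d2 >= limit
```

Classical estimates can themselves be distorted by outliers, which is consistent with the Note recommending the robust version.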


Author(s)

K. Hron, P. Filzmoser, M. Templ

References

Filzmoser, P., Hron, K., Reimann, C. (2009) Principal Component Analysis for Compositional Data with Outliers. Environmetrics, 20, 621-632.

See Also

print.pcaCoDa, plot.pcaCoDa

Examples

data(expenditures)
p1


maxiter    maximum number of iterations
control    default value is NULL
...        arguments for creating a list

Details

The main difference to usual implementations is that uniquenesses are no longer of diagonal form. This kind of factor analysis is designed for centered log-ratio transformed compositional data. However, if the covariance is not specified, the covariance is estimated from isometric log-ratio transformed data internally, but the data used for factor analysis are backtransformed to the clr space (see Filzmoser et al., 2009).

Value

loadings           A matrix of loadings, one column for each factor. The factors are ordered in decreasing order of sums of squares of loadings.
uniquness          uniqueness
correlation        correlation matrix
criteria           The results of the optimization: the value of the negative log-likelihood and information of the iterations used.
factors            the factors
dof                degrees of freedom
method             “principal”
n.obs              number of observations if available, or NA
call               The matched call.
STATISTIC, PVAL    The significance-test statistic and p-value, if they can be computed

Author(s)

Peter Filzmoser, Karel Hron, Matthias Templ

References

C. Reimann, P. Filzmoser, R.G. Garrett, and R. Dutter (2008): Statistical Data Analysis Explained. Applied Environmental Statistics with R. John Wiley and Sons, Chichester, 2008.

P. Filzmoser, K. Hron, C. Reimann, R. Garrett (2009): Robust Factor Analysis for Compositional Data. Computers and Geosciences, 35 (9), 1854–1861.

Examples

data(expenditures)
x




plot.imp    Plot method for objects of class imp

Description
This function provides several diagnostic plots for the imputed data set in order to see how the imputed values are distributed in comparison with the original data values.

Usage
## S3 method for class imp
plot(x, ..., which = 1, ord = 1:ncol(x),
     colcomb = "missnonmiss", plotvars = NULL,
     col = c("skyblue", "red"), alpha = NULL,
     lty = par("lty"), xaxt = "s", xaxlabels = NULL,
     las = 3, interactive = TRUE, pch = c(1, 3),
     smooth = FALSE, reg.line = FALSE,
     legend.plot = FALSE,
     ask = prod(par("mfcol")) < length(which) && dev.interactive(),
     center = FALSE, scale = FALSE, id = FALSE,
     seg.l = 0.02, seg1 = TRUE)

Arguments
x          object of class 'imp'
...        other parameters to be passed through to plotting functions.
which      if a subset of the plots is required, specify a subset of the numbers 1:3.
ord        determines the ordering of the variables
colcomb    if colcomb = "missnonmiss", observations with missings in any variable are highlighted. Otherwise, observations with missings in any of the variables specified by colcomb are highlighted in the parallel coordinate plot.
plotvars   Parameter for the parallel coordinate plot. A vector giving the variables to be plotted. If NULL (the default), all variables are plotted.
col        a vector of length two giving the colors to be used in the plot. The second color will be used for highlighting.
alpha      a numeric value between 0 and 1 giving the level of transparency of the colors, or NULL. This can be used to prevent overplotting.
lty        a vector of length two giving the line types. The second line type will be used for the highlighted observations. If a single value is supplied, it will be used for both non-highlighted and highlighted observations.
xaxt       the x-axis type (see par).
xaxlabels  a character vector containing the labels for the x-axis. If NULL, the column names of x will be used.
las        the style of axis labels (see par).


interactive  a logical indicating whether the variables to be used for highlighting can be selected interactively (see 'Details').
pch          a vector of length two giving the symbols of the plotting points. The second symbol will be used for the highlighted observations. If a single value is supplied, it will be used for both non-highlighted and highlighted observations.
smooth       if TRUE, a lowess smooth is plotted in each off-diagonal panel of the multiple scatterplot. Further detail can be found in package car.
reg.line     if not FALSE, a line is plotted using the function given by this argument; e.g., using rlm in package MASS plots a robust-regression line within the multiple scatterplot.
legend.plot  if TRUE, a legend for the groups is plotted in the bottom-right cell of the multiple scatterplot.
ask          logical; if TRUE, the user is asked before each plot, see par(ask=.).
center       logical, indicates if the data should be centered prior to plotting the ternary plot.
scale        logical, indicates if the data should be scaled prior to plotting the ternary plot.
id           reads the position of the graphics pointer when the (first) mouse button is pressed and returns the corresponding index of the observation (only used by the ternary plot).
seg.l        length of the plotting symbol (spikes) for the ternary plot.
seg1         if TRUE, the spikes of the plotting symbol are justified.

Details
The first plot (which == 1) is a multiple scatterplot in which a different plot symbol and color are used for the imputed values in order to highlight them.

Plot 2 is a parallel coordinate plot in which imputed values in certain variables are highlighted. In parallel coordinate plots, the variables are represented by parallel axes. Each observation of the scaled data is shown as a line. If interactive is TRUE, the variables to be used for highlighting can be selected interactively. Observations that include imputed values in any of the selected variables will be highlighted. A variable can be added to the selection by clicking on a coordinate axis. If a variable is already selected, clicking on its coordinate axis will remove it from the selection. Clicking anywhere outside the plot region quits the interactive session.

Plot 3 shows a ternary diagram in which imputed values are highlighted, i.e. the spikes of the chosen plotting symbol are colored in red for those values that are missing in the unimputed data set.

Value
None (invisible NULL).

Author(s)
Matthias Templ


References
Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall Ltd., London (UK). 416p.
Wegman, E. J. (1990) Hyperdimensional data analysis using parallel coordinates. Journal of the American Statistical Association, 85, 664-675.

See Also
impCoda, impKNNa, scatterplot.matrix

Examples
data(expenditures)
expenditures[1,3]
expenditures[1,3]


Author(s)
Matthias Templ

References
Filzmoser, P. and Hron, K. (2008) Outlier detection for compositional data using robust methods. Math. Geosciences, 40, 233-248.

See Also
outCoDa

Examples
data(expenditures)
oD


Author(s)
M. Templ, K. Hron

References
Aitchison, J. and Greenacre, M. (2002) Biplots of compositional data. Applied Statistics, 51, 375-392.
Filzmoser, P., Hron, K. and Reimann, C. (2009) Principal Component Analysis for Compositional Data with Outliers. Environmetrics, 20(6), 621-632.

See Also
pcaCoDa

Examples
data(expenditures)
p1


Author(s)
Matthias Templ and Karel Hron

See Also
adtestWrapper, summary.adtestWrapper

Examples
data(machineOperators)
a


Examples
require(MASS)
x1


Examples
data(expenditures)
expenditures[1,3]
expenditures[1,3]


print.pcaCoDa    Print method for pcaCoDa objects

Description
Print method for objects of class 'pcaCoDa'.

Usage
## S3 method for class pcaCoDa
print(x, ...)

Arguments
x      object of class 'pcaCoDa'
...    ...

Value
Prints the (cumulative) percentages of explained variability for clr transformed data by principal component analysis.

Author(s)
M. Templ, K. Hron

See Also
pcaCoDa, plot.pcaCoDa

Examples
data(expenditures)
p1
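The quantity printed by this method can be illustrated without the package. The following is a minimal base-R sketch, assuming a small made-up data matrix and a classical (non-robust) PCA via prcomp; the helper clr() and the data are hypothetical and are not the package's cenLR or pcaCoDa, whose internals may differ.

```r
# Hypothetical 3-part compositions (illustrative values only).
x <- matrix(c(10, 20, 70,
              15, 25, 60,
              30, 30, 40,
              20, 50, 30,
              25, 15, 60), ncol = 3, byrow = TRUE)

# Centered log-ratio (clr) transformation: log of each part minus the
# mean of the logs in the same row (i.e. log of part / geometric mean).
clr <- function(x) {
  lx <- log(x)
  lx - rowMeans(lx)
}

z <- clr(x)

# Classical PCA on the clr coordinates (assumption: no robust estimation).
pca <- prcomp(z, center = TRUE, scale. = FALSE)

# Cumulative percentages of explained variability.
cum_pct <- 100 * cumsum(pca$sdev^2) / sum(pca$sdev^2)
print(round(cum_pct, 2))
```

Because clr coordinates sum to zero in every row, a D-part composition carries at most D-1 components with non-zero variance, so the cumulative percentage reaches 100 before the last component.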


robVariation    Robust variation matrix

Description
Estimates the variation matrix with robust methods.

Usage
robVariation(x, robust=TRUE)

Arguments
x        data frame or matrix with positive entries
robust   if FALSE, standard measures are used.

Details
The variation matrix is estimated for a given compositional data set. Instead of the classical standard deviations, the mad is used when parameter robust is set to TRUE.

Value
The (robust) variation matrix.

Author(s)
Matthias Templ

References
Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall Ltd., London (UK). 416p.

See Also
variation

Examples
data(expenditures)
robVariation(expenditures)
robVariation(expenditures, robust=FALSE)
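To make the Details concrete, here is a minimal base-R sketch of a variation matrix, whose (i, j) entry measures the spread of log(x_i / x_j). The function name variation_sketch and the data are hypothetical, and squaring the mad so that it is on the same scale as a variance is an assumption; the package's robVariation may scale its result differently.

```r
# Hypothetical 3-part compositions (illustrative values only).
x <- matrix(c(10, 20, 70,
              15, 25, 60,
              30, 30, 40,
              20, 50, 30,
              25, 15, 60), ncol = 3, byrow = TRUE)

# Sketch of a variation matrix: spread of each pairwise log-ratio.
# robust = TRUE uses the squared mad (assumption), otherwise the variance.
variation_sketch <- function(x, robust = TRUE) {
  x <- as.matrix(x)
  D <- ncol(x)
  spread <- if (robust) function(v) mad(v)^2 else var
  out <- matrix(0, D, D, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(D)) {
    for (j in seq_len(D)) {
      if (i != j) out[i, j] <- spread(log(x[, i] / x[, j]))
    }
  }
  out
}

v_rob <- variation_sketch(x, robust = TRUE)
v_cla <- variation_sketch(x, robust = FALSE)
print(v_rob)
print(v_cla)
```

Both matrices are symmetric with a zero diagonal, since log(x_i / x_j) = -log(x_j / x_i) and neither the variance nor the mad changes under a sign flip.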


skyeLavas    Aphyric Skye lavas data

Description
AFM compositions of 23 aphyric Skye lavas. This data set can be found on page 360 of the Aitchison book (see reference).

Usage
data(skyeLavas)

Format
A data frame with 23 observations on the following 3 variables.
sodium-potassium  a numeric vector of percentages of Na2O+K2O
iron              a numeric vector of percentages of Fe2O3
magnesium         a numeric vector of percentages of MgO

Source
Aitchison, J. (1986) The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability. Chapman & Hall Ltd., London (UK). 416p.

Examples
data(skyeLavas)

summary.adtestWrapper    summary method for objects of class adtestWrapper

Description
Provides a summary as shown in the examples.

Usage
## S3 method for class adtestWrapper
summary(object, ...)

Arguments
object   object of class 'adtestWrapper'
...      additional arguments passed through


Details
A similar output is proposed by Pawlowsky-Glahn et al. (2008). In addition to that, p-values are provided.

Value
A data frame containing the ilr variables used (first column), the underlying test (second column), the test statistics (third column), the corresponding estimated p-values (fourth column) and whether the null hypothesis is rejected (last column).

Author(s)
Matthias Templ and Karel Hron

References
Pawlowsky-Glahn, V., Egozcue, J.J. and Tolosana-Delgado, R. (2008) Lecture Notes on Compositional Data Analysis. Universitat de Girona, http://dugi-doc.udg.edu/bitstream/10256/297/1/CoDa-book.pdf

See Also
adtestWrapper

Examples
data(machineOperators)
a


Details
Note that this function will be enhanced with more sophisticated methods in future versions of the package. It is very rudimentary in its present form.

Value
None (invisible NULL).

Author(s)
Matthias Templ

See Also
impCoda, impKNNa

Examples
data(expenditures)
expenditures[1,3]
expenditures[1,3]


line     may be set to "none", "pca", "regression", "regressionconf", "regressionpred", "ellipse", "lda"
robust   if line equals TRUE, it determines whether a robust estimation is applied or not.
group    if line equals "da", it determines the grouping variable
tol      if line equals "ellipse", it determines the parameter for the tolerance ellipse
...      further parameters, see, e.g., par()

Details
The relative proportions of each variable are plotted.

Author(s)
Peter Filzmoser http://www.statistik.tuwien.ac.at/public/filz/, Matthias Templ

References
Reimann, C., Filzmoser, P., Garrett, R.G. and Dutter, R. (2008) Statistical Data Analysis Explained. Applied Environmental Statistics with R. John Wiley and Sons, Chichester.

See Also
ternary

Examples
data(arcticLake)
ternaryDiag(arcticLake)
data(coffee)
x


ternaryDiagAbline    Adds a line to a ternary diagram.

Description
A low-level plot function which adds a line to a high-level ternary diagram.

Usage
ternaryDiagAbline(x, ...)

Arguments
x      Two-dimensional data set in isometric log-ratio transformed space.
...    Additional graphical parameters passed through.

Details
This is a small utility function which helps to add a line in a ternary plot from two given points in an isometric transformed space.

Value
No values are returned.

Author(s)
Matthias Templ

See Also
ternaryDiag

Examples
data(coffee)
x


ternaryDiagEllipse    Adds tolerance ellipses to a ternary diagram.

Description
Low-level plot function which adds tolerance ellipses to a high-level plot of a ternary diagram.

Usage
ternaryDiagEllipse(x, tolerance = c(0.9, 0.95, 0.975), locscatt = "MCD", ...)

Arguments
x          Three-part composition. Object of class "matrix" or "data.frame".
tolerance  Determines the amount of observations with Mahalanobis distance larger than the drawn ellipse, scaled to one.
locscatt   Method for estimating the mean and covariance.
...        Additional arguments passed through.

Value
No values are returned.

Author(s)
Peter Filzmoser, Matthias Templ

See Also
ternaryDiag

Examples
data(coffee)
x
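The idea behind a tolerance ellipse can be sketched in base R. The sketch below assumes one particular ilr basis for a three-part composition, the classical mean and covariance in place of the MCD estimator, and a chi-square quantile with 2 degrees of freedom as the ellipse radius. The helper ilr2() and the data are hypothetical; this is not how ternaryDiagEllipse is implemented, and the back-transformation of the ellipse to the ternary diagram is omitted.

```r
# Hypothetical 3-part compositions (illustrative values only).
x <- matrix(c(10, 20, 70,
              15, 25, 60,
              30, 30, 40,
              20, 50, 30,
              25, 15, 60), ncol = 3, byrow = TRUE)

# One possible ilr basis for three parts (assumption: the package's
# isomLR may use a different basis or sign convention).
ilr2 <- function(x) {
  cbind(sqrt(1 / 2) * log(x[, 1] / x[, 2]),
        sqrt(2 / 3) * log(sqrt(x[, 1] * x[, 2]) / x[, 3]))
}

z <- ilr2(x)

# Classical location and scatter (assumption: MCD is not used here).
center <- colMeans(z)
S <- cov(z)

# Squared radius of the tolerance ellipse for a given tolerance level.
tolerance <- 0.9
q <- qchisq(tolerance, df = 2)

# Points on the ellipse: center + sqrt(q) * L %*% unit circle,
# where L is the lower Cholesky factor of S (L %*% t(L) equals S).
theta <- seq(0, 2 * pi, length.out = 100)
circle <- rbind(cos(theta), sin(theta))
ell <- t(center + sqrt(q) * t(chol(S)) %*% circle)

# Squared Mahalanobis distances of the observations; values above q
# lie outside the ellipse.
d2 <- mahalanobis(z, center, S)
print(d2 > q)
```

Every point of ell has squared Mahalanobis distance q from the center, which is what makes the curve a tolerance ellipse at the chosen level under a normal model in ilr coordinates.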


ternaryDiagPoints    Add points or lines to a given ternary diagram.

Description
Low-level plot function to add points or lines to a ternary high-level plot.

Usage
ternaryDiagPoints(x, ...)
ternaryDiagLines(x, ...)

Arguments
x      Three-dimensional composition given as an object of class "matrix" or "data.frame".
...    Additional graphical parameters passed through.

Value
No values are returned.

Author(s)
Matthias Templ

References
Reimann, C., Filzmoser, P., Garrett, R.G. and Dutter, R. (2008) Statistical Data Analysis Explained. Applied Environmental Statistics with R. John Wiley and Sons, Chichester.

See Also
ternaryDiag

Examples
data(coffee)
x


Index
Topic aplot: plot.imp, plot.pcaCoDa, ternaryDiag, ternaryDiagAbline, ternaryDiagEllipse, ternaryDiagPoints
Topic arith: aDist
Topic datasets: arcticLake, coffee, expenditures, expendituresEU, haplogroups, machineOperators, phd, skyeLavas
Topic hplot: plot.imp, plot.outCoDa
Topic htest: adtest, adtestWrapper
Topic iteration: impCoda
Topic manip: addLR, addLRinv, adjust, cenLR, cenLRinv, constSum, impAll, impKNNa, impRZalr, impRZilr, orthbasis
Topic math: aDist, gm, isomLR, isomLRinv
Topic models: lmCoDaX
Topic multivariate: daFisher, impCoda, impKNNa, impRZalr, impRZilr, missPatterns, outCoDa, pcaCoDa, pfa, robVariation, ternaryDiag
Topic package: robCompositions-package
Topic print: print.adtestWrapper, print.daFisher, print.imp, print.outCoDa, print.pcaCoDa, summary.adtestWrapper, summary.imp
Topic robust: impCoda, robVariation


