TECHNOMETRICS, VOL. 21, NO. 2, MAY 1979

A Probability Distribution and Its Uses in Fitting Data

John S. Ramberg
Systems and Industrial Engineering
The University of Arizona
Tucson, AZ 85721

Pandu R. Tadikamalla
Graduate School of Business
The University of Pittsburgh
Pittsburgh, PA 15260

Edward J. Dudewicz
Department of Statistics
The Ohio State University
Columbus, OH 43210

Edward F. Mykytka
Systems and Industrial Engineering
The University of Arizona
Tucson, AZ 85721

A four-parameter probability distribution, which includes a wide variety of curve shapes, is presented. Because of the flexibility, generality, and simplicity of the distribution, it is useful in the representation of data when the underlying model is unknown. A table based on the first four moments, which simplifies parameter estimation, is given. Further important applications of the distribution include the modeling and subsequent generation of random variates for simulation studies and Monte Carlo sampling studies of the robustness of statistical procedures.

KEY WORDS
Data fitting
Probability distribution
Systems of probability distributions
Moments
Random variate generation
Monte Carlo
Simulation

1. INTRODUCTION

Reasons for fitting a distribution to a set of data have been summarized by Hahn and Shapiro [6] (p. 195) as: the desire for objectivity, the need for automating the data analysis, and interest in the values of the distribution parameters. Although various empirical distributions already exist, e.g., the Pearson system and the Johnson system (see Chapter 7 of Hahn and Shapiro [6]) and the Burr distribution [1], we are presenting another distribution because of its simplicity, flexibility, and generality.

The new distribution is a generalization of Tukey's [16] lambda distribution. It was developed by Ramberg and Schmeiser [9, 10] for the purpose of generating random variates for Monte Carlo simulation studies because of the simple form of the resulting algorithm. (See (2).) A wide variety of curve shapes are possible with this distribution. Hence it is useful for the representation of data when the underlying model is unknown. Silver [14], for example, shows how the distribution can be used to approximate the safety factor in an inventory control model. It is also useful in Monte Carlo studies of the robustness of statistical procedures and for sensitivity analyses in simulation studies.

To illustrate the distribution, consider the histogram for 250 sample measurements of the coefficient of friction of a metal [6] (p. 219) given in Figure 1. The superimposed distribution was fitted by the methods described in this paper. This example is discussed further in Section 5.

In Section 2 some of the properties of the distribution are described.
Section 3 contains a discussion of the use of the method of moments for fitting the distribution to data and a table to facilitate this procedure. Table construction and accuracy are described in Section 4.

Received August 1976; revised May 1978
Winner of 1977 Shewell Award at ASQC Chemical Division Technical Conference

FIGURE 1. Coefficient of friction relative frequency histogram and the fitted distribution. (Horizontal axis: coefficient of friction, 0.0 to 0.07.)

2. THE PROPOSED DISTRIBUTION AND ITS PROPERTIES

A continuous probability distribution is usually defined by its distribution function or by its density function. Alternatively it can be defined by its percentile (or quantile) function, if the percentile function exists. The percentile function is simply the inverse of the distribution function. This concept is particularly useful in Monte Carlo simulation studies because of the following result: If X is a continuous random variable with percentile function R, and U is a uniform random variable on the interval zero to one, then the transformation X = R(U) yields a random variable with the percentile function R.

A specific example is Tukey's [16] lambda function

$$R(p) = [p^{\lambda} - (1 - p)^{\lambda}]/\lambda \quad (0 \le p \le 1), \tag{1}$$

which is defined for all nonzero lambda values. (As $\lambda \to 0$, the logistic distribution results.) Van Dyke [17] compared a normalized version of this function with Student's t distribution. Filliben [5] used this distribution to approximate symmetric distributions with a wide range of tail weights for studying location estimation problems of symmetric distributions. He also gave a very complete discussion of the properties of the percentile function. Joiner and Rosenblatt [7] studied the lambda distribution further and gave results on the sample range.
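The transformation X = R(U) and the lambda function (1) can be illustrated with a short Python sketch. The function names below are hypothetical and do not come from the paper. The sketch assumes p is restricted to the open interval (0, 1), so that negative lambda values do not produce infinite endpoints, and it treats lambda = 0 as the logistic limit mentioned in the text.

```python
import math
import random


def tukey_lambda_quantile(p, lam):
    """Percentile function (1): R(p) = [p**lam - (1 - p)**lam] / lam."""
    if not 0.0 < p < 1.0:
        # Assumption of this sketch: exclude the endpoints 0 and 1.
        raise ValueError("p must lie strictly between 0 and 1")
    if lam == 0.0:
        # Limiting case lam -> 0: the logistic percentile function.
        return math.log(p / (1.0 - p))
    return (p ** lam - (1.0 - p) ** lam) / lam


def tukey_lambda_variates(n, lam, seed=None):
    """Generate n variates via X = R(U), with U uniform on (0, 1)."""
    rng = random.Random(seed)
    variates = []
    while len(variates) < n:
        u = rng.random()  # uniform on [0.0, 1.0)
        if u > 0.0:       # skip the excluded endpoint 0
            variates.append(tukey_lambda_quantile(u, lam))
    return variates
```

For instance, `tukey_lambda_variates(5, 0.14, seed=1)` returns five variates; the value 0.14 is only an illustrative choice of lambda.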
Ramberg and Schmeiser [9] showed how this distribution could be used to approximate many of the well-known symmetric distributions and explored its application to Monte Carlo simulation studies.

Ramberg and Schmeiser [10] generalized (1) to a four-parameter distribution defined by the percentile function

$$R(p) = \lambda_1 + [p^{\lambda_3} - (1 - p)^{\lambda_4}]/\lambda_2 \quad (0 < p < 1), \tag{2}$$

where $\lambda_1$ is a location parameter, $\lambda_2$ is a scale parameter, and $\lambda_3$ and $\lambda_4$ are shape parameters. This distribution, which includes the original lambda distribution, also permits skewed curves to be represented. Although the distribution function does not exist in "simple closed form," this should not be of concern to practitioners since the same is true of the normal distribution (whose percentiles are not nearly so easily computed). Another asymmetric generalization of (1) was considered by Ramberg [11].

The density function corresponding to (2) is given by:

$$f(x) = f[R(p)] = \lambda_2 [\lambda_3 p^{\lambda_3 - 1} + \lambda_4 (1 - p)^{\lambda_4 - 1}]^{-1}$$
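A minimal Python sketch of the four-parameter percentile function (2), the density evaluated at x = R(p), and variate generation by X = R(U) follows. The function names and the parameter values used in the usage note are hypothetical illustrations, not fitted values from the paper. The density is written as the reciprocal of dR/dp, which follows from differentiating (2).

```python
import random


def gld_quantile(p, lam1, lam2, lam3, lam4):
    """Percentile function (2): lam1 + [p**lam3 - (1 - p)**lam4] / lam2."""
    if lam2 == 0.0:
        raise ValueError("the scale parameter lam2 must be nonzero")
    return lam1 + (p ** lam3 - (1.0 - p) ** lam4) / lam2


def gld_density_at(p, lam2, lam3, lam4):
    """Density f(x) at x = R(p), computed as 1 / (dR/dp).

    The location parameter lam1 does not enter the density value.
    """
    slope_times_lam2 = lam3 * p ** (lam3 - 1.0) + lam4 * (1.0 - p) ** (lam4 - 1.0)
    return lam2 / slope_times_lam2


def gld_variates(n, lam1, lam2, lam3, lam4, seed=None):
    """Generate n variates via X = R(U), with U uniform on (0, 1)."""
    rng = random.Random(seed)
    variates = []
    while len(variates) < n:
        u = rng.random()  # uniform on [0.0, 1.0)
        if u > 0.0:       # skip 0 so negative shape parameters stay finite
            variates.append(gld_quantile(u, lam1, lam2, lam3, lam4))
    return variates
```

As a usage example, `gld_variates(500, 0.0, 0.2, 0.1, 0.3, seed=7)` draws 500 variates from a skewed member of the family (unequal shape parameters), while setting `lam1 = 0` and `lam2 = lam3 = lam4` recovers the lambda function (1).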