Documentation of the Evaluation of CALPUFF and Other Long ...


HYSPLIT). Both approaches show CAMx and HYSPLIT as the highest-ranking models for CTEX5, with rankings that are fairly close to each other; after that, however, the two ranking techniques come to very different conclusions regarding the ability of the models to simulate the observed tracer concentrations for the CTEX5 field experiment. The most noticeable feature of the RANK metric for CTEX5 is the third highest ranking model under RANK, CALGRID (1.57). CALGRID ranks as the worst or second-worst performing model in 9 of the 11 performance statistics, so it is one of the worst performing models 82% of the time and has an average ranking of fifth-best out of the 6 LRT dispersion models. In examining the contribution to the RANK metric for CALGRID, there is not a consistent contribution from all four broad categories to the composite score (Figure ES-5). As noted in Table ES-2, the RANK score is defined by the contributions of four of the 11 statistics, representing correlation/scatter (R²), bias (FB), spatial skill (FMS), and cumulative distribution (KS):

    RANK = R² + (1 − |FB|/2) + FMS/100 + (1 − KS/100)

The majority of CALGRID's 1.57 RANK score comes from the fractional bias (FB) and Kolmogorov-Smirnov (KS) performance statistics, with little or no contribution from the correlation (R²) or spatial (FMS) statistics. As shown in Table ES-6, CALGRID performs very poorly for the FOEX and FA2/FA5 statistics due to a large underestimation bias. The FB component of the RANK composite score for CALGRID is one of the highest among the six models in this study, yet the underlying statistics indicate both marginal spatial skill and a large degree of under-prediction (likely related to the model's limited spatial skill). The current form of the RANK score uses the absolute value of the fractional bias, which weights underestimation equally with overestimation.
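The composite formula above can be sketched directly in code. Each of the four components is scaled to the range 0 to 1, so a perfect model scores 4.0 (the input values below are illustrative, not taken from the report):

```python
def rank_score(r2, fb, fms, ks):
    """Composite RANK score: R^2 plus three components each scaled to [0, 1].

    r2  : coefficient of determination, 0..1 (higher is better)
    fb  : fractional bias, -2..2 (0 is best; absolute value is used)
    fms : figure of merit in space, percent 0..100 (higher is better)
    ks  : Kolmogorov-Smirnov parameter, percent 0..100 (lower is better)
    """
    return r2 + (1.0 - abs(fb) / 2.0) + fms / 100.0 + (1.0 - ks / 100.0)

# A hypothetical perfect model scores the maximum of 4.0
print(rank_score(r2=1.0, fb=0.0, fms=100.0, ks=0.0))   # 4.0
# A hypothetical worst-case model scores 0.0
print(rank_score(r2=0.0, fb=2.0, fms=0.0, ks=100.0))   # 0.0
```

Because |FB| is used, a model that underpredicts by a given fractional amount receives the same bias credit as one that overpredicts by the same amount, which is the behavior the following paragraph questions for regulatory use.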
However, in a regulatory context, EPA is most concerned with models not being biased toward under-prediction. A model can also produce a seemingly good (low) bias metric through compensating errors, because averaging over- and under-predictions cancels them out. Using an error statistic (e.g., NMSE) instead of a bias statistic (i.e., FB) in the RANK composite metric would alleviate this problem. Adapting the RANK score for regulatory use will require refinement of the individual components to ensure that this situation does not develop and to ensure that the regulatory concern with under-prediction bias is accounted for when weighting the individual statistical measures into a composite score.
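The compensating-errors problem can be demonstrated with the standard definitions of fractional bias and normalized mean square error. In this hypothetical dataset, large over- and under-predictions cancel exactly, so FB reports no bias while NMSE exposes the scatter:

```python
def fractional_bias(obs, mod):
    """FB = 2 * (Mbar - Obar) / (Mbar + Obar); standard definition, range -2..2."""
    obar = sum(obs) / len(obs)
    mbar = sum(mod) / len(mod)
    return 2.0 * (mbar - obar) / (mbar + obar)

def nmse(obs, mod):
    """NMSE = mean((M - O)^2) / (Mbar * Obar); standard definition."""
    obar = sum(obs) / len(obs)
    mbar = sum(mod) / len(mod)
    return sum((m - o) ** 2 for o, m in zip(obs, mod)) / (len(obs) * mbar * obar)

# Illustrative data: the model badly over- and under-predicts, but the
# errors average out, so the bias statistic looks perfect.
obs = [10.0, 10.0, 10.0, 10.0]
mod = [1.0, 19.0, 1.0, 19.0]

print(fractional_bias(obs, mod))  # 0.0  -> appears unbiased
print(nmse(obs, mod))             # 0.81 -> the error statistic exposes the scatter
```

This is why the text suggests an error statistic such as NMSE as a candidate replacement for FB inside the composite score.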

Table ES-6. Summary of model rankings using the statistical performance metrics and comparison with the RANK metric.

    Statistic     1st        2nd       3rd       4th        5th        6th
    FMS           SCIPUFF    CAMx      HYSPLIT   CALPUFF    FLEXPART   CALGRID
    FAR           FLEXPART   HYSPLIT   CAMx      SCIPUFF    CALGRID    CALPUFF
    POD           SCIPUFF    CAMx      HYSPLIT   FLEXPART   CALPUFF    CALGRID
    TS            FLEXPART   HYSPLIT   CAMx      SCIPUFF    CALPUFF    CALGRID
    FOEX          CALPUFF    CAMx      HYSPLIT   CALGRID    SCIPUFF    FLEXPART
    FA2           HYSPLIT    CAMx      CALPUFF   SCIPUFF    FLEXPART   CALGRID
    FA5           HYSPLIT    CAMx      SCIPUFF   CALPUFF    FLEXPART   CALGRID
    NMSE          CAMx       SCIPUFF   FLEXPART  HYSPLIT    CALPUFF    CALGRID
    PCC or R      HYSPLIT    CAMx      SCIPUFF   FLEXPART   CALGRID    CALPUFF
    FB            CAMx       CALGRID   FLEXPART  SCIPUFF    HYSPLIT    CALPUFF
    KS            HYSPLIT    CALPUFF   CALGRID   CAMx       FLEXPART   SCIPUFF
    Avg. Ranking  CAMx       HYSPLIT   SCIPUFF   FLEXPART   CALPUFF    CALGRID
    Avg. Score    2.20       2.4       3.4       3.8        4.3        5.0
    RANK Ranking  CAMx       HYSPLIT   CALGRID   SCIPUFF    FLEXPART   CALPUFF
    RANK          1.91       1.80      1.57      1.53       1.45      1.28

European Tracer Experiment (ETEX)

The European Tracer Experiment (ETEX) was conducted in 1994 with two tracer releases from northwest France that were measured at 168 samplers located in 17 European countries. Five LRT dispersion models (CALPUFF, SCICHEM, HYSPLIT, FLEXPART, and CAMx) were evaluated for the first ETEX tracer release period (October 23, 1994). All five LRT dispersion models were exercised using a common 36 km MM5 database for their meteorological inputs. For CALPUFF, the MMIF tool was used to process the MM5 data. Default model options were mostly selected for the LRT dispersion models; an exception is that CALPUFF puff splitting was allowed to occur throughout the day, instead of once per day, which is the default setting. The MM5 simulation was evaluated using surface meteorological variables. The MM5 performance did not always meet the model performance benchmarks and exhibited an underestimation bias in wind speed and temperature.
However, since all five LRT dispersion models used the same MM5 fields, this did not detract from the LRT model performance intercomparison. The ATMES-II model evaluation approach was used, which calculated 12 model performance statistics covering spatial skill, scatter, bias, correlation, and cumulative distribution.

ETEX LRT Dispersion Model Performance Evaluation

Figure ES-6 displays the ranking of the five LRT dispersion models using the RANK model performance statistic, with Table ES-7 summarizing the rankings for the other 11 ATMES-II performance statistics. Depending on the statistical metric, three different models were ranked as the best performing model for a particular statistic, with CAMx ranked first most of the time (64%) and HYSPLIT ranked first second most often (27%). To obtain an overall rank across all eleven statistics, we averaged each model's ranking order, which listed CAMx first, HYSPLIT second, SCIPUFF third, FLEXPART fourth, and CALPUFF fifth. This is the same ranking as produced by the RANK integrated statistic, which combines the four statistics for correlation (PCC), bias (FB), spatial skill (FMS), and cumulative distribution (KS), lending credence to the RANK statistic as a potentially useful performance
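The averaging procedure used for both the CTEX5 and ETEX overall rankings is straightforward: each model's rank position (1 = best) under each statistic is averaged across all statistics. A minimal sketch, applying it to two models using positions read off the CTEX5 results in Table ES-6 (small rounding differences from the reported averages are possible):

```python
# Rank positions (1 = best, 6 = worst) under the 11 CTEX5 statistics,
# in the order FMS, FAR, POD, TS, FOEX, FA2, FA5, NMSE, PCC, FB, KS,
# as read off Table ES-6.
positions = {
    "CAMx":    [2, 3, 2, 3, 2, 2, 2, 1, 2, 1, 4],
    "CALGRID": [6, 5, 6, 6, 4, 6, 6, 6, 5, 2, 3],
}

avg_rank = {model: sum(p) / len(p) for model, p in positions.items()}

print(avg_rank["CALGRID"])  # 5.0, matching the reported average score
print(avg_rank["CAMx"])     # ~2.18, close to the reported 2.20
```

Note how CALGRID averages 5.0 (fifth of six) across the individual statistics yet places third under the composite RANK metric, which is the discrepancy discussed for CTEX5 above.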

