performing model the most often, scoring best in 4 of the 11 statistics (36% of the time). SCIPUFF, FLEXPART, and CAMx each scored best on 2 of the 11 statistics (18%), with CALPUFF scoring best for just one statistical metric.

In testing the efficacy of the RANK statistic, the overall rank across all eleven statistics was used to compute an average model ranking for comparison with the RANK-based rankings. The average rank across all 11 performance statistics and the RANK rankings are as follows:

Ranking   Average of 11 Statistics   RANK
1.        CAMx                       CAMx
2.        HYSPLIT                    HYSPLIT
3.        SCIPUFF                    CALGRID
4.        FLEXPART                   SCIPUFF
5.        CALPUFF                    FLEXPART
6.        CALGRID                    CALPUFF

The results from CAPTEX Release 5 present an interesting case study on the use of the RANK metric to characterize overall model performance. As noted in Table C-6 and given above, the relative ranking of models using the average rankings across the 11 statistical metrics is considerably different from the RANK scores after the two highest ranked models. Both approaches rank CAMx as the best and HYSPLIT as the next best performing model for CTEX5, with rankings that are fairly close to each other. However, after that the two ranking techniques come to different conclusions regarding the ability of the models to simulate the observed tracer concentrations for the CTEX5 field experiment.

The most noticeable feature of the RANK metric for ranking models in CTEX5 is the third highest ranking model using RANK: CALGRID (1.57). CALGRID ranks as the worst or second worst performing model in 9 of the 11 performance statistics (82% of the time) and has an average ranking of 5.0, which means that on average it is the fifth best performing model out of six. In examining the contribution to the RANK metric for CALGRID, there is not a consistent contribution from all four broad categories to the composite score (Figure C-40). Recall from equation 2-12 in Section 2.4.3.2 that the RANK score is defined by the contribution of four of the 11 statistics, representing measures of correlation/scatter (R²), bias (FB), spatial skill (FMS), and cumulative distribution (KS):

$$\text{RANK} = R^{2} + \left(1 - \left|\frac{FB}{2}\right|\right) + \frac{FMS}{100} + \left(1 - \frac{KS}{100}\right)$$
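For reference, a direct transcription of this composite into code; a minimal sketch, with example inputs that are purely illustrative rather than CTEX5 results.

```python
def rank_score(r2: float, fb: float, fms: float, ks: float) -> float:
    """Composite RANK statistic of equation 2-12.

    r2  -- coefficient of determination (0..1)
    fb  -- fractional bias (-2..2)
    fms -- figure of merit in space, in percent (0..100)
    ks  -- Kolmogorov-Smirnov parameter, in percent (0..100)

    Each of the four terms is scaled to [0, 1], so a perfect
    model scores 4.0 and the worst possible score is 0.0.
    """
    return r2 + (1.0 - abs(fb / 2.0)) + fms / 100.0 + (1.0 - ks / 100.0)

# A model with weak correlation can still accumulate a sizable
# composite score from the bias and KS terms alone.
print(rank_score(r2=0.05, fb=-0.3, fms=20.0, ks=35.0))  # 1.75
```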

The majority of CALGRID's 1.57 RANK score comes from the fractional bias and Kolmogorov-Smirnov parameter values. Recall from Figures C-36 and C-39 that the FOEX and FB metrics indicate that CALGRID consistently underestimates. The FB component of the composite score for CALGRID is one of the highest among the six models in this study, yet the underlying statistics indicate both marginal spatial skill and a degree of under-prediction (likely due to the spatial skill of the model).

The current form of the RANK score uses the absolute value of the fractional bias. This approach weights underestimation equally with overestimation. However, in a regulatory context, EPA is most concerned that models not be biased towards underestimation. When looking at all of the performance statistics, CALGRID is clearly one of the worst performing LRT models.
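The symmetry noted above is easy to see numerically. The sketch below shows the |FB|-based bias term scoring equal-magnitude under- and over-estimation identically; the asymmetric variant that follows is purely hypothetical (the under_weight parameter is an assumption for illustration, not part of the published RANK definition), and it assumes the sign convention that negative FB indicates underestimation.

```python
def bias_term(fb: float) -> float:
    # Bias term as used in equation 2-12: symmetric in the sign of FB.
    return 1.0 - abs(fb / 2.0)

def bias_term_asymmetric(fb: float, under_weight: float = 1.5) -> float:
    # Hypothetical variant penalizing underestimation (negative FB)
    # more heavily; under_weight is an assumed, illustrative parameter.
    w = under_weight if fb < 0 else 1.0
    return max(0.0, 1.0 - w * abs(fb / 2.0))

print(bias_term(-1.0), bias_term(1.0))                        # 0.5 0.5
print(bias_term_asymmetric(-1.0), bias_term_asymmetric(1.0))  # 0.25 0.5
```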

