D5 Annex report WP 3: ETIS Database methodology ... - ETIS plus
D5 Annex WP 3: DATABASE METHODOLOGY AND DATABASE USER MANUAL –<br />
FREIGHT TRANSPORT DEMAND<br />
The COMEXT database for the Year 2000 has been used for the testing phase of ETIS. The<br />
first test has considered whether it is necessary to apply additional error checking routines to<br />
avoid the inclusion of data errors in the subsequent O/D matrices. This has been achieved by<br />
comparing ‘smoothed’ data to the raw data to measure the impact of erratic values in the<br />
database.<br />
A technique for identifying outliers (errors or “erratics”) has been developed within the MDS<br />
Transmodal trade forecasting model, taking into account a long time series of trade data.<br />
During this process, the COMEXT data is converted into time series vectors for individual trade<br />
flows (e.g. French exports of SITC 56 in tonnes to Italy) for quarterly time periods covering<br />
approximately fifteen years. The smoothing software samples four data points for each year and<br />
calculates the mean and the standard deviation for that year. It then compares each year's mean<br />
and standard deviation with all the others. If any year shows an unusual level of variance, the<br />
software investigates it and, subject to certain thresholds, may mark individual quarterly<br />
values as outliers and replace them with interpolated values. This means that normally erratic<br />
series are left untouched, but erratic points or sequences within normally stable series are<br />
changed. New data is collected every year and the process is repeated, so what is regarded as an<br />
outlier may change over time as the software learns more about the time series.<br />
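The procedure described above can be sketched roughly as follows. This is a minimal illustration and not the MDS Transmodal implementation: the function name `smooth_quarterly`, the thresholds `year_ratio` and `point_ratio`, and the use of the median as the "typical" level are all assumptions, since the report does not state the actual thresholds. For brevity the sketch compares only the yearly standard deviations, whereas the text says the yearly means are compared as well.

```python
from statistics import median, pstdev


def smooth_quarterly(series, year_ratio=3.0, point_ratio=2.0):
    """Return (smoothed_series, flagged_indices) for a quarterly series.

    year_ratio and point_ratio are hypothetical thresholds chosen for
    illustration; they are not values taken from the report.
    """
    if len(series) % 4 != 0:
        raise ValueError("expected a whole number of years (4 quarters each)")

    years = [series[i:i + 4] for i in range(0, len(series), 4)]
    # Spread of the four quarterly values within each year.
    year_sd = [pstdev(values) for values in years]
    # Typical within-year spread for this series. A normally erratic series
    # has a high typical spread, so none of its years stands out.
    typical_sd = median(year_sd)

    flagged = []
    for y, (values, sd) in enumerate(zip(years, year_sd)):
        if sd <= year_ratio * typical_sd:
            continue  # variance in this year is not unusual
        centre = median(values)
        for q, value in enumerate(values):
            if abs(value - centre) > point_ratio * typical_sd:
                flagged.append(4 * y + q)

    # Replace flagged quarters by linear interpolation between the nearest
    # unflagged neighbours (or copy the only available neighbour).
    smoothed = [float(v) for v in series]
    flagged_set = set(flagged)
    good = [i for i in range(len(series)) if i not in flagged_set]
    for i in flagged:
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is not None and right is not None:
            weight = (i - left) / (right - left)
            smoothed[i] = series[left] + weight * (series[right] - series[left])
        elif left is not None:
            smoothed[i] = float(series[left])
        elif right is not None:
            smoothed[i] = float(series[right])
    return smoothed, flagged


# A stable series (made-up numbers) with one erratic quarter.
stable = [100, 102, 98, 101] * 5
stable[9] = 1000
smoothed, flagged = smooth_quarterly(stable)
print(flagged)      # [9]
print(smoothed[9])  # 99.0
```

With these illustrative thresholds, a series that is erratic in every year is returned unchanged, because no single year has a spread that is unusual relative to the others.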
The use of the smoothing algorithm can be illustrated by comparing the smoothed data to the<br />
original data.<br />
In 2000, imports into EU countries amounted to 2.501 billion tonnes according to COMEXT.<br />
After smoothing, the estimate was 2.391 billion tonnes, a change of only about 4%. The largest<br />
absolute difference in a single 2-digit SITC category is 13 million tonnes, for SITC 33, petroleum.<br />
However, this is only a 2% difference within that category. The largest percentage difference is for<br />
SITC 83, travel goods, at 63%. However, this amounts to an absolute difference of only<br />
1.133 million tonnes. Most of the difference can be traced to a figure of<br />
1.079 million tonnes for travel goods between the UK and Germany.<br />
Looking at the same trade flow using German export data, a total of 0.001 million tonnes can be<br />
found, a level that agrees more readily with the smoothed data. A discrepancy of this size (a factor of about 1,000) is<br />
atypical, however. Absolute differences are typically about 1 million tonnes per 2-digit SITC<br />
category, and relative differences are typically about 6%. It should also be noted that<br />
differences between smoothed and unsmoothed series do not necessarily imply that the unsmoothed<br />
series contains errors, only that it contains values that are unlikely to be repeated.<br />
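The comparison figures quoted above can be checked with a few lines of arithmetic. The helper `compare` is a hypothetical name, and expressing the relative difference as a share of the raw (unsmoothed) value is an assumption; it is the reading that matches the roughly 4% quoted in the text.

```python
def compare(raw, smoothed):
    """Return (absolute difference, difference as a percentage of raw)."""
    diff = raw - smoothed
    return diff, 100.0 * diff / raw


# EU imports in 2000, in billion tonnes (figures from the text).
total_diff, total_pct = compare(2.501, 2.391)
print(round(total_diff, 3), round(total_pct, 1))  # 0.11 4.4

# UK-Germany travel goods: raw COMEXT figure against the German export
# figure, both in million tonnes (figures from the text).
ratio = 1.079 / 0.001
print(round(ratio))  # 1079
```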
At these levels, and given the scope of ETIS, the potential impacts of measurement errors are<br />
not alarming, particularly when the annual version of COMEXT is used. However, the example<br />
described above does suggest that a high-level comparison between the trade data used within<br />
ETIS and the data based upon a smoothed quarterly time series will reveal a small number of<br />
27 May 2004