D5 Annex report WP 3: ETIS Database methodology ... - ETIS plus


D5 Annex WP 3: DATABASE METHODOLOGY AND DATABASE USER MANUAL – FREIGHT TRANSPORT DEMAND

The COMEXT database for the year 2000 has been used for the testing phase of ETIS. The first test considered whether additional error-checking routines are needed to keep data errors out of the subsequent O/D matrices. This was done by comparing ‘smoothed’ data with the raw data to measure the impact of erratic values in the database.

A technique for identifying outliers (errors or “erratics”) has been developed within the MDS Transmodal trade forecasting model, taking into account a long time series of trade data. During this process, the COMEXT data is converted into time series vectors for individual trade flows (e.g. French exports of SITC 56 in tonnes to Italy) for quarterly time periods covering approximately fifteen years. The smoothing software takes the four quarterly data points for each year and calculates the mean and the standard deviation for that year. It then compares each year's mean and standard deviation with those of all the other years. Years with unusual levels of variance are investigated by the software and, according to certain thresholds, individual quarterly values may be marked as outliers and replaced with interpolated values. This means that normally erratic series are left untouched, whereas erratic points or sequences within normally stable series are changed. Every year new data is collected and the process is repeated, so what is regarded as an outlier may change over time as the software learns more about the time series.

The effect of the smoothing algorithm can be illustrated by comparing the smoothed data with the original data. In 2000, imports into EU countries amounted to 2.501 billion tonnes according to COMEXT. After smoothing the estimate was 2.391 billion tonnes, a change of only about 4%.
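
The outlier procedure described above can be sketched roughly as follows. This is a minimal illustration, not the MDS Transmodal software: the function names, the use of the median as the "typical" yearly spread, and the two threshold factors are all assumptions, since the report does not state the actual thresholds. For brevity the sketch compares only the yearly standard deviations, whereas the text says the yearly means are compared as well.

```python
from statistics import median, pstdev


def interpolate(values, bad):
    """Replace the values at the indices in `bad` by linear interpolation
    between the nearest unflagged neighbours (or copy the only neighbour)."""
    out = list(values)
    good = [i for i in range(len(values)) if i not in bad]
    for i in bad:
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None and right is None:
            continue  # nothing to interpolate from
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            out[i] = values[left] + weight * (values[right] - values[left])
    return out


def smooth_quarterly(series, year_factor=3.0, point_factor=3.0):
    """Hypothetical sketch: flag quarterly outliers in one trade-flow series.

    `series` holds four quarterly values per year. Both factors are
    assumed thresholds, not values taken from the report.
    """
    if not series or len(series) % 4 != 0:
        raise ValueError("expected four quarterly values per year")
    years = [series[i:i + 4] for i in range(0, len(series), 4)]
    year_sd = [pstdev(quarters) for quarters in years]
    typical_sd = median(year_sd)  # spread of a "normal" year in this series

    flagged = []
    for y, quarters in enumerate(years):
        # Only years whose variance is unusual for this series are investigated,
        # so a series that is erratic in every year is left untouched.
        if year_sd[y] <= year_factor * typical_sd:
            continue
        centre = median(quarters)
        for q, value in enumerate(quarters):
            if abs(value - centre) > point_factor * typical_sd:
                flagged.append(4 * y + q)
    return interpolate(series, flagged), flagged


# Made-up stable series with one erratic quarter (index 9).
stable = [100, 102, 98, 100, 101, 99, 100, 102,
          100, 1079, 101, 99, 98, 100, 102, 100]
smoothed, flagged = smooth_quarterly(stable)
print(flagged, smoothed[9])  # [9] 100.5
```

Because the thresholds are scaled by the series' own typical yearly spread, a flow that fluctuates strongly every year produces no flagged points, which matches the behaviour described in the text.
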
The largest absolute difference in a single 2-digit SITC category is 13 million tonnes, for SITC 33 (petroleum). However, this is only a 2% difference within that category. The largest percentage difference is for SITC 83 (travel goods), at 63%. However, this amounts to an absolute difference of only 1.133 million tonnes. Most of it can be traced to a figure of 1.079 million tonnes for travel goods between the UK and Germany. German export data for the same trade flow shows a total of 0.001 million tonnes, a level that agrees more readily with the smoothed data. A discrepancy of this size (a factor of about 1,000) is atypical, however. Absolute differences are typically about 1 million tonnes per 2-digit SITC category, and relative differences are typically about 6%. It should also be noted that differences between smoothed and unsmoothed series do not necessarily imply that the unsmoothed series contains errors, only values that are unlikely to be repeated.

At these levels, and given the scope of ETIS, the potential impact of measurement errors is not alarming, particularly when the annual version of COMEXT is used. However, the example above does suggest that a high-level comparison between the trade data used within ETIS and data based on a smoothed quarterly time series will reveal a small number of important differences that can be corrected manually. The ability to compare counterflows in intra-EU data is also useful in this context.

Document2 27 May 2004

Conversion of commodity code

The COMEXT database is published using the international 8-digit Combined Nomenclature (CN8) system. This has the advantage that the flows can be readily and unambiguously converted into other (more aggregated) systems such as the Standard International Trade Classification (SITC) and the Standard Goods Classification for Transport Statistics/Revised (NST/R). Conversion tables are published on EUROSTAT’s classification server. See http://europa.eu.int/comm/eurostat/ramon/.

Selection between import and export registration

For extra-EU trade, the COMEXT data contains a single registration for each flow: the import or export registration of the EU country. For intra-EU trade, the COMEXT data contains two registrations for the same flow, one registered as an export of the origin country and one registered as an import of the destination country. Ideally, the transport volumes in both registrations are the same. Unfortunately, in many cases they are not. The example in the table below illustrates this.

Table 6.2 Example of differences in registration for the same flow

| Origin country | Destination country | Commodity | Registration | Transport volume in tonnes |
|---|---|---|---|---|
| France | The Netherlands | Cereals | Import registration the Netherlands | 3611453 |
| France | The Netherlands | Cereals | Export registration France | 4252118 |

In this example the export registration is about 640,000 tonnes higher than the import registration. Only one value will be included in the database for the trade flow of cereals from France to the Netherlands, so the two registrations have to be combined into a single value. The trade flows are registered according to the INTRASTAT system.
The rules of the INTRASTAT system give no indication that either the import or the export registration is more reliable (in the past the import registration was considered to be more reliable). Since it cannot be decided which registration is more reliable, both are treated as equally reliable, and the average of the import and the export registration is taken as the transport volume on this relation (in the example above the transport volume becomes 3931786 tonnes).

In order to keep information about the difference between the import and the export registration, two variables are added to the data. One variable indicates whether the difference between the import and the export registration is more than 500,000 tonnes (in the example above this is the case). The other variable gives the relative difference between the average value and the import and export registrations (in the example above this is 8%, indicating that the transport volume could actually be 8% lower or 8% higher). These indicators will be used in the second method for the identification of confusion between trade and transport.
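
The averaging rule and the two difference indicators can be reproduced for the cereals example with a short sketch. The class, function, and field names below are hypothetical and are not taken from the ETIS database; only the 500,000 tonne threshold and the input volumes come from the text.

```python
from dataclasses import dataclass


@dataclass
class ReconciledFlow:
    """Hypothetical record for one intra-EU flow after reconciliation."""
    volume_tonnes: float         # average of import and export registration
    large_abs_difference: bool   # True if the registrations differ by more than the threshold
    relative_difference: float   # distance of either registration from the average, as a fraction


def reconcile(import_tonnes, export_tonnes, abs_threshold=500_000):
    """Combine the two mirror registrations of one flow into a single value."""
    average = (import_tonnes + export_tonnes) / 2
    abs_diff = abs(export_tonnes - import_tonnes)
    # Both registrations lie abs_diff / 2 away from the average.
    relative = (abs_diff / 2) / average if average else 0.0
    return ReconciledFlow(average, abs_diff > abs_threshold, relative)


# Cereals from France to the Netherlands (Table 6.2).
flow = reconcile(import_tonnes=3611453, export_tonnes=4252118)
print(round(flow.volume_tonnes), flow.large_abs_difference,
      f"{flow.relative_difference:.0%}")  # 3931786 True 8%
```

The relative indicator is symmetric by construction: the import registration lies the same distance below the average as the export registration lies above it, which is why a single percentage describes both.
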