--- the_regionalised_data_base_capreg [2020/02/07 11:30] – [Methodology applied in the regional data consolidation] matsz
+++ the_regionalised_data_base_capreg [2022/11/07 10:23] (current) – external edit 127.0.0.1
@@ Line 61: / Line 61: @@
 === Activity Levels===
+Where data on regional activity levels are missing, a linear trend line is estimated for the regional and Member State time series when the regional database is defined. The gap is then filled with a weighted average of two components: the trend line, with a weight of R², and a weighted average of the available observations around the gap, with a weight of 1-R². This formulation has the following properties. If a time series shows a strong trend, the back-casted and forecasted numbers are dominated by the trend because the weight R² is high. As R² decreases, the estimated values are pulled towards the known values.
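The gap-filling rule above can be sketched in a few lines of Python. This is an illustration only, not the CAPRI implementation: the function names are hypothetical, and the inverse-distance weighting of the observations around the gap is an assumption, since the text does not state which weights are used.

```python
def fit_trend(years, values):
    """Ordinary least-squares line through the known points.

    Returns (slope, intercept, r2). Needs at least two distinct years.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A constant series is reproduced exactly by the (flat) trend line.
    r2 = sxy ** 2 / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r2


def fill_gaps(series):
    """series maps year -> value, with None marking a gap."""
    known = {t: v for t, v in series.items() if v is not None}
    years = sorted(known)
    slope, intercept, r2 = fit_trend(years, [known[t] for t in years])
    filled = dict(series)
    for t, v in series.items():
        if v is not None:
            continue
        trend = intercept + slope * t
        # Assumption: observations closer to the gap get a larger weight.
        weights = {s: 1.0 / abs(s - t) for s in years}
        local = sum(w * known[s] for s, w in weights.items()) / sum(weights.values())
        # Weight R2 on the trend, 1 - R2 on the nearby observations.
        filled[t] = r2 * trend + (1.0 - r2) * local
    return filled
```

For a perfectly linear series R² equals one and the gap is filled from the trend alone; for a noisy series without a trend the filled value moves towards the neighbouring observations.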
+Apart from gap filling, another problem is that annual cropland statistics at the regional level cover only a few crop activities (cereals with wheat, barley, grain maize, rice; potatoes, sugar beet, oil seeds with rape and sunflower; tobacco, fodder maize; grassland, permanent crops with vineyards and olive plantations). The COCO data base, however, covers some 30 different crop activities. In order to break these aggregates down to COCO definitions, the national shares of the aggregate are used.
+As an example, this approach is explained for cereals. Data on the production activities WHEA (wheat = SWHE+DWHE), BARL (barley), MAIZ (grain maize) and PARI (paddy rice) as found in COCO match directly the level of disaggregation in the regional data. Therefore, the mapped regionalized data are directly set equal to the corresponding values in the regional “raw” data. The difference between the sum of these 4 activities and the aggregate data on cereals in the regional raw data must be equal to the sum of the remaining activities in cereals as shown in COCO, namely RYE (rye and meslin), OATS (oats) and OCER (other cereals). As long as no other regional information is available, this difference from the regional raw data is hence broken down applying national shares.
+The approach is shown for OATS in the following equations, where the suffix r stands for regional data:
+\begin{align}
+\begin{split}
+LEVL_{OATS,r} &= (CEREAL_r\\
+&\quad -WHEAT_r-BARLEY_r-MAIZEGR_r-RICE_r)\cdot\\
+&\quad\frac{LEVL_{OATS,COCO}}{(LEVL_{OATS,COCO}+LEVL_{RYE,COCO}+LEVL_{OCER,COCO})}
+\end{split}
+\end{align}
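As a small worked illustration of this breakdown, the following Python sketch distributes the residual cereal area of one region over OATS, RYE and OCER using national shares. The function name and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def split_residual_cereals(regional, national_levels):
    """Break the regional cereal residual down by national COCO shares.

    regional: regional raw data with the cereal aggregate and the four
              directly observed cereals.
    national_levels: national COCO activity levels of the residual cereals.
    """
    observed = ("WHEAT", "BARLEY", "MAIZEGR", "RICE")
    residual_activities = ("OATS", "RYE", "OCER")
    residual = regional["CEREAL"] - sum(regional[k] for k in observed)
    national_total = sum(national_levels[a] for a in residual_activities)
    return {
        a: residual * national_levels[a] / national_total
        for a in residual_activities
    }


# Hypothetical region: 100 units of cereals, 70 of which are reported directly.
region = {"CEREAL": 100, "WHEAT": 40, "BARLEY": 20, "MAIZEGR": 10, "RICE": 0}
# Hypothetical national levels of the residual cereals (shares 50/30/20 %).
national = {"OATS": 50, "RYE": 30, "OCER": 20}
levels = split_residual_cereals(region, national)
```

By construction the three estimated areas add up to the regional residual, so the cereal aggregate in the regional raw data is preserved.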
+Similar equations are used to break down other aggregates and residual areas in the regional data ((If no data at all are found, the share on the utilisable agricultural area is used.)). The Farm Structure Survey (FSS) provides crop areas for a larger number of crops, but this survey is usually conducted only every three years. Data from FSS, when available, are also used to approximate crop areas at the regional level.
+One important advantage of the approach is that the resulting areas are automatically consistent with the national data if the incoming information from REGIO was consistent with the national level. Fortunately, the regional information on herd sizes covers most of the data needed to derive good proxies for all animal activities in the COCO definition. The regional breakdown of herd sizes is often more detailed than COCO, at least for the important sectors. In contrast to crop production, regional estimates for animal activity levels are therefore the result of an aggregation approach.
+In order to generate good starting points for the following data processing steps and to avoid systematic deviations between regional and national levels in the subsequent consistency steps, all regional levels in REGIO are first scaled with the ratio between the (national) results in COCO and the regional results aggregated to the national level (key file is gams\capreg\map_from_regio.gms).
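The scaling step can be illustrated with a minimal Python sketch. It is not taken from map_from_regio.gms; the function name, the region codes and the numbers are hypothetical.

```python
def scale_to_national(regional_levels, national_level):
    """Scale regional levels so that they add up to the national COCO level."""
    regional_total = sum(regional_levels.values())
    if regional_total == 0:
        # Nothing to scale; leave the regional data untouched.
        return dict(regional_levels)
    factor = national_level / regional_total
    return {region: level * factor for region, level in regional_levels.items()}


# Hypothetical example: the regions sum to 100, COCO reports 110 nationally.
scaled = scale_to_national({"R1": 40.0, "R2": 60.0}, 110.0)
```

The regional proportions are unchanged by the scaling; only the overall level is moved to the national value.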
+Besides technological plausibility and a good match with existing regional statistics, the regionalized data for the CAPRI model must also be consistent with the national level. The minimum requirement for this consistency covers activity levels and gross production. The “initialisation” of the regional database has already been designed to meet this requirement as well as possible, but it cannot guarantee it. Consistency for activity levels is therefore imposed with a Highest Posterior Density estimator which ensures (in gams\capreg\cons_levls.gms):
+  - Adding up of activity levels from lower regional level (NUTS II, NUTS I) to higher ones (NUTS I, NUTS 0)
+  - Adding up of crop areas to UAA at regional level.
+For animal herds, the objective function minimizes simple squared relative deviations from the herd sizes. For crops, it combines absolute squared differences of the crop shares in UAA, with a weight of 25%, and relative squared differences, with a weight of 75%. In the crop sector, consistency is also imposed on regional transition matrices for the six UNFCCC land use categories relevant for carbon accounting (forest land, cropland, grassland, settlements, wetlands, residual land), which are initialised from the national transition matrix estimated in the COCO1 module.
+A specific problem is that land use statistics do not report a breakdown of idling land into obligatory set-aside, voluntary set-aside and fallow land((The necessary additional information on non-food production on set-aside, obligatory and voluntary set-aside areas can be found on the DG-AGRI web server.)). Equally, the share of oilseeds grown as energy crops on set-aside needs to be determined. A Highest Posterior Density estimator is used (in gams\capreg\cal_seta.gms) to ‘distribute’ the national information on the different types of idling land to the regional level, with the following restrictions:
+  * Obligatory set-aside areas must be equal to the set-aside obligations derived from areas and set-aside rates for Grandes Cultures (which may differ at regional level according to the share of small producers). For these crops, activity levels are partially endogenous in the estimation in order to allow a split of oilseeds into those grown under the set-aside obligations and those grown as non-food crops on set-aside.
+  * Obligatory and voluntary set-aside cannot exceed certain shares of crops subject to set-aside (at least before Agenda 2000 policy).
+  * Fallow land must equal the sum of obligatory set-aside, voluntary set-aside and other idling land.
+  * Total utilisable area must stay constant.
+In some cases, areas reported as fallow land are smaller than set-aside obligations. In these cases, parts of grassland areas and ‘other crops’ are allowed to be reduced.
+===Production and yields ===
+The procedure for gross output (GROF) is similar to the one for activity levels, as correction factors are applied to line up regional yields with given national production:
+\begin{align}
+\begin{split}
+CORR_{GROF,o} &= GROF_{o,n}/\sum_{j,r}{Levl_{j,r}O_{j,r}}\\
+O_{j,r}^*&=O_{j,r} \cdot CORR_{GROF,o}
+\end{split}
+\end{align}
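The yield correction can be sketched as follows in Python. The function name, keys and numbers are hypothetical. The factor is taken here as national gross output divided by the production implied by regional levels and yields, which is the direction required for the corrected yields to reproduce national production.

```python
def correct_yields(levels, yields, national_output):
    """Scale regional yields of one output so production adds up nationally.

    levels, yields: dicts keyed by (activity, region).
    national_output: national gross output (GROF) of the same product.
    Returns the corrected yields and the correction factor.
    """
    implied_output = sum(levels[key] * yields[key] for key in levels)
    corr = national_output / implied_output
    corrected = {key: yields[key] * corr for key in yields}
    return corrected, corr


# Hypothetical data: two regions growing soft wheat (SWHE).
levels = {("SWHE", "R1"): 10.0, ("SWHE", "R2"): 20.0}
yields = {("SWHE", "R1"): 5.0, ("SWHE", "R2"): 6.0}
# Regional data imply 170 units of output, national statistics report 187.
corrected, corr = correct_yields(levels, yields, 187.0)
```

All regional yields are shifted by the same factor, so relative yield differences between regions are kept.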
+In case of missing statistical information for regional yields, national yields are used. A special rule is used for fodder maize yields, where regional yields are derived from national fodder maize yields, and the relation between regional and national average cereal yields.
+For grassland and fodder from arable land, missing yields are derived from national ones using the relation between regional and national stocking densities of ruminants, in combination with an assumed share of concentrates, expressed as a weighted sum of energy and protein, per ruminant activity in CAPRI. Those shares are then scaled with a uniform factor so that, on average, they exhaust the available energy and protein from concentrates at the national level. Accordingly, higher fodder yields are expected where ruminant stocking densities are high, taking differences in concentrate shares into account. If, for example, the stocking densities stem solely from sheep and goats, the assumed impact on yields is higher. In order to avoid unrealistically low or high yields, they are bounded to a range of 25% to 400% of the regional aggregate.
+The input allocation in any given year should not be linked to realised, but to expected yields. Expected yields are constructed using the following modified Hodrick-Prescott filter:
+\begin{equation}
+\text{min} \quad hp=1000 \sum_{1<t<T-1}({y_{t+1}^*-y_{t-1}^*})^2 + \sum_{t}({y_t^*-y_t})^2
+\end{equation}
+where y covers all output coefficients in the data base and y* denotes the filtered (expected) values. The Hodrick-Prescott filter is applied at both the national and the regional level after any gaps in the time series have been closed.
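Because the objective is quadratic, the filter as printed above can be solved from its first-order conditions, (I + 1000 D'D) y* = y, where each row of D forms the difference between year t+1 and year t-1. The Python sketch below illustrates this; it is not the CAPRI code. The function names are hypothetical, and it assumes that the penalty runs over every year that has both neighbours, which may differ from the exact index range used in CAPRI.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def expected_yields(y, lam=1000.0):
    """Minimise lam * sum (y*[t+1] - y*[t-1])^2 + sum (y*[t] - y[t])^2."""
    n = len(y)
    # Start from the identity matrix (second term of the objective).
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Add lam * D'D: every interior year t links years t-1 and t+1.
    for t in range(1, n - 1):
        i, j = t - 1, t + 1
        a[i][i] += lam
        a[j][j] += lam
        a[i][j] -= lam
        a[j][i] -= lam
    return solve_linear(a, [float(v) for v in y])


# Hypothetical yield series with year-to-year noise.
smoothed = expected_yields([4.8, 5.6, 5.1, 5.9, 5.4, 6.2])
```

A constant series passes through the filter unchanged, and the filtered values always sum to the same total as the observed ones, since the penalty term only involves differences.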
+====Final steps of regional data completion====
+The regional database modules also cover some aspects which are discussed in other parts of this documentation.
+  * For policy data at the regional level (mostly premium related data) see Section [[Policy data]]. These policy related assignments require a good part of the CAPREG module
+  * For the fertiliser and feed allocations and environmental indicators, also important elements of the regional database, see the next Section [[Input Allocation]]
+  * Towards the end of the regional data base consolidation supply side PMP parameters are calibrated as a final test of consistency and sometimes to serve as starting values for the subsequent baseline calibration (in //gams\capreg\pmp.gms//)
+====Build and compare time series of GHG inventories====
+The regionalised data base module CAPREG runs in two steps:
+  * The first step prepares regional time series covering activities, production, land use and the fertiliser allocation
+  * The second step involves more time consuming processing steps which are therefore only executed for the selected base year: feed allocation, computation of GHG results, and the final calibration test
+To assess the reliability of the CAPRI database in terms of GHG results against official UNFCCC notifications, results from the first step (time series) were insufficient, as the GHG accounting also requires information on the feed allocation. This problem was addressed within the scope of the IDEAg (Improving the quantification of GHG emissions and flows of reactive nitrogen) project((The IDEAg project was commissioned by the JRC-IES in Ispra in 2015 and was carried out by the Thünen Institute in cooperation with the JRC-IES (August 2015 – August 2016). A more detailed explanation of the CAPRI task “Build GHG inventories” and its use has been prepared by the Thünen contributors at the time, Sandra Marquardt and Alexander Gocht, see capri\doc\GHG_inventory_module.docx. )), where an option was introduced to allow for a consistent accounting of GHG emissions over time. This option can combine input information from CAPREG time series runs with (short run, nowcasting-style) CAPMOD simulation results. Furthermore, an R-based tool was added to the CAPRI GUI that maps GHG emissions data from CAPRI to the GHG emission balances contained in the National Inventory Reports (NIRs) that are submitted annually by countries in compliance with UNFCCC GHG reporting obligations.