CAPRI Online Manual (update)

Forecast tool CAPTRD

The tool providing projections for the European regions (the EU as of 2019, Turkey, Norway, Albania, North Macedonia, Montenegro, Bosnia and Herzegovina, Kosovo and Serbia) in CAPRI is called CAPTRD. It operates in several steps:

Step 1 involves independent trends on all series, providing initial forecasts and statistics on the goodness of fit or indirectly on the variability of the series.
Step 2 imposes constraints like identities (e.g. production = area * yield) or technical bounds (like non-negativity or maximum yields) and introduces specific expert information given on the MS level.
Step 3 includes expert information on aggregate EU markets, typically coming from the AgLink model or GLOBIOM. This external data is not available for all individual countries in CAPRI, but for larger regions. Therefore, several countries must be simultaneously estimated in order to ensure proper use of this important prior information.
Step 4 disaggregates the MS results, depending on the aggregation level chosen, to the regional level (NUTS2) or even to the level of farm types.

The trends estimated in CAPTRD are subject to consistency restrictions in steps 2 and 3. Hence they are not independent forecasts for each time series, and the resulting estimator is a system estimator under constraints (e.g. closed area and market balances). Nonetheless, the trends are mechanical in the sense that they respect technological relationships but do not include any information about behavioural functions or policy developments¹⁾.

CAPTRD results are in turn only the first of several steps before a full CAPRI baseline is ready to use. The rest of this chapter focuses on CAPTRD.

Step 1: Independent, weighted nonlinear least squares

Before entering into the details it should be stated that ultimately almost any projection may be reduced to a particular type of trend projections, at least if the exogenous inputs, such as population, prices or household expenditure are also projected (usually by other research teams) as functions of time. In this sense trend projection may provide a firm ground on which to build projections and this is exactly their purpose in our work.

The first ingredient in the estimator is the trend curve itself which is defined as:

\begin{equation} X_{r,i,t}^{j,Trend}=a_{r,i,j}+b_{r,i,j}t^{c_{r,i,j}} \end{equation}

where the parameters a, b and c are to be estimated so that the squared deviations between given and estimated data are minimised. The X stands for the data and represents a five-dimensional array, spanning products i and items j (such as feed use or production), regions r, points in time t and different data status such as ‘Trend’ or ‘Observed’. The trend curve itself is a kind of Box-Cox transformation, as parameter c is used as the exponent of the trend. For c equal to unity, the resulting curve is a straight line; for c between 0 and 1, the curve is concave from below, i.e. increasing but with decreasing rates; whereas for c > 1, the curve is convex from below, i.e. increasing with increasing rates. In order to prevent differences between time points from increasing sharply over the projection period, the parameters c are restricted to be below 1.2.

This form has the advantage of ensuring monotonic developments whereas quadratic trends often gave increasing yields for the first part of the projection period and afterwards a decrease. Another conclusion from the early explorations was that it is useful to define the trend variable \(t_{1984} = 0.1, t_{1985} = 0.2, t_{1986} = 0.3 \) etc., giving a potentially strong nonlinearity in the early years of the database (where the frequency of high changes, possibly due to data weaknesses was high) and a rather low nonlinearity in the projection period.
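As a minimal sketch of the trend curve and the scaled trend variable described above, the following Python fragment evaluates the curve for three exponents. The function names and the parameter values are illustrative assumptions and are not taken from the CAPTRD code.

```python
def trend_index(year, first_year=1984, step=0.1):
    """Scaled trend variable: 0.1 for 1984, 0.2 for 1985, and so on."""
    return (year - first_year + 1) * step


def trend_value(a, b, c, t):
    """Trend curve a + b * t**c evaluated at trend variable t."""
    return a + b * t ** c


# Hypothetical parameters for a single series (illustrative only).
a, b = 4.0, 1.5
years = [1990, 2000, 2010, 2020, 2030]
paths = {
    c: [trend_value(a, b, c, trend_index(y)) for y in years]
    for c in (0.5, 1.0, 1.1)  # concave, linear and convex cases
}
for c, path in paths.items():
    print(c, [round(x, 2) for x in path])
```

With a positive b all three paths increase monotonically; the increments shrink for c = 0.5, stay constant for c = 1 and grow for c = 1.1.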

The ex-post period usually covers the period from 1985 towards the end of the underlying CAPREG output (file res_time_series.gdx) which is typically 4-6 years before the current year. The national level COCO data may have somewhat longer series than the regional CAPREG data. To account for this different availability of ex post data the following sets should be distinguished:

Expost: defined from the length of the series in CAPREG output res_time_series.gdx
Exante: covering any sequence of intermediate result years up to the user specified final year²⁾.
ExanteD: Ex ante years with additional COCO1 data (assigned in ‘captrd\load_coco1_data.gms’)
ExpostT: Union of Expost and ExanteD = years with data for trend estimation

The estimator minimises the weighted sum of squares of errors using the trend variable as weights:

\begin{equation} wSSE_{r,i,j}=\sum_{expost} \left(X_{r,i,expost}^{j,"Data"}-a_{r,i,j}-b_{r,i,j}t_{expost}^{c_{r,i,j}}\right)^2t_{expost} \end{equation}

The weighting with the trend was introduced in the exploration phase based on the following considerations and experience. First of all, it reflects the fact that statistics from the early years (mid-1980s) are often less reliable than those from later years. Secondly, even if they are reliable, older data will tend to contribute less useful information than more recent ones due to ongoing structural change. For this reason we have discarded any years before 1992 for the New MS, for example, but the data from the mid-1990s may nonetheless represent a situation of transition that should count less than the recent past. In technical terms the step 1 estimates are found by a grid search over selected values of parameter c with analytical OLS estimates for parameters a and b (see ‘captrd\estimate_trends.gms’). For a given value of c, these estimates (and the resulting wSSE) have been found identical to those of the econometric package Eviews.
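The grid search can be sketched as follows. This is a simplified illustration and not the code in ‘captrd\estimate_trends.gms’: the grid of c values, the function name and the synthetic data are assumptions. For each candidate c, the parameters a and b follow from the closed-form weighted least squares solution with the trend variable as weights.

```python
def fit_trend(years, data, c_grid=None, first_year=1984):
    """Grid search over c with closed-form weighted least squares for a and b."""
    if c_grid is None:
        c_grid = [k / 10 for k in range(1, 13)]  # hypothetical grid 0.1 ... 1.2
    t = [(y - first_year + 1) * 0.1 for y in years]  # trend variable, also the weight
    sw = sum(t)
    ybar = sum(w * y for w, y in zip(t, data)) / sw
    best = None
    for c in c_grid:
        z = [ti ** c for ti in t]                      # transformed regressor t**c
        zbar = sum(w * zi for w, zi in zip(t, z)) / sw
        szz = sum(w * (zi - zbar) ** 2 for w, zi in zip(t, z))
        szy = sum(w * (zi - zbar) * (yi - ybar) for w, zi, yi in zip(t, z, data))
        b = szy / szz
        a = ybar - b * zbar
        wsse = sum(w * (yi - a - b * zi) ** 2 for w, zi, yi in zip(t, z, data))
        if best is None or wsse < best["wsse"]:
            best = {"a": a, "b": b, "c": c, "wsse": wsse}
    return best


# Synthetic series generated from a = 2, b = 3, c = 0.5 (illustrative only).
years = list(range(1985, 2016))
data = [2.0 + 3.0 * ((y - 1984 + 1) * 0.1) ** 0.5 for y in years]
fit = fit_trend(years, data)
print(fit["c"], round(fit["a"], 3), round(fit["b"], 3))
```

Because the synthetic data lie exactly on a curve whose exponent is part of the grid, the search recovers the generating parameters up to rounding.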

Step 2.1: Consistency constraints in the trend projection tool

Step 2 adds the consistency conditions and thus transforms the naïve independent trends into a system estimator. In almost all cases, the unrestricted trend estimates from the first step would violate one or several of the consistency conditions. We want to find estimates that both fit into the consistency constraints and exploit the information contained in the ex-post development in a technically feasible way. Consider the identity that defines production as hectares/herd sizes times yield. Running independent trend estimates for barley area, barley yield and barley production will almost certainly produce estimates where production is not equal to yield times area. One solution would be to drop one of the three estimates, say yield, and replace it instead by the division of forecasted production by forecasted area. However, by doing so, we throw away the information incorporated in the development of barley yield over time. Adding relations between time series hence helps us to exploit more information than is contained in single series.

When consolidating simultaneously the different Step1 estimates, we will minimise the squared deviations from values computed in Step 1, in the following called “supports”, while complying with all constraints. A risk is that shaky trends may give a forecast line with an end point far away from ex-post observations. Hence we need safeguards pulling our estimates to a ‘reasonable’ value in such cases.

The confidence interval from the Step 1 trend estimation will not help, as it will be centred around the last projection value and as it will simply be quite large in case of a bad R². However, we may use the idea underlying the usual test statistics for the parameters related to the trend (a,b,c). These statistics test whether (a,b,c) are significantly different from zero. It can be shown that these tests are directly related to R² of the regression. If the null hypothesis were true, i.e. if the estimated parameters had a high probability of being zero, we would not use the trend line, but the mean of the series instead.

This reasoning is the basis for the supports derived from the Step 1 estimates in CAPTRD (‘captrd\define_stats_and_supports.gms’), after some modifications. First of all, we used a three-year average based on the last known values as the fallback position and not the mean of the series. Secondly, in typical econometric analysis, test statistics would only be reported for the final estimation layout, since some variables would have been dropped from the regression beforehand if they fell short of certain probability thresholds. For our applications, we opted for a continuous rule as the choice of threshold values is arbitrary. The smaller the weighted R², the more strongly the estimates are drawn towards our H0, namely that the value is equal to the recent three-year average:

\begin{equation} X_{r,i,exante}^{j,"Support"}=wR_{r,i,j}^2 \left( a_{r,i,j}+b_{r,i,j}t_{exante}^{c_{r,i,j}}\right)+\left(1-wR_{r,i,j}^2\right)X_{r,i,bas}^{j,"Data"} \end{equation}

where \begin{equation} wR_{r,i,j}^2=1-wSSE_{r,i,j}/wSST_{r,i,j} \end{equation}

and the weighted total sum of squares is defined analogously to wSSE above: \begin{equation} wSST_{r,i,j}=\sum_{expost}\left(X_{r,i,expost}^{j,"Data"}-X_{r,i,wAve}^{j,"Data"}\right)^2t_{expost} \end{equation}

with a trend weighted average \begin{equation} X_{r,i,wAve}^{j,"Data"}=\sum_{expost}X_{r,i,expost}^{j,"Data"}\cdot t_{expost}/\sum_{expost}t_{expost} \end{equation}

How is this rule motivated? If R² for a certain time series is 100%, in other words: for a perfect fit, the restricted trend estimate is fully drawn towards the unrestricted Step 1 estimate. If R² is zero, the trend curve does not explain any of the weighted variance of the series. Consequently, the support is equal to the ‘base data’. The ‘base data’ represent a three-year average around the last three known years. For all cases in between, the supports are the weighted average of the unrestricted trend estimate weighted with R² and the three-year average weighted with (1-R²). Generally, all trend estimates are restricted to the non-negative domain.

The above definition of supports works for series with expost data from CAPREG only as well as for those series with an extended set of observations (expostT, see above). The only difference is whether the three year average denoted above simply with “bas” is calculated using the three last years from set expost or from set expostT (BASM or BAST in ‘captrd\define_stats_and_supports.gms’).
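The construction of the supports can be illustrated with a small sketch. The function names, the series and the stand-in fitted values are assumptions for illustration and do not reproduce ‘captrd\define_stats_and_supports.gms’.

```python
def base_average(data):
    """Average of the last three known observations ('bas')."""
    return sum(data[-3:]) / 3.0


def weighted_r2(data, fitted, t):
    """Weighted R2 = 1 - wSSE / wSST with the trend variable t as weights."""
    wave = sum(w * x for w, x in zip(t, data)) / sum(t)  # trend weighted average
    wsse = sum(w * (x - f) ** 2 for w, x, f in zip(t, data, fitted))
    wsst = sum(w * (x - wave) ** 2 for w, x in zip(t, data))
    return 1.0 - wsse / wsst


def support(trend_forecast, base, wr2):
    """Blend the Step 1 forecast with the base value, using wR2 as the weight."""
    return wr2 * trend_forecast + (1.0 - wr2) * base


# Hypothetical series (illustrative numbers only).
t = [0.1 * k for k in range(1, 7)]
data = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
fitted = [10.2, 10.9, 11.6, 12.3, 13.0, 13.7]  # stand-in for Step 1 fitted values
wr2 = weighted_r2(data, fitted, t)
print(round(wr2, 3), round(support(18.0, base_average(data), wr2), 3))
```

A perfect fit (wR² = 1) returns the trend forecast unchanged, while wR² = 0 returns the three-year average.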

Our objective function for Step 2 will be the sum of squared deviations from the supports defined above, weighted with the variance of the error terms from the first step: \begin{equation} Penalty=\sum_{r,i,j,exante}\left(\frac {X_{r,i,exante}^{j,"Trend"}-X_{r,i,exante}^{j,"Support"}}{\sqrt{X_{r,i,verErr}^{j,"Step1"}}}\right)^2 \end{equation}

where the weighted variance of errors is \begin{equation} X_{r,i,verErr}^{j,"Step1"}=wSSE_{r,i,j}/\left(\sum_{expost}t_{expost} -1\right) \end{equation}

The variance of the error term is used to normalise the squared deviations from all series, which serves two purposes. First, the weighted error variance increases with the mean of the explained series. Normalising with it will hence ensure that the penalty targets relative rather than absolute deviations. Otherwise the solver would only tackle the deviations from “large” crops, say soft wheat, and more or less ignore the deviations of oats, for example. Secondly, the deviations from the support are penalised more strongly where the Step 1 trend had a high explanatory power and therefore a low variance of the error term.
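The effect of the normalisation can be seen in a minimal sketch. The function names and numbers are illustrative assumptions; in CAPTRD the penalty is minimised by a solver subject to the constraints described below.

```python
def error_variance(wsse, t_expost):
    """Weighted error variance: wSSE divided by (sum of trend weights - 1)."""
    return wsse / (sum(t_expost) - 1.0)


def penalty(trend, supports, var_err):
    """Sum of squared deviations from the supports, each divided by its error variance."""
    return sum((x - s) ** 2 / v for x, s, v in zip(trend, supports, var_err))


# Two hypothetical series, each 10% away from its support: a "large" crop
# with error variance 100 and a "small" crop with error variance 1.
large = penalty([110.0], [100.0], [100.0])
small = penalty([11.0], [10.0], [1.0])
print(large, small)
```

Both series contribute the same penalty, although the absolute deviation of the large crop is ten times that of the small one.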

The constraints in the trend projection enforce mutual compatibility between baseline forecasts for individual series in the light of relations between these series, either based on definitions such as ‘production equals yield times area’ or on technical relations between series such as the balance between energy deliveries from feed use and energy requirements from the animal herds. The set of constraints is deemed to be exhaustive in the sense that any further restriction would either not add information or require data beyond those available. The underlying data set takes into account all agricultural activities and products according to the definition of the Economic Accounts for Agriculture.

The constraints discussed in the following (from ‘captrd\equations.gms’) can be seen as a minimum set of consistency conditions necessary for a projection of agricultural variables. The full projection tool features further constraints especially relating to price feedbacks on supply and demand.

Constraints relating to market balances and yields

Closed market balances (CAPTRD eq. MBAL_) define the first set of constraints and state that the sum of imports (IMPT) and production (GROF) must be equal to the sum of feed (FEDM) and seed (SEDM) use, human consumption (HCOM), processing (INDM,PRCM,BIOF), losses (LOSM) and exports (EXPT):

\begin{align} \begin{split} X_{r,i,t}^{IMPT,Trend}+X_{r,i,t}^{GROF,Trend}&=X_{r,i,t}^{FEDM,Trend}+X_{r,i,t}^{SEDM,Trend}+X_{r,i,t}^{PRCM,Trend}\\ &+X_{r,i,t}^{INDM,Trend}+X_{r,i,t}^{BIOF,Trend}+X_{r,i,t}^{LOSM,Trend} \\ &+X_{r,i,t}^{HCOM,Trend}+X_{r,i,t}^{EXPT,Trend} \end{split} \end{align}

where r denotes the Member States of the EU, i the products and t the forecasting years. In the case of secondary products (dairy products, oils and oilcakes, for example) production is given on item MAPR. Domestic use DOMM (sum of the right hand side without exports) and net trade NTRD are defined in separate equations (DOMM_, NTRD_) not reproduced here. They do not act as constraints but permit a link to expert projections for EU markets in Step 3.
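A closed market balance and the derived domestic use can be checked as in the sketch below. The position codes follow the text, whereas the function names and quantities are hypothetical.

```python
SOURCES = ("IMPT", "GROF")
USES = ("FEDM", "SEDM", "PRCM", "INDM", "BIOF", "LOSM", "HCOM", "EXPT")


def balance_residual(pos):
    """Imports plus production minus all uses; zero for a closed balance."""
    return sum(pos[k] for k in SOURCES) - sum(pos[k] for k in USES)


def domestic_use(pos):
    """DOMM: all positions on the use side except exports."""
    return sum(pos[k] for k in USES if k != "EXPT")


# Hypothetical balance for one product, region and year (illustrative quantities).
barley = {"IMPT": 20.0, "GROF": 100.0, "FEDM": 40.0, "SEDM": 5.0, "PRCM": 10.0,
          "INDM": 5.0, "BIOF": 5.0, "LOSM": 5.0, "HCOM": 30.0, "EXPT": 20.0}
print(balance_residual(barley), domestic_use(barley))
```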

Secondly, production of agricultural raw products (GROF) is equal to yield times area/herd size (LEVL) where acts are all production activities (eq. GROF_):

\begin{equation} X_{r,i,t}^{GROF,Trend}= \sum_{acts}X_{r,i,t}^{acts,Trend}X_{r,LEVL,t}^{acts,Trend} \end{equation}
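As a small illustration of this identity, the sketch below sums yield times activity level over the activities producing one product. The yields and levels are hypothetical values, and the function name is an assumption.

```python
def production(yields, levels):
    """Sum over activities of output per unit (yield) times activity level."""
    return sum(yields[act] * levels[act] for act in yields)


# Hypothetical example: calves produced by dairy cows and suckler cows.
yields = {"DCOW": 0.9, "SCOW": 0.85}      # calves per cow, illustrative values
levels = {"DCOW": 1000.0, "SCOW": 400.0}  # activity levels in heads
print(round(production(yields, levels), 1))
```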

The market balance positions for certain products enter adding up equations for groups of products (cereals, oilseeds, industrial crops, vegetables, fresh fruits, fodder production, meat, eq. MBALGRP_). As an example, total cereal production is equal to the sum over the produced quantities of the individual cereals.

\begin{equation} X_{r,pro\_grp,t}^{MrkBal,Trend}= \sum_{i\in pro\_grp}X_{r,i,t}^{MrkBal,Trend} \end{equation}

Constraints relating to land use and cropping area

Adding up over the individual crop areas defines the total utilizable agricultural area (UAAR,LEVL, AREAB_):

\begin{equation} X_{r,LEVL,t}^{UAAR,Trend}= \sum_{crops}X_{r,LEVL,t}^{crops,Trend} \end{equation}

Adding up over the individual crop areas defines (in GRPLEVL_) the level of groups (set GrpC = {cereals, oilseeds, industrial crops, vegetables, fresh fruits, fodder production on arable land}):

\begin{equation} X_{r,LEVL,t}^{GrpC,Trend}= \sum_{crops \in GrpC}X_{r,LEVL,t}^{crops,Trend} \end{equation}

Adding up over mutually exclusive land use (in LANDUSEB_, for set LandUseARTO, see Annex: Tables 7-9) defines the total area (ARTO,LEVL):

\begin{equation} X_{r,LEVL,t}^{ARTO,Trend}= \sum_{LandUseARTO}X_{r,LEVL,t}^{LandUseARTO,Trend} \end{equation}

Constraints relating to agricultural production

Another Equation (OYANI_) links the different animal activities over young animal markets:

\begin{equation} X_{r,oyani,t}^{GROF,Trend}-X_{r,oyani,t}^{STCM,Trend}= \sum_{iyani\leftrightarrow oyani}X_{r,iyani,t}^{GROF,Trend} \end{equation}

Where oyani stands for the different young animals defined as outputs (young cows, young bulls, young heifers, male/female calves, piglets, lambs and chicken). These outputs are produced by raising processes, and apart from stock changes STCM (defined in Equation SOYANI_, not reproduced here), they are completely used as inputs in the other animal processes (fattening, raising or milk producing).

For those activities that have been split up in the database into a high and a low yielding variant (DCOW, BULF, HEIF, GRAS) with 50% for each, this split is maintained (SPLITFIX_):

\begin{equation} X_{r,LEVL,t}^{splitactlo,Trend}= X_{r,LEVL,t}^{splitacthi,Trend} \end{equation}

The purpose of this split has been to permit an endogenous variation of yields also for animal activities, but so far no statistical information on the distribution of intensities has been available. Hence “intensive” has been defined to represent the upper 50% of the total distribution and it makes sense to maintain this split also in the baseline.

Animal herds (HERD) are related to animal activity levels through the process length in days (DAYS) via HERD_.

\begin{equation} X_{r,HERD,t}^{maact,Trend}= X_{r,LEVL,t}^{maact,Trend}\cdot X_{r,DAYS,t}^{maact,Trend} /365 \end{equation}
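The relation between activity level and herd size can be sketched as follows; the function name and the numbers are illustrative assumptions.

```python
def herd_size(level, days):
    """Average herd (stock) implied by an activity level and a process length in days."""
    return level * days / 365.0


# A process lasting a full year: activity level and herd size coincide.
print(herd_size(1000.0, 365.0))
# A hypothetical fattening process lasting half a year: the same number of
# produced heads per year corresponds to half the herd.
print(herd_size(1000.0, 182.5))
```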

The process length is fixed to 365 days for female breeding animals (activities DCOL, DCOH, SCOW, SOWS, SHGM, HENS) such that the activity level is equal to the herd size³⁾. For fattening activities, the process length net of any empty days (relevant for seasonal sheep fattening in Ireland, for example) times the daily growth should give the final weight, after conversion into live weight with the carcass share carcassSh and consideration of any starting weight startWgt (FinalWgt_):

\begin{align} \begin{split} X_{r,yield,t}^{maact,Trend}/carcassSh_{maact}=&startWgt_{maact}+X_{r,DAILY,t}^{maact,Trend}\\ &\cdot (X_{r,DAYS,t}^{maact,Trend}-X_{r,EDAYS,BAS}^{maact,data}) \end{split} \end{align}

As the daily growth is an important input into the livestock sector requirement functions, it turned out to be useful to link it explicitly to the yields in terms of meat, both in the expost data (accounting identities in COCO) and here in the projections. Heavier animals require in this way a higher daily growth and/or a longer fattening period. For all inputs into the requirement functions hard constraints have been imposed (without the possibility to relax them in the solution process) to ensure that projected variables are fully in line with these constraints, mostly over bounds in ‘estimate_MS.gms’, but also through a specific (ad hoc) equation for male adult cattle that permits at most a daily growth of 0.4+500*0.0016 = 1.2 kg per day for a 500 kg final live weight, but more for heavier animals (DAILYUP_).

\begin{equation} X_{r,DAILY,t}^{bulf,Trend}\lt 0.4 + X_{r,meat,t}^{bulf,Trend}/carcassSh_{maact}\cdot 0.0016 \end{equation}
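The final weight identity and the upper bound on daily growth can be combined in a short sketch. All numbers (starting weight, carcass share, process length) are hypothetical, and the function names are assumptions.

```python
def carcass_yield(start_wgt, daily_growth, days, empty_days, carcass_share):
    """Meat yield implied by starting weight, daily growth and net process length."""
    live_weight = start_wgt + daily_growth * (days - empty_days)
    return carcass_share * live_weight


def max_daily_growth(meat_yield, carcass_share):
    """Upper bound on daily growth of bulls as a function of final live weight."""
    return 0.4 + meat_yield / carcass_share * 0.0016


# Hypothetical bull fattening process (illustrative numbers only).
meat = carcass_yield(start_wgt=50.0, daily_growth=1.0, days=460.0,
                     empty_days=10.0, carcass_share=0.6)
print(round(meat, 1), round(max_daily_growth(meat, 0.6), 3))
```

In this example the final live weight is 500 kg, so the bound reproduces the 1.2 kg per day mentioned in the text, and the assumed daily growth of 1.0 kg respects it.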

While all information for the requirement functions of CAPRI is projected consistently, they are not active in their detailed form in CAPTRD due to the complexity of the respective calculations. Instead these requirement functions are included in a simplified form as part of the balances for energy and protein requirements (REQS_) for each animal type maact:

\begin{equation} \sum_{feed}X_{r,feed,t}^{maact,Trend}X_{r,feed,t}^{Cont,Trend}= 0.998^t(a_{maact}^{Const}+a_{maact}^{Slope} X_{r,yield,t}^{maact,Trend}) \end{equation}

where Cont are the contents in terms of energy and crude protein. The left hand side of the equation defines total delivery of energy or protein from the current feeding practice per animal activity in region r, whereas the right hand side gives the need per animal derived from requirement functions depending on the main output (meat, milk, eggs, piglets born). The parameters \(a^{Const}\) and \(a^{Slope}\) of the requirement functions are estimated from engineering functions as implemented in the CAPRI modelling system, and scaled so that the balance holds for the base year. The factor \(0.998^t\) in front of the requirements introduces input-saving technical progress of 0.2% per annum.
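A minimal sketch of the simplified requirement balance follows. It assumes that t counts years from the base year, which the text suggests but does not state explicitly; feed quantities, contents and coefficients are hypothetical.

```python
def requirement(years_from_base, const, slope, main_output):
    """Need per animal, reduced by 0.2% per year of input-saving technical progress."""
    return 0.998 ** years_from_base * (const + slope * main_output)


def delivery(feed_input, content):
    """Total nutrient delivery per animal: feed quantities times nutrient contents."""
    return sum(feed_input[f] * content[f] for f in feed_input)


# Hypothetical dairy cow: feed in kg per head, energy content in MJ per kg.
feed_input = {"cereals": 1000.0, "grass": 4000.0}
content = {"cereals": 12.0, "grass": 5.0}
base_need = requirement(0, const=8000.0, slope=4.0, main_output=6000.0)
later_need = requirement(10, const=8000.0, slope=4.0, main_output=6000.0)
print(delivery(feed_input, content), base_need, round(later_need, 1))
```

The coefficients are chosen so that delivery equals the requirement in the base year; ten years later the same yield requires about 2% less energy.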

The feeding coefficients multiplied with the herd sizes define total feed use for the different feeding stuffs ‘bulks’ (cereals, protein rich, energy rich, dairy based, other) and single nontradable feed items (grass, maize silage, fodder root crops, straw, milk for feeding, other fodder from arable land), technically in the same equation (GROF_) as used for production above:

\begin{equation} X_{r,feed,t}^{GROF,Trend}=\sum_{maact}X_{r,feed,t}^{maact,Trend}X_{r,levl,t}^{maact,Trend} \end{equation}

Feed use of individual products must add up to the feed use of the ‘bulks’ mentioned above (in FEED_):

\begin{equation} X_{r,feed,t}^{FEDM,Trend}=\sum_{o\rightarrow feed}X_{r,o,t}^{FEDM,Trend} \end{equation}

Additional equations impose that certain stable relationships of agricultural technology are also maintained in projections:

Equation EFED_ ensures that feed use of non-tradable fodder items must be equal to production after accounting for losses.
Other equations (PosLo_, PosUp_) force the relation of seed use or losses to production (plus imports for losses) into a +-20% range around the base year value.
Production has to exceed seed use and losses (SEED_).
The ratio of straw to cereal yields is maintained at base year values (STRA_).
Livestock units per hectare are calculated (LU_) and may thus be subject to constraints (limiting their deviations from the supports, for example).
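The ±20% range used in PosLo_ and PosUp_ can be illustrated for the ratio of seed use to production. This is a sketch under assumed names and numbers; for losses, the denominator would be production plus imports, as noted in the list.

```python
def ratio_bounds(base_use, base_production, tolerance=0.20):
    """Lower and upper bound for a use/production ratio around its base year value."""
    base_ratio = base_use / base_production
    return (1.0 - tolerance) * base_ratio, (1.0 + tolerance) * base_ratio


def within_bounds(use, production, bounds):
    """True if the projected ratio stays inside the permitted range."""
    lower, upper = bounds
    return lower <= use / production <= upper


# Hypothetical base year: 5 units of seed per 100 units of production.
bounds = ratio_bounds(base_use=5.0, base_production=100.0)
print(within_bounds(5.5, 110.0, bounds), within_bounds(8.0, 110.0, bounds))
```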

Finally there is an Equation (LABO_) ensuring that projections of family (LABH) and hired labour (LABN) in agriculture add up to total labour (LABO):

\begin{equation} X_{r,LABO,t}^{GROF,Trend}=X_{r,LABH,t}^{GROF,Trend}+X_{r,LABN,t}^{GROF,Trend} \end{equation}

In the first place, projections of family and hired labour follow from input coefficients combined with the activity levels, but the previous equation makes it possible to apply bounds to the total.

Constraints relating to prices, production values and revenues

The check of external forecasts revealed that for some products, external price projections are not available. It was decided to include prices, value and revenues per activity in the constrained estimation process. The first Equation (EAAG_) defines the value (EAAG, position from the Economic Accounts for Agriculture) of each product and product group as the product of production (GROF) times the unit value prices (UVAG):

\begin{equation} X_{r,i,t}^{EAAG,Trend}=X_{r,i,t}^{GROF,Trend}X_{r,i,t}^{UVAG,Trend} \end{equation}

The revenues of the activities (TOOU, total output) for each activity and group of activities acts are defined in Equation REVE_ as:

\begin{equation} X_{r,TOOU,t}^{acts,Trend}=\sum_o X_{r,o,t}^{acts,Trend}X_{r,o,t}^{UVAG,Trend} \end{equation}
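The two value definitions can be sketched as follows; product names, quantities and prices are hypothetical, and the function names are assumptions.

```python
def production_value(grof, uvag):
    """EAAG position: production quantity times unit value price."""
    return grof * uvag


def activity_revenue(output_coeff, prices):
    """TOOU: sum over outputs of output coefficient times unit value price."""
    return sum(output_coeff[o] * prices[o] for o in output_coeff)


# Hypothetical dairy cow activity with two outputs (illustrative values).
output_coeff = {"milk": 7.0, "calves": 0.9}   # tonnes and heads per cow
prices = {"milk": 350.0, "calves": 200.0}     # currency units per tonne or head
print(production_value(7000.0, 350.0), round(activity_revenue(output_coeff, prices), 1))
```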

Consumer prices (UVAD) are equal to producer prices (UVAG) plus a margin (CSSP) according to Equation UVAD_: ⁴⁾

\begin{equation} X_{r,i,t}^{UVAD,Trend}=X_{r,i,t}^{UVAG,Trend}+X_{r,i,t}^{CMRG,Trend} \end{equation}

Constraints relating to consumer behaviour

Human consumption (HCOM) is defined as per head consumption multiplied with population (HCOM_):

\begin{equation} X_{r,i,t}^{HCOM,Trend}=X_{r,i,t}^{INHA,Trend}X_{r,LEVL,t}^{INHA,Trend} \end{equation}

Consumer expenditures per caput (EXPE) are equal (via EXPE_) to human consumption per caput (INHA) times consumer prices (UVAD):

\begin{equation} X_{r,i,t}^{EXPE,Trend}=X_{r,i,t}^{INHA,Trend}X_{r,i,t}^{UVAD,Trend} \end{equation}

Total per caput expenditure (EXPE.LEVL) must add up (in Equation EXPETOT_):

\begin{equation} X_{r,LEVL,t}^{EXPE,Trend}=\sum_i X_{r,i,t}^{EXPE,Trend} \end{equation}
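The price and consumption identities of the last two subsections fit together as in the sketch below. Products, prices, margins and population are hypothetical, and the function names are assumptions.

```python
def consumer_price(uvag, margin):
    """Consumer price: producer price plus consumer margin."""
    return uvag + margin


def human_consumption(inha, population):
    """Total human consumption: per head consumption times population."""
    return inha * population


def expenditure_per_head(inha, uvad):
    """Per head expenditure on one product: per head consumption times consumer price."""
    return inha * uvad


# Hypothetical data for two products (illustrative values only).
inha = {"bread": 60.0, "cheese": 20.0}    # kg per head
uvag = {"bread": 0.5, "cheese": 4.0}      # producer price per kg
margin = {"bread": 1.5, "cheese": 6.0}    # consumer margin per kg
population = 1_000_000

expe = {i: expenditure_per_head(inha[i], consumer_price(uvag[i], margin[i])) for i in inha}
total_expenditure = sum(expe.values())    # adding up over products
print(expe, total_expenditure, human_consumption(inha["bread"], population))
```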

Constraints relating to processed products

Marketable production (MAPR) of secondary products (sec) - cakes and oils from oilseeds, molasses and sugar, rice and starch - is linked in Equation MAPR_ to processing of primary products (PRCM) by processing yields (PRCY):

\begin{equation} X_{r,sec,t}^{MAPR,Trend}=\sum_{i \wedge sec \leftarrow i} X_{r,i,t}^{PRCM,Trend}X_{r,sec,t}^{PRCY,Trend} \end{equation}
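As a small illustration, marketable production of a secondary product follows from the processed quantities of its raw products and the processing yield. The product code, quantity and yield below are hypothetical.

```python
def marketable_production(processed, processing_yield):
    """MAPR of a secondary product: processing yield times processed primary quantities."""
    return processing_yield * sum(processed.values())


# Hypothetical example: oil obtained from processed rapeseed.
processed = {"RAPE": 1000.0}   # processed quantity of the primary product
oil = marketable_production(processed, processing_yield=0.4)
print(round(oil, 1))
```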

In the case of products derived from milk (mlkseco) – butter, skimmed milk powder, cheese, fresh milk products, cream, concentrated milk, whole milk powder, whey powder, and casein – eq. MLKCNT_ requires that fat and protein content (MLKCNT) of the processed raw milk (MILK⁵⁾) be equal to the content of the derived products, after acknowledging that small quantities of dairy products are themselves transformed to other dairy products (most relevant for processed cheese):

\begin{equation} X_{r,MILK,t}^{PRCM,Trend}X_{r,MILK,t}^{MLKCNT,Trend}=\sum_{mlkseco} \left( X_{r,mlkseco,t}^{MAPR,Trend} - X_{r,mlkseco,t}^{PRCM,Trend}\right) X_{r,mlkseco,t}^{MLKCNT,Trend} \end{equation}

Marketable production of by-products from the brewery, milling and sugar industry (set RESIMP = { FENI, FPRI}) are derived from corresponding uses of related products (cereals and sugar, Equation MaprByFeed_):

\begin{align} \begin{split} X_{r,resimp,t}^{MAPR,Trend}= & \sum_{o \rightarrow resimp} \left(X_{r,o,t}^{HCOM,Trend} +X_{r,o,t}^{PRCM,Trend} +X_{r,o,t}^{INDM,Trend} +X_{r,o,t}^{BIOF,Trend}\right) \\ & \cdot \frac {X_{r,resimp,t}^{MAPR,bas}}{ \sum_{o \rightarrow resimp} \left( X_{r,o,t}^{HCOM,bas} +X_{r,o,t}^{PRCM,bas}+ X_{r,o,t}^{INDM,bas}+ X_{r,o,t}^{BIOF,bas}\right)} \end{split} \end{align}
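The by-product equation scales the base year output with the development of the related uses, as the following sketch shows. The product name and quantities are hypothetical, and the function names are assumptions.

```python
USES = ("HCOM", "PRCM", "INDM", "BIOF")


def total_use(balances):
    """Sum of the four uses over all products linked to the by-product."""
    return sum(pos[u] for pos in balances.values() for u in USES)


def byproduct_output(current, base, base_mapr):
    """Base year by-product output scaled by the ratio of current to base year uses."""
    return total_use(current) * base_mapr / total_use(base)


# Hypothetical uses of one related product in the base year and in a projection year.
base = {"cereal": {"HCOM": 100.0, "PRCM": 50.0, "INDM": 30.0, "BIOF": 20.0}}
current = {"cereal": {"HCOM": 110.0, "PRCM": 55.0, "INDM": 33.0, "BIOF": 22.0}}
print(byproduct_output(current, base, base_mapr=20.0))
```

Uses grow by 10% in this example, so the by-product output grows by 10% as well.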

Constraints relating to bio-fuel production

¹⁾

The only exceptions are the quota regimes on the milk and sugar markets which are recognised in the trend projections.

²⁾

For technical reasons some years are “obligatory” result years, for example the year immediately following after the last ex post year.

³⁾

The wording for animal numbers is a continuous source of confusion that may also affect older parts of this documentation or table headings from the CAPRI GUI. It is therefore recommendable to reserve the term “herd” strictly to stock variables (animals countable at a particular day) whereas the flow variable “produced heads per year” is the activity level for fattening activities.

⁴⁾

The symbol CSSP (initially for “consumer surplus”) is usually used for the welfare effects related to final consumers (currently expressed as equivalent variation). Consumer margins are stored on CMRG in the market model. This misuse of code CSSP in CAPTRD is due to historical reasons.

⁵⁾

This is somewhat indirectly related to processing of cow milk and sheep & goat milk over MAPR.MILK = PRCM.COMI + PRCM.SHGM with a processing yield PRCY.MILK = 1 and over the market balance for product MILK which ensures that, with minimal trade of raw MILK, most of MAPR.MILK will end up as PRCM.MILK.
