
Forecast tool CAPTRD

The tool providing projections for the European regions covered by CAPRI (the EU as of 2019, Turkey, Norway, Albania, North Macedonia, Montenegro, Bosnia and Herzegovina, Kosovo and Serbia) is called CAPTRD. It operates in several steps:

  • Step 1 involves independent trends on all series, providing initial forecasts and statistics on the goodness of fit or indirectly on the variability of the series.
  • Step 2 imposes constraints like identities (e.g. production = area * yield) or technical bounds (like non-negativity or maximum yields) and introduces specific expert information given on the MS level.
  • Step 3 includes expert information on aggregate EU markets, typically coming from the AgLink model or GLOBIOM. This external data is not available for all individual countries in CAPRI, but for larger regions. Therefore, several countries must be simultaneously estimated in order to ensure proper use of this important prior information.
  • Step 4, depending on the aggregation level chosen, disaggregates the MS results in subsequent steps to the regional level (NUTS2) or even to the level of farm types.

The trends estimated in CAPTRD are subject to consistency restrictions in steps 2 and 3. Hence they are not independent forecasts for each time series; the resulting estimator is a system estimator under constraints (e.g. closed area and market balances). Nonetheless, the trends are mechanical in the sense that they respect technological relationships but do not include any information about behavioural functions or policy developments1).

CAPTRD results are in turn only the first of several steps before a full CAPRI baseline is ready to use. The rest of this chapter focuses on CAPTRD.

Step 1: Independent, weighted nonlinear least squares

Before entering into the details, it should be stated that ultimately almost any projection may be reduced to a particular type of trend projection, at least if the exogenous inputs, such as population, prices or household expenditure, are also projected (usually by other research teams) as functions of time. In this sense trend projections may provide a firm ground on which to build, and this is exactly their purpose in our work.

The first ingredient in the estimator is the trend curve itself which is defined as:

\begin{equation} X_{r,i,t}^{j,Trend}=a_{r,i,j}+b_{r,i,j}t^{c_{r,i,j}} \end{equation}

where the parameters a, b and c are to be estimated so that the squared deviation between given and estimated data is minimized. The X stands for the data and represents a five-dimensional array, spanning products i and items j (such as feed use or production), regions r, points in time t and different data statuses such as ‘Trend’ or ‘Observed’. The trend curve itself is a kind of Box-Cox transformation, as parameter c is used as the exponent of the trend variable. For c equal to unity, the resulting curve is a straight line; for c between 0 and 1, the curve is concave from below, i.e. increasing at decreasing rates; and for c > 1, the curve is convex from below, i.e. increasing at increasing rates. In order to prevent differences between time points from increasing sharply over the projection period, the parameters c are restricted to be below 1.2.
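The role of the curvature parameter c can be illustrated with a small sketch (the values of a, b and the grid of trend points are invented for illustration):

```python
# Illustrative sketch of the CAPTRD trend curve X = a + b * t**c for three
# values of the curvature parameter c; a, b and the t grid are made-up.
def trend(a, b, c, t):
    """Box-Cox-type trend curve used in CAPTRD."""
    return a + b * t ** c

a, b = 100.0, 10.0
ts = [0.5, 1.0, 1.5, 2.0]

linear  = [trend(a, b, 1.0, t) for t in ts]  # c = 1: straight line
concave = [trend(a, b, 0.5, t) for t in ts]  # 0 < c < 1: increasing at decreasing rates
convex  = [trend(a, b, 1.2, t) for t in ts]  # c > 1: increasing at increasing rates

print(linear)  # → [105.0, 110.0, 115.0, 120.0]
```

The restriction c ≤ 1.2 caps how fast the convex case can accelerate over the projection period.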

This form has the advantage of ensuring monotonic developments, whereas quadratic trends often gave increasing yields for the first part of the projection period and a decrease afterwards. Another conclusion from the early explorations was that it is useful to define the trend variable as \(t_{1984} = 0.1, t_{1985} = 0.2, t_{1986} = 0.3 \) etc., giving a potentially strong nonlinearity in the early years of the database (where the frequency of large changes, possibly due to data weaknesses, was high) and a rather low nonlinearity in the projection period.

The ex-post period usually covers the period from 1985 to the end of the underlying CAPREG output (file res_time_series.gdx), which typically ends 4-6 years before the current year. The national level COCO data may have somewhat longer series than the regional CAPREG data. To account for this different availability of ex-post data, the following sets are distinguished:

  • Expost: defined from the length of the series in CAPREG output res_time_series.gdx
  • Exante: covering any sequence of intermediate result years up to the user specified final year2).
  • ExanteD: Ex ante years with additional COCO1 data (assigned in ‘captrd\ load_coco1_data.gms’)
  • ExpostT: Union of Expost and ExanteD = years with data for trend estimation
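The interplay of these sets can be sketched as follows (all year ranges are invented for illustration; the real sets are derived from the CAPREG output and the COCO1 data in GAMS):

```python
# Hypothetical year ranges only -- the actual sets come from
# res_time_series.gdx and 'captrd\load_coco1_data.gms'.
expost  = set(range(1985, 2017))   # Expost: years in the CAPREG time series
exante  = set(range(2017, 2031))   # Exante: result years up to the final year
exanteD = {2017, 2018}             # ExanteD: ex-ante years with COCO1 data
expostT = expost | exanteD         # ExpostT: years usable for trend estimation
```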

The estimator minimises the weighted sum of squares of errors using the trend variable as weights:

\begin{equation} wSSE_{r,i,j}=\sum_{expost} \left(X_{r,i,expost}^{j,"Data"}-a_{r,i,j}-b_{r,i,j}t_{expost}^{c_{r,i,j}}\right)^2t_{expost} \end{equation}

The weighting with the trend variable was introduced in the exploration phase based on the following considerations and experience. First of all, it reflects the fact that statistics from the early years (mid-eighties) are often less reliable than those from later years. Secondly, even if they are reliable, older data will tend to contribute less useful information than more recent data due to ongoing structural change. For this reason we have discarded any years before 1992 for the New MS, for example, but even the data from the mid-1990s may represent a situation of transition that should count less than the recent past. In technical terms, the Step 1 estimates are found by a grid search over selected values of parameter c, with analytical OLS estimates for parameters a and b (see ‘captrd\estimate_trends.gms’) that have been found identical to those of the econometric package EViews for a given value of c (this also holds for wSSE).
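The Step 1 procedure can be sketched as follows: for each candidate c on the grid, the regressor t^c is fixed, so a and b follow analytically from the weighted normal equations, and the c with the smallest wSSE wins. The data series, the true parameters and the c grid below are invented; the actual estimation is done in ‘captrd\estimate_trends.gms’.

```python
# Sketch of Step 1: grid search over c with analytical weighted OLS for a, b.
# The weight is the trend variable t itself, as in the wSSE formula above.

def wls_ab(ts, xs, c):
    """Analytical weighted OLS of x on t**c with weights w = t."""
    zs = [t ** c for t in ts]
    ws = ts                                   # weight equals trend variable
    W   = sum(ws)
    Sz  = sum(w * z for w, z in zip(ws, zs))
    Sx  = sum(w * x for w, x in zip(ws, xs))
    Szz = sum(w * z * z for w, z in zip(ws, zs))
    Szx = sum(w * z * x for w, z, x in zip(ws, zs, xs))
    b = (W * Szx - Sz * Sx) / (W * Szz - Sz ** 2)
    a = (Sx - b * Sz) / W
    return a, b

def wsse(ts, xs, a, b, c):
    """Weighted sum of squared errors, weights = t."""
    return sum(t * (x - a - b * t ** c) ** 2 for t, x in zip(ts, xs))

def grid_search(ts, xs, c_grid):
    best = None
    for c in c_grid:
        a, b = wls_ab(ts, xs, c)
        err = wsse(ts, xs, a, b, c)
        if best is None or err < best[0]:
            best = (err, a, b, c)
    return best

# Synthetic, noise-free series generated from a known curve, so the grid
# search should recover c exactly when it lies on the grid.
ts = [0.1 * k for k in range(1, 21)]          # t_1984 = 0.1, t_1985 = 0.2, ...
xs = [5.0 + 2.0 * t ** 0.8 for t in ts]       # true a = 5, b = 2, c = 0.8
err, a, b, c = grid_search(ts, xs, [0.6, 0.8, 1.0, 1.2])
```

Because t^c is fixed within each grid point, no nonlinear solver is needed in this step; nonlinearity only enters via the grid over c.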

Step 2.1: Consistency constraints in the trend projection tool

Step 2 adds the consistency conditions and thus transforms the naïve independent trends into a system estimator. In almost all cases, the unrestricted trend estimates from the first step would violate one or several of the consistency conditions. We want to find estimates that both satisfy the consistency constraints and exploit the information comprised in the ex-post development in a technically feasible way. Consider the identity that defines production as hectares/herd sizes times yield. Running independent trend estimates for barley area, barley yield and barley production will almost certainly produce estimates where production is not equal to yield times area. One solution would be to drop one of the three estimates, say yield, and replace it by the forecasted production divided by the forecasted area. However, by doing so, we would throw away the information incorporated in the development of barley yield over time. Adding relations between time series hence helps us to exploit more information than is contained in the single series.
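A toy version of this reconciliation idea can be sketched as follows: area and yield are adjusted so that the implied production (area times yield, consistent by construction) and all three series stay close to their independent forecasts. The numbers and the solver (plain gradient descent) are purely illustrative; CAPTRD solves the real problem as a constrained system estimator in GAMS.

```python
# Toy sketch: reconcile independent forecasts for area (A), yield (Y) and
# production so that production = A * Y holds exactly while staying close
# to all three independent forecasts. All numbers are invented.

A0, Y0, P0 = 10.0, 6.0, 63.0   # independent trends: 10 * 6 = 60 != 63

def objective(A, Y):
    # Squared deviations from all three forecasts; production is A * Y by
    # construction, so the area * yield = production identity always holds.
    return (A - A0) ** 2 + (Y - Y0) ** 2 + (A * Y - P0) ** 2

A, Y = A0, Y0                   # start from the inconsistent forecasts
lr = 0.002
for _ in range(5000):           # simple gradient descent on (A, Y)
    gA = 2 * (A - A0) + 2 * (A * Y - P0) * Y
    gY = 2 * (Y - Y0) + 2 * (A * Y - P0) * A
    A -= lr * gA
    Y -= lr * gY

production = A * Y              # consistent with the reconciled A and Y
```

Both area and yield move up a little so that the implied production approaches its own forecast, instead of forcing the entire adjustment onto a single series.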

When simultaneously consolidating the different Step 1 estimates, we minimise the squared deviations from the values computed in Step 1, in the following called “supports”, while complying with all constraints. A risk is that shaky trends may give a forecast line whose end point lies far away from the ex-post observations. Hence we need safeguards pulling our estimates towards a ‘reasonable’ value in such cases.

The confidence interval from the Step 1 trend estimation will not help, as it is centred around the last projection value and will simply be quite large in the case of a bad R². However, we may use the idea underlying the usual test statistics for the trend parameters (a, b, c). These statistics test whether (a, b, c) are significantly different from zero, and it can be shown that these tests are directly related to the R² of the regression. If the null hypothesis were true, i.e. if the estimated parameters had a high probability of being zero, we would not use the trend line but the mean of the series instead.

This reasoning is the basis for the supports derived from the Step 1 estimates in CAPTRD (‘captrd\define_stats_and_supports.gms’), after some modifications. First of all, we used a three-year average of the last known values as the fallback position, not the mean of the series. Secondly, in a typical econometric analysis, test statistics would only be reported for the final estimation layout; variables would have been dropped from the regression beforehand if certain probability thresholds were undercut. For our applications we opted for a continuous rule, as the choice of threshold values is arbitrary: the smaller the weighted R², the stronger the estimates are drawn towards our H0, namely that the value equals the recent three-year average.
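A minimal sketch of such a continuous rule, assuming a simple linear blending between the trend forecast and the three-year fallback (the exact rule is implemented in ‘captrd\define_stats_and_supports.gms’ and may differ):

```python
# Hypothetical continuous shrinkage rule: the support is pulled from the
# trend forecast towards the recent three-year average as the (weighted)
# R-squared falls. Linear blending is an assumption for illustration.

def support(trend_forecast, avg_last3, r2):
    """Blend trend forecast and fallback according to goodness of fit."""
    r2 = max(0.0, min(1.0, r2))   # clip R-squared to [0, 1]
    return r2 * trend_forecast + (1.0 - r2) * avg_last3

print(support(120.0, 100.0, 1.0))   # perfect fit: pure trend → 120.0
print(support(120.0, 100.0, 0.0))   # no fit: three-year average → 100.0
print(support(120.0, 100.0, 0.5))   # halfway → 110.0
```

With such a rule no arbitrary significance threshold is needed; the fit statistic itself decides how far the support moves away from the trend line.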

1) The only exceptions are the quota regimes on the milk and sugar markets, which are recognised in the trend projections.
2) For technical reasons some years are “obligatory” result years, for example the year immediately following the last ex-post year.