===== The Complete and Consistent Data Base (COCO) for the national scale =====

The COCO database is built by the application of two modules:

** COCO1 module: **\\ 
Prepare national database for all EU27 Member States the Western Balkan Countries, Turkey and  Norway.\\ \\ It is basically divided into three main parts:\\  
  *  A data import “part” that is not a single “module” but rather a collection activity to prepare a large set of very heterogeneous input files 
  * Including and combining these partly overlapping input data according to some hierarchical overlay criteria, and
  * Calculating complete and consistent time series while remaining close to the raw data.

Data preparation (part 1) and overlay (part 2) form a bridge between raw data and their consolidation to impose completeness and consistency. The overlay part tries to tackle gaps in the data in a quite conventional way: If data in the first best source (say a particular Eurostat table from some domain) are unavailable, look for a second best source and fill the gaps using a conversion factor to take account of potential differences in definitions. To process the amount of data needed in a reasonable time this search to second, third or even fourth best solutions is handled as far as possible in a generic way in the GAMS code of COCO where it is checked whether certain data are given and reasonable. However there are a few special topics that are explained in separate sections.

** COCO2: **\\ 
The finishing step estimates consumer prices, consumption losses, and some supplementary data for the feed sector (by-products used as feedstuffs, animal requirements on the MS level, contents and yields of roughage). Both tasks run simultaneously for all countries and build on intermediate results from the main (COCO1) part of COCO like human consumption and processing quantities. 

==== Overview and data requirements for the national scale ====
An overview on the key data collection, assingments and corrections in main program coco1.gms is given in the following figure.

** Figure 2: Overview on key elements in the consolidation of European data at the Member state level (in coco1.gms) **
{{:fig02.png?nolink|}}

Source: Own illustration

The different steps will be explained in more detail in the following sections.

The CAPRI modelling system is, as far as possible, fed by statistical sources available at European level which are mostly centralised and regularly updated. Farm and market balances, economic indicators, acreages, herd sizes and national input output coefficients were initially almost entirely from EUROSTAT. In the course of time, more and more special data sets have been added to fill gaps or resolve problems detected in EUROSTAT data, such as specific data on Western Balkan Countries or on the biofuel sector.
 
The main sources used to build up the national data base are shown in the following.

** Table 3: Data items and their main sources **
^ Data items ^ Source ^
| Activity levels | Eurostat: Crop production statistics, Land use statistics, herd size statistics, slaughtering statistics, statistics on import and export of live animals\\ \\ For Western Balkan Countries and Turkey: Eurostat supplemented with national statistical yearbooks, data from national ministries, FAOstat production statistics and others |
| Production, farm and market balance positions | Eurostat: Farm and market balance statistics, crop production statistics, slaughtering statistics, statistics on import and export of live animals\\ \\ For Western Balkan Countries and Turkey: Eurostat supplemented with national statistical yearbooks, data from national ministries,  FAOstat production statistics and others |
| Sectoral revenues, costs, and producer prices | Eurostat: Economic Accounts for Agriculture (EAA) and price indices for gap filling, otherwise unit value calculation\\ \\ For Western Balkan Countries and Turkey: Supplemented with national statistical yearbooks, data from national ministries, results from AgriPolicy, FAOstat price statistics |
| Consumer prices | Derived from macroeconomic expenditure data (Eurostat, supplemented with UNSTATS) and food price information from various sources |
| Output coefficients | Derived from production and activity levels, engineering knowledge |

==== Data Import ====

A large set of very heterogeneous input files (in terms of organisation and format) is collected, currently covering the following years:

** Table 4: Temporal coverage of national data by region **
^ Member State ^ Range ^
| EU15 Member States without Germany | 1984 – 2014 |
| Germany and (12) New Member States | 1989 – 2014 |
| Western Balkan (WB) Countries and Turkey | 1995 – 2014 |
| Norway | 1984 – 2014 |

=== Eurostat data ===

** First step: Data download and format conversion **
Data are originally downloaded in “TSV-format”, as offered by Eurostat for bulk data users. The TSV-format is a flat file format for time series. Data can be selected for all EU MS and some Candidate Countries. Availability differs by country, of course (almost nothing for the Kosovo, Montenegro, Bosnia & Herzegonina). In the process of downloading the TSV files are also converted in GAMS readable form (csv or gdx). The following themes and table groups of Eurostat are accessed:

Agriculture, forestry and fisheries

  * Agriculture (“agr”)
    * Economic Accounts for Agriculture (Table Group “aact”, saved on CAPRI parameter “p_ecoact”
    * Agricultural prices and price indices (Table Group “apri”, saved on CAPRI parameter “p_agripri” 
    * Agricultural product related physical information (production, activity levels from Table Group “apro”, saved on CAPRI parameter “p_agriprod” 
    * Older, discontinued Eurostat series that still provide useful information (requiring some ad hoc extrapolations), for example (a) market balance information for products other than cereals, oilseeds and wine, critical for “COCO1”, (b) relative price level indices of food products (MS relative to EU average) for COCO2, (c) availability and production of feedingsstuffs (useful for COCO2 completions on feed from by-products)

Economy and Finance

  * National annual accounts (“nama10”)	
    * Annual national accounts -> National Accounts detailed breakdowns (by industry, by product, by consumption purpose) -> Final consumption expenditure of households by consumption purpose (COICOP 3 digit), 
    * General indicators to National Accounts - Population and employment 
    * GDP and main components - Current prices, volumes, price indices

  * Prices (“prc”)
    * Harmonized indices of consumer prices (prc_hicp) here: HICP (2005=100) -annual Data, and HICP - Item weights 

** Second step: data selection and code mapping **
The second step is data selection and code mapping performed by the GAMS program //‘coco_input.gms’.// Cross sets linking Eurostat codes to COCO codes define the subset of data series subsequently used.

The mapping rules are collected in two sub-programs called by //‘coco_input.gms’,// for example:

  * //‘gams\coco\ eurostat_agriculture_mapping.gms’// for the tables from Eurostat’s “Agriculture and Fisheries” Statistics
  * //‘eurostat_ econfinc_mapping.gms’// for the tables from Eurostat’s “Economy and Finance” Statistics

Example from file //‘Eurostat _agriculture_mapping.gms’//

| SET AgriProdOriEurostat / \\ apro_acs_a_C1000_AR "CEREALS-EXCLUDING RICE-AREA"\\ apro_acs_a_C1110_AR "COMMON WHEAT AND SPELT - AREA" \\ \\ SET AgriProd_MAP(ASS_COLS,ASS_ROWS,AgriProdOriEurostat) / \\ CERE.LEVL. apro_acs_a_C1000_AR \\ SWHE.LEVL. apro_acs_a_C1110_AR |

The results of the program run are gdx-files loaded by files (e.g. coco\coco1_eurostat.gms) which are in turn loaded by coco1.gms or coco2.gms.

=== Western Balkan Countries and Turkey ===

For those countries Eurostat data need completion in almost every area which is handled in country specific xls files. The structure of these supplementary Excel country sheets and the definitions of the data are tailored to COCO. The resulting sheets in these xls files are uniform across countries, in order to ease data extraction for the modelling part by applying macros. However, each national information system has its own peculiarities and hence, not all data are fully harmonised across countries. Various sources are assessed and combined in a case by case manner: Eurostat data, if already available and plausible, are handled as the preferred data source. Data collected from the national statistical yearbooks have second priority, followed by expert data collected in from earlier projects. Finally FAO data provides often the fall-back solution for any remaining missing time series.

The final sheet in each of these country specific xls files is the interface to the GAMS programing world of COCO. An Excel macro “SELECT_data_all” collects the time-series compiled in other sheets and puts them into this final sheet with the appropriate COCO code. Another macro finally exports the numbers into text files like “dat\coco\bosnia_coco.gms”. Because the xls file are quite complex due to various linkages, we do not read directly from them. This avoids unplanned changes and permits convenient tracing of data changes via the CAPRI versioning system svn. 

=== Supplementary data for Romania and Bulgaria ===

Country level data from national experts were compiled in Excel files that help in particular to complete the meat and milk sectors.

=== FAO data selection ===

Two FAO data sources are combined:

  * For all regions FAO data (mapped in the context of module “global database” to CAPRI codes and hence consistent across modules) serve as a fall back option under certain conditions, defined in the code. This fall back function of FAO data has gained in importance since Eurostat discontinued the publication of most market balances since 2014. In some cases also activity level (area) information may be taken from FAO.
  * Some particular data like disaggregate data on herds of chicken, ducks, turkeys and geese are compiled in a separate include file dat\coco\fao_add.gms because these data types are usually not loaded for global database.

=== Other additional input data ===

COCO1: Biofuels

  * Production, market balance and feedstock quantities for biodiesel and bioethanol are collected from a multitude of sources:
     * EU project www.elobio.eu (production, demand, biodiese and bioethanol, 1999-2007)
     * Eurostat, Energy balances and demand (tables nrg_xxxx) production, demand, trade for diesel, gasoline, biodiesel and bioethanol, 2001-15) 
    * Eurostat, Production and trade (PRODCOM), ethanol and biodiesel, 2000-14  
    * PRIMES model((PRIMES MODEL, EC3MLAB of ICCS, National University of Athens.)) database (production, biodiesel and bioethanol, 2000-07)
    * US Energy Information Administration (EIA), production of biodiesel and bioethanol, 2000-12, incl. some non-EU countries
    * DG Agri Ethanol balances (production partly with split by feedstocks and MS, demand and trade)
    * Aglink ex post database (most data for Turkey, also EU biofuel production from non-standard sources (NAGR).
    * USDA GAIN reports (market balances for Serbia, feedstocks for biodiesel in EU) 
    * FAOstat (market balances for palm oil) 
  * Prices at the pump and retail prices for diesel and gasoline are from Eurostat’s energy database (http://epp.eurostat.ec.europa.eu/portal/page/portal/energy/data/database), supplemented with IEA Statistics 2016 for Turkey.
  * Taxes for diesel, gasoline, biodiesel and bioethanol are collected from DG Energy website and publications, and EURACTIV, EU news & policy debates, Brussels  (http://www.euractiv.com/en/enterprise-jobs/fuel-taxation/article-117495)
  * Some supplementary Aglink data give information on feedstock composition, tariffs and world market prices for crude oil, biodiesel and bioethanol.
  * Trade data for undenatured ethyl alcohol, denatured ethyl alcohol, fatty acid mono-alkyl esters, crude palm oil, palm and fraction and palm kernel and fraction are collected from Eurostat’s COMEXT data (2000-14).
  * Market balances for palm oil are taken from FAOstat and supplemented with COMEXT.

COCO1: Sugar Quotas

  * All sugar quotas 1999 until 2006 from the annual sugar yearbook.
  * Buy-back 2006 in the restructuring program from CAP monitor 16 January 2008.
  * Sugar quotas renounced by member states following sugar reform (2006-2010), information from Wirtschaftliche Vereinigung Zucker e.V. (WVZ) and Verein der Zuckerindustrie e.V. (VdZ), Bonn  (http://www.zuckerwirtschaft.de/1_3_2_1.htm) and KWS SAAT AG, Einbeck (http://www.kws.de/ca/fh/thd/)

COCO1: Milk

  * Market balances for casein and whey powder were only available on EU level from ZMP, Bonn, which was closed down in 2009.
  * DG Agri partly completes gaps in Eurostat series and offers this consolidated database for download. This is used to close gaps in gams\coco\coco1_eurostat.

COCO1: Producer prices for cotton

Import unit values for cotton seeds, cotton lint, flax and hemp are additionally selected from COMEXT.

COCO1: Expert data

Data from experts, which will overwrite all Eurostat data, is included for special issues for some Member States (e.g. grass yields for the Netherlands).

This also applies at the moment for all Norwegian input data such that Eurostat data are currently ignored. However, as Eurostat completeness has also improved on Norway, this procedure might be reconsidered in the future. 

COCO1: Land use data

The raw data on land use are currently prepared outside the CAPRI system. Source code and input files are available at EuroCARE, Bonn (R:\Coco_input\land_use). Relevant (raw) information is stored in dat\coco\landuse_data_and_sets.gdx. The data base comprises information on land use classes from various sources, which are again partly discontinued but useful for the early years:

  * REGIO - Eurostat, land use, REGIO domain( NUTS2 level - yearly, 1984-2014)
  * ENVIO - Eurostat, land use, env_la_luc1.xls (MS level - 1985, 1990,1995, 2000)
  * LANDCOVER - Eurostat, land cover(MS level – 2009, 2012, 2015)
  * Corine Land Cover (CLC), 44clc_nuts2.xls (NUTS2 level - 1990, 2000, 2006, 2012)
  * FAO -  area.xls(MS level - yearly, 1984-2016)
  * MCPFE (Ministerial Conference on the Protection of Forests in Europe), jointly published by FAO and UNECE (MS level - 1990, 2000, 2005, 2010, 2015)
  * FSS - Eurostat, FSS(NUTS2 level - 1990, 1993, ..., 2007, 2010, 2013), only added in coco1\landuse
  * UNFCCC (1990-2016), also covers land transitions and settlement data. Official data for LULUCF accounting, merged with other data in coco1_landuse. 

COCO2: Economic data

  * Eurostat: Economy and Finance, Exchange rates, Bilateral exchange rates, Euro/ECU exchange rates. Data is already prepared in Excel for premature introduction of Euro in price data from the International Labour Organisation (ILO). 
  * Eurostat, population. To complete early years data from and old Eurostat domain (AGRIS, Population) are also loaded. 
  * GDP price index expressed in Euros

COCO2: Expenditures

Consumer expenditures on food items are included from:
  * Eurostat: Old domain SEC2 for data up to 1997 (HIST)
  * Instituto Nacional de Estadística m(INE): Anuario de Estadística Agroalimentaria (AEA), Consumer expenditure on food items in Spain close to HIST definitions up to 1996
  * Rheinisch-Westfälisches Institut für Wirtschaftsforschung (RWI): Consumer expenditure on food items for DEW 1985-92 in Mio DM
  * Statistisches Bundesamt Deutschland (SBA): Weighted average of expenditure shares in German household types 2 and 3 (1985-91)
  * Eurostat, Final consumption expenditure of households by consumption purpose (COICOP 3 digit) 
  * United Nations Statistics Division (UNSTATS): Household consumption expenditure in USD
  * Eurostat, PRICE: Consumer expenditure weights are used as indicators for budget shares
  * Eurostat: Economy and Finance, GDP and main components, Final consumption expenditure of households: Total private consumption of households in current prices (Table “a_gdp_c”)

COCO2: Consumer food prices and consumer food price indices

Food price indices from:

  * Eurostat, PRICE, 2005=100.
  * Several national sources for western Balkan regions
  * Eurostat: Old domain FOOD of section AGRICULTURE: Aggregate food price index with old Eurostat methodology and base 1985
  * INTERNATIONAL LABOUR ORGANIZATION Geneva (ILO): LABORSTA Labour Statistics Database, retail prices of selected food unit, prices indices of selected food unit, discontinued after 2008
  * Eurostat: Detailed average prices – 2008 - 2015 [table prc_dap15] is used to extend the ILO consumer price series.

COCO2: By-products

  * FAO: Food Balance Sheets, Commodity Balances, Livestock and Fish Primary Equivalents: Imports and exports quantities for fish meal, dried cassava, gluten deed and meal, as well as feed quantities for fish meal.
  * Eurostat: Purchase prices for fish meal, dried sugar beet pulp, soya cake,  and wheat bran 
  * Eurostat: data (at most up to 2010) from discontinued tables (“food_in afeed1” and “bilares”) on production of feedingstuffs and availability of feedingstuffs 
  * FAO: Food Balance Sheets, Commodity Balances, Crop Primary Equivalents: Milled rice and total sugar unit value 
  * Netherlands Economic Institute (NEI): Purchase prices for sugar, calculated by the average of Intervention Price and CAOBISCO price

COCO2: Milk Products

  * Zentrale Markt- und Preisberichtstelle (ZMP): Producer prices of selected milk products (only available for some countries)
  * Agrarmarkt Informations-Gesellschaft mbH (AMI): AMI-Marktbilanz Milch 2011 (only available for some countries)
  * DG AGRI (Réponses au questionnaire (art. 8 du Règlement (CEE) n° 536/93), (art. 15 R 1392/2001) and (art. 26 R 595/2004)): Data on direct sales of raw milk and farm processing in DG AGRI definitions for quota administration 

COCO2: Others

  * Eurostat: External trade, External trade detailed data, COMEXT, EU27 Trade Since 1988 By CN8, Reporter EU15: Auxiliary trade data for wheat, soft wheat and durum wheat, export values and quantities for cotton and cotton seeds, data on imports and exports of most relevant by-products  
  * Statistisches Jahrbuch ueber Ern., Landw. U. Forsten, 1999, 2006 und 2010 (Aufkommen u Verbrauch von Futtermitteln): Net imports and feed from domestic production of by-products in Germany
  * USDA: Prices for soya, rape and sunflower cake and oil, prices for corn gluten feed

==== COCO1: Overlay from various sources ====

The main program coco1.gms starts with a number of declarations of sets and parameters to handle the collection and overlay of “raw data”, often given in a classification different from the target one (sets COLS, ROWS).
{{:SS01.png?nolink|}}
A recurrent characteristic of COCO is to solve the problem: if the first best source has gaps in a particular country, or even is entirely empty, select the second or even third best source to fill the gaps. 

=== Including standard and supplementary data from Eurostat (‘coco1_eurostat.gms’) ===

The main program coco1.gms proceeds by importing data from Eurostat prepared beforehand (in coco_input.gms). The main data (on p_agriProd, p_ecoAct, and p_agriPri) are processed step by step and corrections made on selected data for all MS((Eurostat offers data for Belgium and Luxembourg separately, whereas the database combines both countries to the model region "BL000" (Belgium and Luxembourg). The key reason is that Eurostat offers data mainly for the aggregate Belgium and Luxembourg up to the year 1999, especially for all market balances. Furthermore, Luxembourg has a rather small agricultural sector (2004 total output was about EUR 250 million) with some similarities to Belgium.)). 
{{:SS02.png?nolink|}}

=== Data from FAOstat (‘coco1_fao.gms’) ===

The general fall-back option for missing data is FAOstat which requires a few corrections compared to the standard mappings in the context of module “global database”, including:

  * Rebooking of “other use” to processing (PRCM) or other balance positions
  * Disaggregation of olives (table olives, olives for oil), grapes (table grapes, grapes for wine), wheat (common, durum)
  * Checks for data changes after sugar reform 2006
  * Country specific fixes like in coco1_eurostat.gms.
{{:SS03.png?nolink|}}

===Data from additional sources for the Western Balkan Countries and Turkey (‘coco1_croatia_data.gms’ and ‘coco1_candi_AgriProd_AgriPri.gms’)===

Croatia is the first country singled out from the special data input for the Western Balkan Countries and Turkey. Croatia is by now mostly sourced from Eurostat, as the other EU members, but a few supplementary expert data have been retained. For the other Western Balkan regions and Turkey, //‘coco1_candi_AgriProd_AgriPri.gms'// further adapts the WB data from the country specific xls files to match the COCO definitions that also apply to EU28 countries (on parameters p_agriProd and p_agriPri).
{{:SS04.png?nolink|}}

The include file handles the following:

  - Similar to EU-28 MS there are many case-by-case adjustments correcting different scaling and definitions (live weight <-> carcass weight, reaggregations for wine and fruits…).
  - In many cases, market balances are simply incomplete. As a fall back solution, domestic demand is calculated from production and net trade and disaggregated with shares taken from a sister country aggregate (Romania, Bulgaria, Greece, Slovenia, Hungary). Other corrections with “borrowed” information are:
    - Trade data are frequently missing in the WBs, such that FAO data are included where available. 
    - Production of oilcakes and sugar is estimated from raw products, if missing, using the sister country aggregate processing coefficients; 
    - The production of milk products is estimated from processing coefficients in Serbia which has a quite complete series;
  - Price information is also completed relying on the sister country aggregates. 

=== Final completions and revisions for all Member States (‘coco1_finish_agriprod.gms’) ===

Based on the availability of second and third best options various finalising steps are applied to the quantity data. It should be noted that the CAPRI database tries to estimate market balances (needed for separate behavioural function for feed, food, processing, biofuel demand) in spite of Eurostat discontinuing the publication of market balances for most products since 2014. For this purpose the old Eurostat market balances are still loaded and combined with more recent production data. This triggers the need for data completions and estimations in the most recent years (which are also most critical for projections). In 2019 market balance data have returned to the Eurostat server for cereals and oilseeds, but only for a single year (2017) => It is likely that adjustments like the following will also be needed in the future:

  * Completion of production data from the (discontinued) Eurostat market balance statistics (model code "USAP") with quantity information given from the production statistics (code “GROF”) or from agricultural account statistics (model code "EAAQ") using a correction factor calculated from overlapping years. 
  * Additional gap filling using FAO data for special cases and general cases of missing data (e.g. for balances). An additional difficulty is that FAO commoditiy balances are currently (2019) also ending in 2013 (especially valuable for recent years). 
  * Domestic use can be calculated (under some conditions) from imports, export and usable production. If only domestic use is given for some products, the sub-positions, such as industrial use, processing, human consumption, feed on market, total seed and total losses are allocated with the average shares in data for other years, from the same country. As a fall back solution, the average shares from other countries are used. 
  * For the milk products whey powder and casein, the disaggregation of demand is mainly based on EU data collected by the German "Zentrale Markt- und Preisberichtstelle für Erzeugnisse der Land-, Forst- und Ernährungswirtschaft GmbH" (ZMP) and some auxiliary assumptions. 
  * As data for oilseeds are critical for all countries, the implied processing coefficient is checked for plausibility. If the national coefficient is lower than 60% or above 150% the average coefficient for all EU-15 MS, the data for usable production of the country are corrected by multiplying the processing data with the average EU-15 coefficient. Domestic use and all sub-positions are subsequently re-calculated.
  * Some additional calculations to prepare the use of animal herd data in coco1_anim:
    * Some calculations to combine FAO and FSS data on poultry herds
    * Completions acknowledging seasonality in cattle and sheep and goats herd countings
    * Aggregations and residual calculations to the COCO animal categories from animal types in Eurostat (say “Heifers for raising, 1-2 years”)

The file handling the previous actions is ‘coco1_finish_agriprod.gms’: 
{{:SS05.png?nolink|}}

The previous code snippet also shows for the interested reader two frequently used debugging devices:

  - The key parameters at a certain point in the program flow (above: p_agriProd, p_agriPri, p_ecoAct) are copied to a debugging parameter “debug” (better name would be: “p_debug”). At the end of a coco1 run (or if desired also at this point) the parameter is unloaded into a file “results\coco\debug\debug_%MS%.gdx” such that the various assignments, corrections, deletions that have occurred up to a certain program line may be inspected in one file.
  - The next command “$batinclude “util\debug” %system.fn% %system.incline%  unloads the whole memory, incuding all parameters but also sets and other symbols, at this point into a debugging file in the gams\temp folder. This may be useful to analyse “difficult” cases of debugging.
  - 
Finally the biofuel sector is prepared.

=== EU biofuel sector data (‘coco1_finish_agriprod.gms’ and ‘prepare_biofuel_data.gms’) ===

The first issue to note is that market balances for sugar beet and sugar are compiled in such a way that all biofuel use of beets is converted into biofuel use of sugar, as if the beets were first processed to sugar and only then converted to ethanol. The advantage of this approach is that sugar is part of the market model and thus may enter the behavioural functions for biofuel feedstock use whereas beets only exist in the supply part of CAPRI. A second advantage is that biofuel feedstock use was indeed booked under sugar in some MS and under beets in others such that our approach ensures a standardisation of booking principles.

//Biofuel production//

There is no differentiation made between fuel- or non-fuel (undenatured or denatured) quantities in production, import and export positions of ethanol. But the consumption position of ethanol is differentiated in fuel-ethanol consumption and non-fuel-ethanol consumption. Hence data on fuel and non-fuel production and consumption of ethanol was required. In the case of biodiesel this differentiation is irrelevant. The ex-post data on biofuel production are coming from diverse sources which is unavoidable to complete the data for years as of 2002 up to the present, if necessary with the help of second and third best solutions or assumptions (compare //biofuel\prepare_biofuel_data.gms//).

The overlay considers data availability and consistency across sources: 
  * For ethanol we consider DG agri as the first best source as it does not only cover production and demand, but also a break down by feedstocks (cereals, beets, wine, fruits, potatoes, other).  
  * Some countries (Croatia, Turkey, Bulgaria, Romania, Serbia) are supplemented from other sources (AGLINK-COSIMO, USDA, Eurostat PRODCOM). AGLINK also supplements production other than from agricultural feedstocks. 
  * Eurostat PRODCOM, Energy balances and PRIMES serve to extrapolate or backcast the DG Agri information to years with missing data. 
  * Ethanol trade by MS is taken from COMEXT but scaled to be in line with DG AGri data for the whole EU.
  * Production of biodiesel is usually from the energy balances while trade is from COMEXT. If data are complete and results reliable, demand is computed residually. In cases of missing data or implausible results, demand is taken from Energy balances, PRIMES, or the EloBio project and trade is calculated as a residual with some rules. 

//Feedstock demand//

In addition to market balances for the fuels the CAPRI data base requires the shares of the raw products on the production of biodiesel and bioethanol at the level of CAPRI products. For bioethanol, this information is partly provided by the DG Agri balances, hence this has been selected to be the major source. The detailed recording follows from the existence of support measures for distillation of wine, fruits and potatoes which triggered a detailed monitoring of ethanol markets. However, for biodiesel the statistical sources are scarce. It turns out that the most consistent estimates for EU regions are apparently produced by USDA services, covering rape, sunflower, soya, palm oil but also used cooking oils, tallow and other oils. As these data do not cover single MS an estimation procedure has been devised (in //biofuel\calc_feedstock_shares.gms//). The initialisation of this estimated feedstock composition relied on the observed increase in INDM according to Eurostat (or more precisely the COCO initialisation when entering //‘prepare_biofuel_data.gms’//) which is assumed to be the main source to “cut out” the required biofuel processing quantities (BIOF) by MS from market balances that so far did not include BIOF.

A special case was palm oil, as the CAPRI database (COCO) doesn’t cover an industrial use position for this product so far. EUROSTAT-COMEXT delivers data on import and export quantities of crude palm oil (HS 151110) for EU Member states. Thereby an increase of palm oil imports was observed within the relevant ex post period (2002-2005). Thus the following assumptions were made to derive approximated values for palm oil processing to biodiesel: (a) Import quantities minus export quantities are equal to domestic consumption of palm oil as domestic production in European Member states can be neglected. (b) The average aggregated consumption quantity of palm oil before 2002 was assumed to be completely used for human consumption as no significant biodiesel consumption took place. By subtracting this constant share of human consumption from the observed consumption quantities after 2002 gave an estimate for the quantities used for industrial processing

Given that many data sources are combined and several aggregation conditions should be maintained, it turned out necessary to set up a small optimisation problem with the following properties (see towards the end of //‘prepare_biofuel_data.gms’//):

  * The estimation tries to stay close to the initial feedstock composition 
  * Extra terms penalise deviations from DG Agri (first best souce for ethanol) and implausibly high shares for palm oil 
  * Technical conversion coefficients (see below) link standard feedstock use and estimated production which has to aggregate with non-standard feedstocks (NAGR) to total production of biofuels. Non-standard feedstocks are those not endogenous in the CAPRI market model (potatoes, fruits and other for bioethanol, used cooking oils, tallow and other for biodiesel)  
  * Total domestic use (with data modifications heavily penalised in the objective) is consistently broken down into biofuel use, other industrial use and non-industrial (e.g. food) use to avoid disturbing the initialisation in previous include files based on Eurostat data.

//Technology parameters//

Conversion coefficients for 1st generation biofuels were collected from different sources. The AgLink-Cosimo model includes a set of conversion coefficients which are in line with the CAPRI product definitions and have become the main source for CAPRI. The table below displays the set of conversion coefficients used for 1st generation biofuels and corresponding by-products. 

** Table 5:Conversion coefficients for 1st generation biofuel production **

^ Conversion coefficients (t/t) ^^Ethanol ^Byproducts ^
|**Grains**|Wheat| 0.274| 0.266 DDGS|
|:::|Barley |0.247 |0.266 DDGS|
|:::|Oats |0.247 |0.266 DDGS|
|:::|Rye |0.247 |0.266 DDGS|
|:::|Corn (dry milling) |0.335 |0.292 DDGS|
|**Other** |Table Wine| 0.100| |
|**Sugar Crops**|Sugar |0.517| |
|:::|Sugar beets |0.079 |0.004 Vinasses*|
^ ^^Biodiesel ^Byproducts^
|**Veegetable oils** |Rape oil| 0.922 |0.100 Glycerine|
|:::|Soy oil |0.922 |0.100 Glycerine|
|:::|Sunflower oil |0.922 |0.100 Glycerine|
|:::|Palm oil |0.922 |0.100 Glycerine| \\*considered as molasses (1t vinasses = 0.1t molasses equivalent) depending on the reduced sugar content \\ Source: Own compilation base on AgLink database, PRIMES questionnaire and Szulczyk, K /2007)

Note: The beet coefficient has been increased in the meantime from 0.079 to 0.086.

//Fuel prices and taxes//

For a specification of processing-, biofuel supply- and demand-functions in the base year, ex post prices are required. Furthermore, given the structure of the CAPRI market module (described in Section [[Market module for agricultural outputs]] ), a differentiation of producer, consumer and import price is also needed. These differentiated prices are not covered in any statistical database for biofuels but they can be derived indirectly by given information on taxes, tariffs and subsidies from the world market price which is available. Thus beside ex post prices information on consumer (excise) taxes, import tariffs and further subsidies are required. The AgLink-Cosimo database includes ex post world market prices for ethanol and biodiesel. This price was taken as the base value to calculate the differentiated prices in the respective countries. The import tariffs for ethanol and biodiesel were also taken from the AgLink-Cosimo database. As the consumer taxes for ethanol and biodiesel in most instances correspond to a reduced excise tax on fossil fuels the consumer taxes for gasoline and diesel were taken as a base value. This tax information was acquired from EurActiv((FIXME [[http://www.euractiv.com/en/taxation/fuel-taxation/article-117495]], 20.07.2009.)) where levels of diesel and petrol taxation in 2002 are published for European Member states. For the required time period (2002-2005) taxation levels were calculated with respect to COM(2002)410((Proposal for a Council Directive amending Directive 92/81/EEC and Directive 92/82/EEC to introduce special tax arrangements for diesel fuel used for commercial purposes and to align the excise duties on petrol and diesel fuel (COM(2002)410).)) which set minimum excise tax rates for non-commercial diesel and petrol since 2006. To identify the excise tax exemptions and producer subsidies, if existent, for the single Member states the obligatory ‘Member States reports on the implementation of Directive 2003/30/EC of 8 May 2003 on the promotion of the use of biofuels or other renewable fuels for transport’ were consulted which are published by the Commission(([[http://ec.europa.eu/energy/renewables/biofuels/ms_reports_dir_2003_30_en.htm]].)). Three different types of tax regulations for biofuels were identified which are applied among the different Member states: an absolute tax for biofuels, an absolute reduction of the excise tax on fossil fuels and a relative reduction of the excise tax on fossil fuels. All differentiated in taxation for blended biofuels or pure biofuels. Based on this information the different ex post prices for the period 2002-2005 were recalculated. As the envisaged biofuel demand function will be a function of (among other variables) the relation between fossil fuel consumer prices and biofuel consumer prices the acquisition of fossil fuel prices was required additionally. To hold consistency between the biofuel and fossil fuel prices the price information for fossil fuels were also taken from the AgLink-Cosimo database which provides EU market prices for diesel and petrol. For the recalculation of consumer prices in individual Member states the already collected taxation levels for fossil fuels were applied. Because there exists a significant difference between the physical energy content and the density of biodiesel, ethanol, petrol and diesel a direct comparison of prices (in €/t) is not possible. For this reason the prices as well as the taxation levels were converted into Euro per ton oil equivalent (toe).

===Assigning data to database array===

So far data processing has focussed on the key Eurostat Table Group “apro” (collected on parameter p_agriProd). The next parts of COCO will collect data from other sources, including the other two Table Groups for prices and Economic accounts (“apri”, “aact”) to a single GAMS array "data". This data collection activity happens in files coco1_expert.gms to coco1_eaa.gms with a summary of the details given below.

{{::code_p33.png?600|}}

**Include file ‘//coco1_expert.gms//’**

This file collects expert data for specific countries that receive priority over all other data sources in the initialisaiton.  The most relevant case is Norway where nearly all data are provided and checked by NIBIO (Norwegian Institute of Bioeconomy Research).

**Include file ‘//coco1_crops.gms//’**

This sub-module assigns the areas, crop production data and most market balance positions from Eurostat’s  Table Group “apro” . However, it is necessary to first deal with a double counting in the land use statistics of Eurostat with cotton both counted among textile crops as well as oil crops. This is fixed by having the aggregate activity “textile crops” producing both other oilseeds (i.e. cotton seeds) as well as textiles (here cotton lint) and removing cotton from the other oils area. 

After this special case the crop areas from Eurostat's production statistics are copied to the LEVL position of the "data" array. Data from Eurostat's land use statistics are the second best choice in case of missing areas.
 
Inappropriate aggregation (ignoring gaps in the component series) has been frequently observed in past experiences with Eurostat data such that aggregates are added up, if possible, from any given sub-components. This principle applies to "GRAS" (permanent grass land = meadows PMEA+ pastures PPAS), and some other aggegates. 

In terms of gross production (GROF) it has to be mentioned that preference is given to the market balance information “USAP” over the production statistics “GROF”), as the former may be expected to be consistent with the trade and demand positions. Thus we set (considering the time lag between balance data and production statistics):

For products with market balance	\(DATA(GROF,t) = p\_agriProd(USAP,t+1)\) \\
Remaining products			\(DATA(GROF,t) = p\_agriProd(GROF,t)\)

Some special assingments handle SEDF and LOSF for cereals and the residual calculation of production of "OOIL" starting from oil crops (OILC). 

More important is a procedure to ensure a complete initialisation of fodder production quantities, an area with widespread gaps in the raw data. This procedure estimates fodder yields (of "PMEA", "PPAS", "TGRA", "FCLV", "FLUC", "FPGO", "FAGO" and "MAIF") from the relationship of known fodder yields to those in other EU countries. To ensure completeness, cereal yields are also considered such that fodder yields may be estimated, in the worst case, from the fodder yields in other EU countries, corrected by the ratio of cereal yields in the MS under consideration to EU cereal yields. 

Contrary to the program name, //all// balance positions for crops //and animals//, except milk positions, are assigned to the "data" array in ‘//coco1_crops.gms//’. Specific treatments are necessary for fruits, table grapes and olives for oil and residual calculations anre undertaken for missing human consumption, total domestic use, and usable production.

In several cases upper or lower limits are assigned for qunatities and areas where it turned out that missing data are often completed in the optimisation part of COCO in an unsatisfactory way. The empirical basis for these limits is diverse. It may rest on production statistics (if production is given there but missing in the market balances), on sugar quotas for the sugar beet sector, or in some cases (fruits, vegetables) on a moving average over given observations. 

**Include file ‘//coco1_milk.gms//’**

This file assigns the data for dairy products and raw milk from Eurostat's “apro” Table Group, with some re-aggregations and additional lower and upper limits for the optimisation parts of COCO1.

Gross production of raw milk is usually given from the farm balance data (COMI = CMLK, cow milk + BMLK, buffalo milk. SGMI = EMLK, ewes milk + GMLK goats milk).

Gaps are more frequent for deliveries to dairies ("PRCM") which are preferably derived from the aggregate processing volume of raw milk according to farm balance (to ensure consistency with gross production) or, as a second best solution added up from the components in the dairy collection data (e.g. collection of CMLK, BMLK, EMLK, GMLK). Often there are also data to disaggregate the non-delivered parts of raw milk into direct sales (e.g. HCOM.COMI), feed use (INTF.COMI), use for farm cheese, butter and other processing products (INDM.COMI) and finally losses and home consumption of liquid milk (LOSM.COMI) and to identify on farm production (e.g. FARM.CHES).
 
Whereas production data and deliveries to dairies may be distinguished into “COMI” and “SGMI”, the dairy statistics on derived products obtained or associated market balances do not permit such distinction. As a consequence, the dairy sector is treated as if all raw milk from cows, sheep etc. was collected and merged into single raw milk at dairy ("MILK"). The marketable production for this aggregate milk, at the dairy level, is set to the sum of the processing volumes from cow and buffalo milk, sheep and goat milk (from the farm balance). Finally, the balance sheets for the secondary milk products are usually taken from the “apro” data selected from Eurostat.
 
The content of milk products is initialised using two types of information: statistical data on fat content of dairy products (and protein content for raw milk) and default technical coefficients for the content of milk products, in terms of milk fat and protein (this is the only initial information for protein, apart from raw milk, where statistical data on protein content are available). The initial information on the fat content of dairy products is rendered complete and reliable by discarding statistical information on contents that are implausibly far away from standard technical coefficients.

**Include file ‘//coco1_anim.gms//’**

Assigning herd size, process length, activity level, yield and production data often requires significant reaggregations from the slaughtering statistics and therefore explanations in this documentation: 

The first best source for tons of slaughtered meat of the main animal categories (SLGT.IPIG, ILAM, ICAT and ICHI) is the usable production (USAP) from the balance sheets because this is likely to be consistent with market balances. As a second best source we use the slaughtering statistics, but with a correction factor. Export and imports of live animals expressed in carcass weight are partly taken from the slaughtering statistics or from the balance sheets, depending on availability. It is useful to remember that total production of meats in heads (e.g. "GROF.IPIG") is set equal to the sum of all slaughtered heads plus exported heads minus imported heads. Accordingly, the production of meat in tons equals the sum of slaughtered tons plus exported tons minus imported tons.

Herd size data are initialised based on the data prepared in ‘//coco1_finish_agriProd.gms//’, taking an average of the available countings related to a calendar year. In the cattle sector we take the weighted average 0.25*December(t-1)+0.5*May-June(t)+0.25*December(t) to assign the average herd size in the calendar year. For dairy cows and suckler cows this average herd size this is also the activity level. The input coefficient for dairy cows ("DCOW.ICOW") and suckler cows ("SCOW.ICOW") reflects the number of slaughtered heads (of cows), in relation to the total herd size of cows with a fall back value in case of missing data of 0.2. The slaughter weight of cows is cows’ meat production divided by slaughtered heads. A particularity is the culling of cows in the UK due to the mad cow disease, because culled cows do not show up in the slaughtering statistics and yet they have top be considered for reasonable replacement rates. This is solved by estimating the total killings of cows (near zero slaughterings + cullings GROF.ICOW) in the period 1996-2005 from typical replacement rates in the pre-crisis period and booking the estimated cullings on losses (LOSF.ICOW for heads, LOSF.BEEF for tons of culled cows).

For cattle other than cows the activity level definition is more complex. In the case of heifers and bulls for fattening, the activity level equals the number of slaughtered heads plus net exports of live animals. If slaughtered heads of heifers and bulls are unavailable, 45% of total cattle slaughterings (net of cow and calves if available) are used as a default value. Heifers for raising will be used to replace dairy and suckler cows, therefore the number of raised heifers (activity level) may be recalculated from cows slaughterings and the change in the cows’ herd size over the next two years. 

In the same manner the number of heifers needed as input (GROF.IHEI) for each year is equal to the activity levels of heifers for raising and heifers for fattening. The number of female calves raised (activity level) in the current year is equals the number of heifers used as inputs in the following year. Similarly the number of young bulls raised equals next year’s production of adult male cattle in heads. In countries with complete statistical data there are only two activity levels that cannot be fully inferred from statistical data alone: As the statistics do not distinguish slaughterings and trade of male and female calves we are using a male share of 51% to estimate the split of male and female calves. This also permist to calculate the total number of calves of each sex needed as input for each year as calves for raising plus calves for fattening and correspondingly the output coefficient of cows. Conversely the output coefficients of calves in terms of beef may be calculated from statistical data on slaughtered calves in tons and heads. 

Herd size data usually may be mapped exactly to particular cattle categories in the CAPRI data base, including the distinction of heifers for raising and for fattening. The only exception is the distinction of the herd size of male and female calves which is assigned according to the estimated split in the related activity levels. Having assigned both the herd size as well as activity levels permits to assign: average process length in days = activity level / herd size * 365. The average process length in turn is related to the daily growth of animals according to another accounting identity: final (live) weight = beginning (live) weight + daily growth * (process length – empty days). This accounting identity will be imposed in the COCO1 estimation procedure, but module coco1_anim assigns bounds (parameters UppLim and LowLim) for the process length such that the implied daily growth values remain in a reasonable range. For heifers there is also an upper bound for the process length for statistical reasons: female animals older than 36 months are classified as “cows”, whether they have calved or not.

Activity levels and slaughter weights for animal types other than cattle are more straightforward to obtain. The herd size of fattenened pigs beyond 20 kg, of piglets up to 20 kg and sows (+ boars) is the average number according to the four possible annual counting (April, May/June, August and December). The number of fattened pigs (flow of animals) equals total slaughtered pigs minus slaughtered sows. The output coefficient (piglets) per sow equals the number of slaughtered pigs plus the increase in the sows herd size. The input coefficient is an estimate of sows slaughterings per sow (inferred from stock data on young sows and the stock change of all sows). The production of pork from pigs for fattening is calculated from total meat production less the pork from sows, assuming that a sow produces 120 kg of meat. 

Two particularities in the pig sector are worth mentioning. The first is that as of 2011 the COCO database includes the herd size of piglets < 20kg (on code PIGL00.HERD) even though there is no explicit activity level “raising of piglets”. Instead the piglets raised are one of the outputs of activity sows with total production of piglets given on code GROF.YPIG. Accordingly we cannot store the process length for raising of piglets in a column for “raising of piglets” but introduce a new code “PIGF.YDAYS” such that in the completed data base we find the relationship PIGF.YDAYS = GROF.YPIG / PIGL00.HERD * 365. Including the piglets turned out useful because it permits to make use of statistical data on the total pigs population which is sometimes available even though pig slaughterings in heads are missing.   

The second pig sector particularity relates to the requirement functions for pigs, stored in the form of a table (//\dat\feed\porkreq.gms//) that relates daily growth to final slaughter weights. For consistency reasons the same table is used to define bounds for the permissible process length. 

In the poultry sector we have herd size data for chicken broilers, turkeys, ducks, and geese (yearly average, mainly from FAO) and hens from Eurostat (average of this and last year’s December counting). The first four give the total herd size of poultry for fattening whereas the herd size of hens also equals the activity level. The output coefficient for eggs relies on usable production from the balance sheets divided by the herd size of hens. A replacement rate of 80% is assumed for laying hens. The activity level of poultry fattening is the difference of total produced poultry heads minus slaughtered hens. The output coefficients and production in terms of meat are straightforward to calculate from here. With activity level and aggregate herd size of poultry for fattening being defined it is possible to calculate the implied process length. The information on the shares of chicken broilers, turkeys, ducks, and geese is used to specify technical bounds for the daily growth and process length. In addition the technical literature also permitted to specify typical empty days for cleaning of stables (or seasonality in the case of geese and ducks). The differentiation of poultry for fattening is only maintained temporarily in COCO1 because it helped to use statistical information for the specification of some technical coefficients that strongly depend on the shares of turkeys. Subsequent CAPRI modules (like CAPREG) will only use the COCO results for the aggregate poultry fattening activity (POUF).

The herd size data for sheep and goats are assigned in the same way as for cattle. The herd size of sheep and goats for milk is at the same time the activity level. The number of slaughtered lambs (sheep and goats) is the total slaughtering number (including net exports of young animals) minus the slaughtering of adults. This estimate for slaughtered lambs in heads also defines the activity level of sheep and goats for fattening. The total output in tons set equal to the meat production. A particularity in the sheep and goat sector is the strong seasonality in some countries. Empty days are specified based on the share of the December counting (sheep in continuous systems) to the May-June counting (sheep in seasonal + continuous systems). These enter the specification of bounds for the process length in sheep and goats fattening.  

**Include file ‘//coco1 assign_AgriPri.gms//’**

Before assigning the prices from p_agriPri to the tareget parameter data 3 issues are addressed:

  * Price differences in the original series between MS suggested that not all series have been already expressed "per nutrient".
  * Prices for dairy products CHES and COCM need aggregation from more specific series
  * Outliers are identified according to limits for plausible differences to the EU average

**Include file ‘//coco1_ candi_EcoAct.gms//’**

Except for Macedonia, which reports EAA data to Eurostat, all other candidate countries receive an EAA initalisation from previously assigned GROF times PRIC. Input positions are assigned based on shares borrowed from an average across selected EU MS. 

**Include file ‘//coco1_eaa.gms//’**

In this file EAA data from Eurostat are assigned from parameter p_ecoAct to data(.), including unit values. For a number of aggregates special assignments are needed to obtain monetary values matching with the aggregates used elsewhere in COCO.

Unit values at producer price are preferably calculated as a quotient from the value at producer price and the quantity as selected from the EAA statistics. However some checks are used to discard grossly implausible (outlier) unit values. 

To serve as a fall back option for the EAA unit values, the previously assigned prices from the p_agriPri parameter are corrected to acknowledge the typical differences between producer prices (UVAP) and selling prices (PRIC). Finally, if price indices are still missing for single items, those from product groups are used. 

Prices for energy positions heating gas EGAS and fuel EFUL may be used to infer quantity variables in CAPREG from value information. A special section takes care for completeness. 

Finally production of non-physical items from the EAA (some outputs like NURS, FLOW and inputs other than heating gas EGAS and fuel EFUL) may be calculated by the quotient of EAA value and a price index. As we will also express the output “quantity” for heterogenous items “other industrial crops” (OIND), “other crops” (OCRO) and “other animal products” (OANI) in values at constant prices (currently 2005), the complete list of non-physical items with quantity information given as values in constant prices is (using the codes from the end of this documentation):

Outputs: NURS,FLOW,SERO,RQUO,NASA,OIND,OCRO,OANI.

Inputs: IPHA,WATR,REPM,REPB,ELEC,ELUB,INPO,PLAP,SEED,SERI.

With coco1_eaa.gms passed, the presumably best raw data are collected on the central parameter data(.), but a few additional completions are possible to inprove the internal consistency of the initialisation before proceeding to the main consolidation steps:

{{::code_p38.png?600|}}

**Include file //‘coco1_resid.gms’//**

This file calculates residuals from the given data for aggregates and sub-positions for crops. The residual activity level and market balance position is defined as a difference between the group level and the sum of individual crops. This calculation is not carried out if there are gaps in some components or if the total is smaller than the sum of given components.

**Include file //‘coco1_cropyields.gms//’**

Yields are evidently calculated for each crop activity by dividing the gross production by the production level for this activity. However, this sub-module also applies a Hodrick-Prescott (HP) filter to smooth out problems with yields from activities with small production areas.  This optimisation program has tight bounds around observed production and area data (± 100 t or ± 100 ha). The HP objective penalises peaks in the data as frequently encountered (partly due to rounding errors) with small areas or quantities. The tight leeway around observed values is irrelevant for moderately important crops in the sense that the result will be almost identical to the original data. For ‘unimportant’ crops, however, the HP filter term will lead to some smoothing of peaks in the data and thus, in general, to more plausible yields for these crops((For example in France, in 2000, 100 ha only represented 0.002% of the soft wheat area, but 100 ha of tobacco represented 16 % of the total area, as tobacco is irrelevant in France. These irrelevant items will be those where unrealistic yields will be frequently found and where deviations from Eurostat data will be acceptable.)).

**Include file //‘coco1_gras.gms’//**

In most countries grass is the most important ‘crop’ in terms of area use yet, often the data on grass areas and production are one of the weakest parts of crop statistics. When relying solely on statistical data, the COCO database frequently showed unbelievable grass yields in some MS. This sub-module assigns grass yields, based on expert knowledge, to be used as priori information together with statistical data in part 2 of the COCO routine. The key information is expert data((These were estimates worked out in September 2006 by Oene Oenema and Gerard Velthof from Alterra, Wageningen, in the context of a service contract for DG-ENV (Integrated measures in agriculture to reduce ammonia emissions, No 070501/2005/422822/MAR/C1) with the participation of EuroCARE.)) on typical grass yields in dry matter for 2002 in all EU-28 MS and WBs. To convert this expert information, for a single year, into expert time series for grass yields, the expert data for 2002 are linked to the yields of activity aggregate cereals, assuming that long run yield growth and yearly fluctuations run approximately in parallel. The yields for pasture, meadows and other fodder on arable land are adjusted accordingly.

**Include file ‘//coco1_landuse.gm//s’**

This file allows to process information from various sources on the same item, in particular areas for various land use items (“LEVL”). In order to handle the different sources, new rows are defined, indicating from which source the information on land use area is coming which is typically only offered for a selected years or a limited period:

  * **LEVAgriProd** - Eurostat national land use data (Eurostat Table: “apro_cpp_luse”, discontinued). As these data are annually available since the 80s and give important land use categories (total area ARTO with inland waters INLW, arable land ARAC, permanent grassland GRAS, forest land FORE, etc) this would be our preferred source, if all series were complete and reliable. 
  * **LEVCLC** - Land use levels derived from Corine Land Cover (CLC) using a transformation matrix to LUCAS in two steps
    * Original Corine Land Cover (44 classes, aggregated to the NUTS2 level((Data for some countries and years affected by evident problems have been removed. For example the 2006 CLC data only covered parts of Greece, hence are no usable to calculate totals at the MS level.)) obtained from JRC, Ispra for 1990, 2000, 2006, 2012. To link the Corine information to the CAPRI  land use classes we used as an interim step so-called contingency tables from CLC to LUCAS categories provided by JRC Ispra at NUTS2 level. This allows to map the Corine classes (like complex cultivation patterns – “complexCultiv”) to the //most probable// land cover class from the LUCAS survey (in the example “complexCultiv” -> annual crops) which may be aggregated then to the CAPRI land use aggregates (annual crops LUCAS -> arable crops, CAPRI code  ARAC). However, while this mapping to the //“most probable”// category in LUCAS preserves the original information as much as possible, it has disadvantages, for example, that certain LUCAS categories like “fallow land” are not mapped at all because they are not the most probable matching LUCAS category for any of the CLC classes. 
    * To acknowledge that the Corine Classes may be mapped to several LUCAS categories we multiplied them with the “profiles”, giving the distribution of each Corine category according to the LUCAS classes. In this case, only 26.7% of the “complexCultiv” area is mapped to annual crops, but 7.3% are mapped to “temporary pastures”, 6.4% to “permanent  grassland with sparse tree/shrub vegetation” and so forth. The transformed Corine data often give the most detailed area coverage and thus assume a role as a kind of fall back information in case that other information is missing.
  * **LEVRegio** - Eurostat regional land use data (Eurostat Table: “agr_r_landuse”, discontinued). Inspite of using the same codes as for the national data, the national totals, aggregated from the NUTS2 regions are not always in line with LEVAgriProd. Furthermore a few categories are missing (no inland waters, no other wooded land). However there are few alternative annual series available to regionalise the national data in CAPREG.
  * **LEVFAO** - Land use data from the resource FAOSTAT domain FIXME ((See [[http://faostat3.fao.org/home/index.html#DOWNLOAD]].)) with annual time series on agricultural land use but also some non agricultural area categories (forest, inland waters, other land, total area).
  * **LEVLucas** – directly using the LUCAS data is an option that has been considered but not implemented in CAPRI so this code is not used at the moment.
  * **LEVLandCov** - Eurostat land cover data for 2009, 2012, 2015 at the MS level. Agricultural land is only distinguished into cropland CROP and grassland GRAS, but 5 nonagricultural areas are neatly aggregating up to the total country (Artificial ARTIF, shrubland (considered similar to “other wooded land” OWL), bare land & wetlands (mapped to “other sparcely vegetated or bare OSPA) and waters WATER.
  * **LEVEnvio** - Eurostat land cover data from the environment section (Table “env_la_luc1” FIXME ((Apparently these data are currently under revision because they are not accessible on the Eurostat website anymore since about June 2012. However they are still accessible (in July 2012) via [[http://eu22.eu/land-use.2/land-use-by-main-category/]].)), discontinued). Total area is classified into about 40 categories, but data are only given for a number of years (1950, 1970, 1980, 1985, 1990, 1995, 2000) and with many gaps, in particular for the subcategories. 
  * **LEVMcpfe** – Data from the Ministerial Conference on the Protection of Forests in Europe C&I database for quantitative indicators. This gives validated data on the forest sector (forest land FORE, other wooded land OWL) and some non forestry data (inland waters INLW, total country area ARTO), but data were only given for 1990, 2000, 2005, 2010, 2015.
  * **LEVFSS** - Eurostat farm structure survey data (Table “ef_lu_ovcropaa"). Gives a very detailed and reliable description of agricultural area use, but only for the survey years (1990, 1993, 1995, 1997, 2000, 2003, 2005, 2007, 2010, 2013). As CAPRI_regLU these data are also used in the subsequent regionalisation steps of the CAPRI data consolidation because NUTS2 data are offered. The main disadvantage for our purposes is the complete lack of nonagricultural data coverage. 
  * **LEVcrf** – The UNFCCC common reporting format (CRF) data (1990-2016), also cover land transitions and give settlement data. Official data for LULUCF accounting. 

These sources each provide information on some “land use classes” (Table 7 of Annex) at least.  These land use classes might be related to agricultural activities (like “olive groves” OLIVGR, covering the activities “tables olives” TABO and “olives for oil” OLIV) or they may refer to nonagricultural land uses (“artificial land” OART). Land use classes are in turn related to land use aggregates (Table 7 of Annex). 

**Include file ‘//coco1_finish_raw.gms//’**

This file includes some final checks and adjustments before moving on to the optimisation part of coco. 
  * For seed quantities technical limits for reasonable seed use per ha are imposed.
  * For all non crop products producer prices are assigned from the EAAP/UVAP positions or PRIC
  * For all products with one of activity level, production or yield missing some correcting actions are taken.
  * For FEDM, HCOM, SEDF and SEDM lower and upper limits are introduced to limit yearly changes in the subsequent estimation routines.


====COCO1 Estimation procedure====
 
COCO was primarily designed to fill gaps or to correct inconsistencies found in statistical data and, additionally, to easily integrate data from non EUROSTAT sources in the model. However, given the task of having to construct consistent time series on yields, market balances, EAA positions and prices for all EU Member States, and therefore thousands of series, a heavy weight was put on a transparent and uniform //econometric solution// so that manual corrections were avoided, to some extent at least. Regarding the construction of the data base, three principal problems had to be solved:

  - Gaps had to be filled in time series, either before the first available point, inside the range where observations are given, or beyond it.
  - Some time series were missing altogether and had to be estimated, e.g. when there are data on animal production but none on meat output per head.
  - Corrections of given statistical data should be minimised, if possible.

In order to take into account logical relation between the time series to fill, and eventually to make minimal corrections in the light of consistency definitions, simultaneous estimation techniques are used in this exercise. In order to use to the greatest extent the information contained in the existing data, the following principles are applied:
  - Accounting identities   positions of the market balance summing up to zero, the difference between stocks as the stock change and similar restrictions  constrain the estimation outcome.
  - Relations between aggregated time series (e.g. total cereal area) and single time series are used as additional restrictions in the estimation process.
  - Bounds for the estimated values based on engineering knowledge or derived from first and second moments of times series ensure plausible estimates and/or bind estimates to original data. Additionally, bounds are constructed from more disaggregated time series, if the aggregate is missing.
  - As many time series as technically possible are estimated simultaneously to use the full extent of the informational content of the data constraints (1) and (2).

The first three points neatly conform to the Bayesian Highest Posterior Density (HPD) approach proposed in Heckelei et al. 2005. The reader may notice that the problem is quite similar to system estimation in economics. Consider a system of supply curves. A standard approach to estimate such a system includes the specification of a functional form consistent with profit maximisation and the imposition of various constraints (homogeneity, symmetry, convexity) on the parameters to be estimated. Our approach is quite similar, as our goal asks for consistent estimates as well. Instead, we introduce explicit data constraints involving the fitted values for each point and take the fitted values later as the content of the data base.

The estimation is prepared in the following steps:
  - Estimate independent trend lines for the time series.
  - Estimate a Hodrick-Prescott filter using given data where available and otherwise the trend estimate as input.
  - Define ‘target values’ which are (a) given data, (b) the results from the Hodrick-Prescott filter times R² plus the last (1-R²) times the average of nearest observations. The target values may be considered modes of a prior distribution.
  - Specify a ‘standard deviation’ for each data point which is different for given data and gaps.

The concept is put to work by a minimisation of normalised least squares under constraints:

\begin{align}
\begin{split}
min_{y_{i,t}} &\sum_{i,t\in obs} wgt^{dat}((y_{i,t}-y_{i,t}^{dat})/abs(y_{i,t}^{trd}-y_{i,t}^{dat}))^2\\
& + \sum_{i,t\notin obs} wgt^{ini}((y_{i,t}-y_{i,t}^{ini})/s_{i,t})^2\\
& + \sum_{i,t} wgt^{hp}((y_{i,t+1}-y_{i,t})-(y_{i,t}-y_{i,t-1})/s_{i,t})^2\\
& + \sum_{i,t} wgt^{up}((max(y_{i,t}^{up},y_{i,t})-y_{i,t}^{up}))/abs(y_{i,t}^{up}))^2\\
& + \sum_{i,t} wgt^{lo}((min(y_{i,t}^{lo},y_{i,t})-y_{i,t}^{lo}))/abs(y_{i,t}^{lo}))^2\\
\end{split}
\end{align}
\begin{align*}
\begin{split}
&  \text {s.t.}\\  
&y_{i,t}^{LO}<y_{i,t}<y_{i,t}^{UP}\\
&\text {Accounting identities defined on} y_{i,t}\\
&\text {Identity of land use from different sources}  
\end{split}
\end{align*}
where //i// represents the index of the elements to estimate (crop production activities or groups, herd sizes etc.), //t// stands for the year, wgtx are weights attached to the different parts of the objective (\(wgt^{dat} = wgt^{hp} = 10, wgt^{ini} = 1, wgt^{up} = wgt^{lo} = 100)\), and


\(y_{i,t}\) =	the fitted value for item i, year t

\(y_{i,t}^{dat}\)	=	the observed data for item i, year t 

\(obs\)	=	{\((i,t) | y_{i,t}^{dat} ≠ 0\)}, the set of data points with nonzero data 

\(y_{i,t}^{trd}\) 	=	the trend value of an initial t rend line through the given data 

\(y_{i,t}^{ini}\) 	=	initial supports for gaps:  preliminary Hodrick-Prescott filter result (from step 2) times R² plus the last    (1-R²) times the average of nearest observations 

\(s_{i,t}, (i,t)\notin obs\)	= \(0.1 \cdot y_{i,t}^{ini} +s_{i,t}^{trd}\) , weighted sum of the initial support for gaps and the standard error of the initialising trend

\(s_{i,t}, (i,t)\in obs\) =	\(0.1 \cdot y_{i,t}^{dat} +s_{i,t}^{trd}\) , weighted sum of given data and the standard error of the initialising trend

\(y_{i,t}^{lo},y_{i,t}^{up}\) 	=	‘soft’ bounds, triggering a high additional penalty if violated 

\(y_{i,t}^{LO},y_{i,t}^{UP}\) 	=	‘hard’ bounds, defining the feasible space 


The general weighing of the different terms evidently reflects the acceptability of certain types of deviations which is lowest ( = 1) for deviations of the fitted value from the HP filter initialisation as these are considered quite poor, preliminary estimates (derived from independent trends). The weights are 10 times higher for deviations from given data and for the smoothing HP filter term. Finally there are extra penalty terms for fitted values moving beyond plausible ‘soft’ bounds \(y_{i,t}^{lo},y_{i,t}^{up}\). The ‘hard’ bounds \(y_{i,t}^{LO},y_{i,t}^{UP}\) are constraining the feasible space for a number of solution attempts. However, if it turns out that certain constraints would persistently preclude feasibility of the data consolidation problem, they are relaxed in a stepwise fashion, but this widening of bounds is monitored on a parameter to check. 

The denominators used to normalise the different terms are ‘standard deviations’ of the prior distribution in the framework of a HPD estimation but they are specified in view of practical considerations. Essentially they provide another weighting for particular (i,t) deviations depending on their acceptability, but these weights are specific to the particular data point. All denominators are derived from the variable in question such that they acknowledge the fact that the means of the time series entering the estimation deviate considerably. The normalisation hence leads to minimisation of relative deviations instead of absolute ones which could not be summed in a reasonable way.
 
It should be mentioned that the above representation of the COCO objective function is a quite simplified one: It is evident that the above lacks safeguards against division by zero or very small values which are included in the GAMS code. Furthermore there are different types of gaps which are not reflected above to avoid clutter (Are there gaps in a series with some data or is the series empty? Is the mean based on data or estimated from  \(y_{i,t}^{lo},y_{i,t}^{up}\) ?)

Equation 4 indicates that accountancy restrictions are added. These restrictions can be balances (land, milk contents, young animals), aggregation conditions, definitions for processing coefficients and yields etc. They are quite similar to those applied for the ex ante trend projections as discussed in detail in Section [[The Regionalised Data Base (CAPREG)]] but the COCO1 accounting identities tend to acknowledge more details or have to establish the data base that is subsequently given for the ex ante trend projections, for example related to the split of high and low yield animal activites (DCOL, DCOH, BULL, BULH, HEIL, HEIH):

{{:code_p.43.png?600|}}

The fixed yield variation imposed in this way is ± 20% and each of the variants corresponds a fixed 50% of the total activity level whereas other accounting equations ensure that the process length DAYS and the daily growth DAILY vary accordingly.

In the **dairy sector** the strategy of an update in 2015 has been to obtain a fairly detailed data consolidation with a distinction of milk processed and dairy products obtained in dairies and on farm, using most of the available data sources. For the subsequent modules this disaggregate description of the dairy sector is consolidated to some extent for further use.

The equation system considers that both in dairy as well as on farm the raw milk used has to be consistent in terms of milk fat and protein with the products obtained: 

\begin{equation}
PCRM_M \cdot \delta_{c,M}= \sum_i (NAGR_i- PRCM_i)\cdot \delta_{c,i}
\end{equation}

where 

\(PRCM\) =	processing of raw milk M or dairy product i (e.g. cheese) 

\(NAGR\) =	products obtained in dairies (e.g. MC100, fresh products, from apro_mk_pobta)

\(c\) =	type of milk content (= FATS, PROT)

\(i\) =	dairy product (e.g. MC100, fresh products, from apro_mk_pobta)

\(\delta\)  = 	average content in dairies

In a similar manner we have balances for milk contents in on farm use of raw milk as well as in the products obtained on farm:

\begin{equation}
(INDM_M+HCOM_M)\cdot \delta_{c,M}= \sum_i FARM_i \cdot \phi_{c,i}
\end{equation}

where 

\(INDM\) =	use of raw milk M on farm for farm cheese, farm butter etc (e.g. MF240-UWM)

\(HCOM\) =	use of raw milk M on farm as drinking milk (MF110-UWM, includes both direct sales as well as home 
consumption)
 
\(FARM\) =	products obtained on farm (e.g. MF110-PRO, MF240-PRO) 

\(\phi\) =	average content on farm 

The content of milk products will typically differ, in particular for the most important product “fresh milk products” (FRMI), as this includes yoghurts etc in dairies but will be dominated by drinking milk on farm. However, to accomodate the important case of drinking milk it is not necessary to have all contents on farm deviating freely from the standard contents in dairies. Instead we require that 

\begin{equation}
CORF_{c,i}\cdot \delta_{c,i}=\phi_{c,i}
\end{equation}

where 

\(CORF\) =	ratio of on farm content to the standard content 

and CORF is contrained to equal to one except that we permit CORF $\neq$ 1 for FRMI.

Production in dairies and on farm may be added to obtain the total production that enters the market balances:

\begin{equation}
MAPR_i=NAGR_i+FARM_i
\end{equation}
\begin{equation}
MAPR_i=HCOM_i+PCRM_i+FEDM_i+NTRD_i
\end{equation}

where

\(MAPR\) =	Marketable production according to the (discontinued) Eurostat market balances (USAP-FRMI from apro_mk_bal_B4410_12) 

or in terms of the commercially marketed quantities only:

\begin{equation}
NAGR_i=(HCOM_i-FARM_i)+PCRM_i+FEDM_i+NTRD_i
\end{equation}

The market balance for the raw milk looks as follows:

\begin{equation}
GROF_M=PRCM_M+HCOM_M+INDM_M+FEDM_M+LOSM_M
\end{equation}

where 

\(FEDM\) =	Feed use of raw milk (apro_mk_farm_MF520_UWM)

\(LOSM\) =	Losses of raw milk (apro_mk_farm_MF600_UWM)

After solving the data consolidation according to the above equations the following rebookings will be useful for subsequent modules:

\begin{equation}
MAPR_i'=NAGR_i
\end{equation}

\begin{equation}
HCOM_i'=HCOM_i-FARM_i
\end{equation}

\begin{equation}
HCOM_M'+FEDM_M'+LOSF_M'=HCOM_M-FDEM_M+LOSM_M+INDM_M
\end{equation}

The first two of the previous equations transform the standard (total) market balances including on farm use and production into “commercial” market balances only which is useful for comparisons with some datasets. The last equation is active for a while already in COCO. It identifies \(HCOM_M'\) = raw milk for direct sales (regardless of in terms of drinking milk or on farm products), feed milk and \(LOSF_M'\) , an aggregate of losses and on farm use of milk by farm households themselves. The original position \(INDM_M\) is basically allocated to a part consumed on farm and that part of direct sales which occurs in processed form (farm cheese, butter...). As the form of on farm consumption is not modelled in CAPRI, items FARM, NAGR, INDM are not passed on to subsequent modules, only LOSF is passed on, because this needs to be accounted for when calculating deliveries to dairies (\(PRCM_M\)). 

Related to **land use** data there are also a number of particularities and details. We have various sources reporting data on the same item (LEVL) that evidently contradict each other before the data consolidatuion. During the consolidation the following equation ensures the identity of land use areas among different sources (LEVCLC, LEVFAO etc):

{{::code_page_45.png?400|}}

Based on the previous constraint all other land related accounting restrictions only have to be checked for the item “LEVL”, while the objective functions minimizes deviation from supports of all sources. Accounting restrictions ensure consistency of crop activities with land use classes and their aggregates.

Complications in the consolidation of land use data are related to the use of UNFCCC data for 6 land use classes (set “LUclass”: CROP, FORE, ARTIF, GRSLND, WETLND, RESLND), because three of the UNFCCC land use classes (GRSLND, WETLND, RESLND) differ conceptually from “related” categories from other data sets. Thus it is only possible to specify some inequnalities and an aggregation condition as constriants:

{{::code_page_45_2.png?600|}}

The last equation illustrates that the land use accounting based on UNFCCC data (introduced in 2015) also involves the land use //changes// (LUCpos) into the 6 LU classes (and a corresponding condition for changes //from// those LU classes).  

It should also be explained that Equation 1 is not applied simultaneously to the whole dataset because the optimisation would take too long. Instead it is applied to subsets of closely related variables:
  - Land use and land balance (Estimation step 1 for preliminary LU results).
  - Crop production (land balance + yields) for all crops simultaneously (Estimation step 2).
  - Production, yields, EAA, market balances for groups of animals like “cattle”  (Estimation step 3). 
  - Crop EAA + market balances for groups of crops, taking production from (2.) as given (Estimation step 4).
  - As the crop level estimation or the other crop completions may have slightly changed aggregate areas, the land use estimation has to be repeated (Estimation step 5).

This procedure has developed as a path dependent compromise between computation time and presumed quality. It starts with an estimation of land use in combination with agricultural land balance, including the land transition between LU classes. This determines the utilisable agricultural area (UAA) and non-agricultural land use. Step 2 distributes crop areas within the fixed UAA from step 1 and estimates crop production and yields. Step 3 only tackles the complete animal sector data (activities, markets, EAA). The crop production is taken as given, when market balance and EAA are estimated for the crops and derived processed products (step 4). However, with all steps completed some final checks may modify the results (e.g. delete tiny activity levels or estimate another crop area from another crop output value and thus change the UAAR). Furthermore the crop estimation may have slightly changed the ratio of cropland to productive grassland. Therefore the accounting identities ensured in steps 1 are not necessarily fulfilled in a strict sence anymore. Hence a final reconciliation of land use is added for full consistency:
 
**Figure 3: Overview on main estimations in for the consolidation of national data in Europe (in coco1.gms)**

{{::figure3.png?600|}}

Results are not always fully satisfactory (perhaps impossible given some raw data). For example the resulting prices (unit values) are far from a priori expectations for a number of series, in particular less important ones. This is because, apart from some additional security checks, unit values are by and large considered a free balancing variable calculated to preserve the identity between largely fixed EAA values and fixed production (in coco1_estimb). The priority for EAA values has been reduced somewhat in recent years but a more thorough revision would require to estimate production, market balances and EAA simultaneously rather than consecutively (first $(a)$, then $(c)$ for crops). As this is infeasible for all crops at the same time the whole estimation would need to be split up differently in the crop sector, perhaps first for the aggregates and then within those.
 
Furthermore it should be mentioned that the main parts of COCO are handled in a program (‘coco1.gms’) looping over MS because there are no direct linkages between them. However, for practical reasons it will be useful to run COCO in country groups that have the same coverage of years. The longest series (as off 1984) can be established for EU15((Belgium and Luxembourg are aggregated in COCO for reasons of data availability.)) countries except Germany. For the New MS it turned out that data before 1989 are often very unreliable and create considerable burden in the data maintenance. These countries (and Germany) are only completed for years from 1989 onwards therefore. Norway also offers reliable series as of 1984. In the case of the Western Balkan countries it is rather hopeless to provide very recent data as key data are still missing such that the series can only be completed from 1995 onwards. Furthermore for the Western Balkan counties it was necessary to transfer certain coefficients and shares from (previously consolidated) neighbouring countries to the Western Balkan, such that a certain sequence is necessary for a reasonable application of COCO1: 

  * Run COCO1 for EU28 countriesand Norway, either in one batch from the GUI or one by one (always with sub-steps 1 to 5). 
  * Run COCO1 for the set of candidate countries (Western Balkan and Turkey) on the reduced time span with given data (1995 – 2009). Because these use some shares and ratios from an average of selected EU28 countries the latter have to be consolidated first. 

====COCO2: Data Preparation====
The data consolidation in COCO2 only covers a few special topics:

  * producer prices of dairy products and vegetable oils
  * consumer prices
  * consumer losses and nutrient intake after losses
  * feed stuff quantities without market balances (by-product, fish emal)
  * loss rates of fodder for preliminary balancing of animal nutrients 
  * corrections of certain LULUCF coefficients based on UNFCCC

An overview is given in the following figure.

**Figure 4: Overview on main elements in the finalisation step for the consolidation of national data in Europe (in coco2.gms)**

{{::figure_4.png?600|}}

In spite of only limited subtasks tackled in coco2.gms, the multitude of different data inputs is comparable to that in COCO1.

**Include file //‘coco2_collect.gms’//**

Various input files are collected with some adjustments to match to CAPRI definitions and with some gap filling. As the consumer prices follow from a top down expenditure allocation problem, the input data range from macroeconomic information to very detailed prices of food items.

  * Consolidated data from COCO1
  * Macroeconomic information from Eurostat and UNSTATS: Exchange rates, population, GDP deflator, private consumption of households in current prices.
  * Price index information: Aggregate food price index, relative (to EU) food price index, harmonised indices of consumer prices (HICPs) with item weights all from Eurostat
  * Expenditure by product groups (from Eurostat and national sources)
  * Auxiliary data for special cases (Prices for some milk products in selected countries, fish meal information etc) 
  * Country Sheets of the Western Balkan and Turkey: Exchange rate, inhabitants, inflation rate, food expenditure shares
  * Disaggregate absolute consumer prices for selected narrowly defined food items (ILO and Eurostat)

Where available, producer prices for milk products were already included from Eurostat statistics (Agricultural prices and price indices) in COCO1. Completeness was not achieved in COCO1, however, because processed dairy products are not part of the EAA. Here we complete some gaps using price information for some Member States and (partly assumed) relationships among dairy product prices and their fat and protein contents. 
Data on total consumer expenditures as well as expentitures by food groups are included from various sources as described in Chapter 2.2.2.5 FIXME, partly extended using general price index information.

Consumer price index weights and price indices for food aggregates (2005=100) are coming from Eurostat tables on HICP. Supplementary information for Albania, Bosnia and Croatia comes from national agencies. The price index weights are used to extend older series on food expenditure by product groups (say “meat”) which have been discontinued (see below under file coco2_shares.gms).

Finally we use very narrowly defined absolute consumer prices (e.g. for spaghetti) and price indices. The earlier years (before 2008) had been provided by ILO which has discontinued this activity. For a subset of those Eurostat offers matching information as “detailed average prices (table prc_dapYY) that has been used to extend the ILO series. These prices are mapped to CAPRI regions, products and units (//‘coco2_ilo_addup.gms’//). 

Price indices for food and non-alcoholic beverages from HICP as well as the general food price index are used to complete the disaggregate ILO prices for single typical food items.  (like “Wheat bread white unsliced not wrapped”) using a Hodrick-Prescott filter and the expectation that their changes should follow the price index informaiton collected. 

Finally another HPD estimator is used to adjust the dissagregate prices to be (somewhat) in line with Eurostat information on relative food price levels across Europe.

**Include file //‘coco2_shares.gms’//**

Expenditure shares are defined and completed top-down using simple OLS estimates against related statistical expenditure information or, as a last fall back option, based on a trend.

The food expenditure share completions start with data from COICOP level 3 giving results on food and non-alcoholic beverages. Further disaggregation relies on historical Eurostat data (HIST), on the above mentioned index weights from HICP and partly national data (Germany and Spain). 

A conveninent expenditure group is potatoes as these expenditure shares may be extrapolated based on COCO1 human consumption multiplied by producer price as regressors for OLS.

====COCO2: Estimation procedure====

**Include file //‘coco2_def.gms’//**

The approach to determine consumer prices is to distribute food expenditure on groups with consumption quantities given from COCO1 results such that endogenous consumer prices link endogenous expenditure with exogenous quantities. Deviations of estimated expenditure and consumer prices from their supports is penalised in an entropy framework. Estimation is done year by year, starting with the most recent year where hard data are usually available to a greater extent than for the oldest years in the database. Including consumer price changes (always relative to the previously solved year) serves to stabilise the results to some extent such that the objective does not only have supports for the consumer prices, but also for their changes. The entropy problem is solved by maximizing:  

\begin{align}
\begin{split}
max_t &- \sum_{m,j,k} CPS_{m,j,2} \cdot HCOM_{m,j,k}/1000/TOFO_{m,t} \\
& \cdot PE_{m,j,k} \cdot log(PE_{m,j,k}/PQ_k)\\
&-\sum_{m,j,k} CPS_{m,j,2} \cdot HCOM_{m,j,k}/1000/TOFO_{m,t}\\
& \cdot PED_{m,j,k} \cdot log(PED_{m,j,k}/PQ_k)\\
&-\sum_{m,FOPOS,k} EXS_{m,FOPOS,2}/TOFO_{m,t}\\
& \cdot PEX_{m,FOPOS,k} \cdot log(PEX_{m,FOPOS,k}/PQ_k)\\
&-\sum_{m,j,k} PFAC_{m,k} \cdot LOG(PFAC_{m,,k}/PQ_k)\cdot 1000\\

\end{split}
\end{align}

where //m// represents the region, //j// the food item with consumer price, FOPOS the food group, //t// stands for the current estimation year, t_1 for the year estimated before and k for the number of support points (=3).

Parameters are
| \(HCOM_{m,j,t}\) |Human consumption, result from COCO1|
| \(UVAD_{m,j,t\_1}\)	|Consumer price from last simulation of year t+1|
|\(CPS_{m,j,k}\)	|Support points for consumer prices |
|\(DCPS_{m,j,k}\)	|Support points for consumer price changes| 
|\(EXS_{m,FOPOS,k}\)	|Support points for group expenditures|
|\(TOFACS_{m,k}\)	|Support points for total food expenditure slack|
|\(PQ_k\)		|A priori probabilities for support points|
|\(TOFO_{m,t}\)	|Total food expenditure |
|and entropy variables||
|\(PE_{m,j,t}\)	|Probability of support points for consumer prices| 
|\(PED_{m,j,t}\)	|Probability of support points for consumer price changes|
|\(CP_{m,j}\)	|Consumer prices|
|\(DCP_{m,j}\)	|Consumer price changes|
|\(PEX_{m,FOPOS,t}\)	|Probability of support points for group expenditure|
|\(PFAC_{m,k}\)	|Probability of support points for food expenditure slack|
|\(EX_{mFOPOS}\)	|Group expenditures|
|\(TOFAC_m\)	|Food expenditure slack|

Constraints are as follows:
Summing up probabilities for support points

\begin{equation}
\sum_{k\forall_{m,j}(CP.L_{m,j}\ge 0\wedge HCOM_{m,j,i}\ge 0)} PE_{m,j,k}=1
\end{equation}

\begin{equation}
\sum_{k\forall_{m,j}(DCPS_{m,j}\ge 0\wedge HCOM_{m,j,i}\ge 0)} PE_{m,j,k}=1
\end{equation}

\begin{equation}
\sum_{k\forall_{m,j}(EX.L_{m,FOPOS}\ge 0)} PE_{m,FOPOS,k}=1
\end{equation}

\begin{equation}
\sum_{k\forall_{m}(TOFAC.LO_m\ge TOFAC.UP_m)} PFAC_{m,k}=1
\end{equation}

Define consumer price changes from support points

\begin{equation}
DCP_{m,j} = \sum_{k\forall_{m,j}(CP.L_{m,j}\ge 0\wedge HCOM_{m,j,i}\ge 0 \wedge DCPS_{m,j,2}\ge 0)} PED_{m,j,k} \cdot DCPS_{m,j,k}
\end{equation}

Of course consumer prices changes are also related to the last simulation result (which is for T+1 due to backward looping)

\begin{equation}
DCP_{m,j} =UVAD_{m,j,t\_1}-CP_{m,j}
\end{equation}

Define consumer prices from support points and probabilities

\begin{equation}
CP_{m,j} = \sum_{k\forall_{m,j}(CP.L_{m,j}\ge 0\wedge HCOM_{m,j,i}\ge 0)} PE_{m,j,k} \cdot CPS_{m,j,k}
\end{equation}

Define group expenditure from support points and probabilities

\begin{equation}
EX_{m,FOPOS} = \sum_{k\forall_{m,j}(EX_{m,FOPOS}\ge 0)} PEX_{m,FOPOS,k} \cdot EXS_{m,FOPOS,k}
\end{equation}

Define total expenditure slack from support points and probabilities

\begin{equation}
TOFAC_m=\sum_{k\forall_{m}(TOFAC.LO_m\ge TOFAC.UP_m)} PFAC_{m,k} \cdot TOFACS_m
\end{equation}

Exhaustion of food expenditure may be relaxed with a slack factor different from one. However, this “last resort” to achieve feasibility in the expenditure allocation problem is limited to years and countries with precarious data and subject to strong penalties.

\begin{equation}
\sum_{FOPOS} EX_{m,FOPOS}=TOFO_{m,t} \cdot TOFAC_{m,k}
\end{equation}

Consistency of group expenditure

\begin{equation}
EX_{m,FOPOS}=\sum_{j\forall_{m,FOPOS}(j\in FOPOS\wedge HCOM_{m,j} \ge 0)}CP_{m,j} \cdot HCOM_{m,j}/1000
\end{equation}

For most countries the exhaustion of total expenditure is the only evident hard constraint (and even this is relaxed in problem cases). However, as the penalties for group expenditure are set high, and furthermore as the range of expenditure supports defines additional implicit hard constraints, the problem may turn out infeasible (typically solved by additional leeway). To meet the expenditure constraints the solver would tend to concentrate deviations from supports on the most important expenditure items while setting the less important items close to their supports. A more balanced distribution of deviations from supports was achieved in practice by weighting all contributons to the overall objective (except the last one for the total expenditure slack) with expected expenditure shares. The weights may be interpreted as expected expenditure shares because supports are specified in a symmetric way such that the central, second (of three) supports, which is used in the objective function, is equal to the expectation.

**Include file //‘coco2_solve.gms’//**

The initialisation, solving, reporting and storage is organised in the next include files with a few elements worth mentioning

  * The initialisation tries to ensure positive consumer margins by the assignments of expected values and by specifying bounds on estimated consumer prices. The reference point for these margins is an average of EU and national prices that reflects the importance of domestic sales vs. imports.
  * Bounds and spread of supports around expected consumer prices are set high for items without ILO style prices (say “table olives” TABO) or where the fit of available price information is questionable (e.g. cabbage prices for “OVEG”).
  * A checking parameter (“p_checks”) permits to check the iniitalisation in case of infeasibilites. The most frequent case observed in the last years is that lower bounds on oils expenditure become binding, suggesting the need for some systematic mismatch of price and expenditure information for this group. 

====COCO2: Final completions====

At this point it may be motivated why there is at all a need for a COCO2 module instead of handling all further topics in COCO1, that is MS by MS. There are basially two motives: 

  * In some cases it is convenient to have the completed COCO1 results of all countries at hand for comparison purposes and in order to achieve a balanced picture across MS. This is the main motive for the assignments of consumer loss rates (Section 3.2.7.1).
  * Whenever averages of consolidated data (from COCO1) across several or all MS are involved, a solution in a loop requires certain sequence (such as first solving for non-candidate countries to form the averages that are input to candidate countries) or is better solved in a new module like COCO2. This applies to the expenditure allocation problem (Section [[The Complete and Consistent Data Base (COCO) for the national scale#COCO2: Data Preparation]]), to completions for certain feedstuffs (Section 3.2.7.2, EU averages used due to the scarcity of data), and to corrections of LULUCF coefficients (Section 3.2.7.3). FIXME

===Assignment of consumer loss rates and nutrient intake per head ===

Since a number of years diet shift scenarios have increase in importance and therefore the plausibility of per capita consumption projectios and hence their starting values, per capita consumption in the data base. A common yardstick to assess plausibility is nutrient (e.g. calorie) consumption per head where the nutrition literature offers guidance in terms of recommendable as well as “observed” consumption. For nutrition issues it is intake, so consumption after losses, which matters, such that the assignment of these loss rates becomes a critical element of the database. The starting values are due to an FAO study and stored in the \dat folder

{{::code_p53.png?600|}}

The aggregate food share (= 1-loss shares) links intake (INHA(i)) to total consumption (sum(i, HCOM(i)*foodSh(i)) / INHA(levl) and is therefore stored in the database as well. 

{{:code_p53_2.png?600|}}

In spite of the FAO study the real loss rates are highly uncertain. Therefore they are reduced if the estimate of calorie intake based on the FAO loss rates strongly falls short of recommendations (most strongly in a set of “low calory regions”). Conversely loss rates are increased, if the estimate of calorie intake based on the FAO loss rates strongly exceeds recommendations (e.g. in Turkey). 

{{:code_p53_3.png?600|}}

=== Completion of feed related data in coco2_feed ===

The first sections of coco2_feed handle completions for certain by-products and other product so far ignored in coco1. These are by-products of the milling and the brewing industry and for corn gluten feed, sugarbeet pulp, manioc and fish meal where the database is completed for market balance positions production, imports, exports and feed. This relies on discontinued Eurostat tables (collected on p_feedAgri) which are extended using national data and external trade data from Comext. After completion the detailed by-products are aggregated to the CAPRI rows FENI (Rich energy fodder imported or industrial) and FPRI (Rich protein fodder imported or industrial). Based on completed data for all feedingstuffs nutrient contents for the CAPRI feed “bulks” (cereal feed FCER, protein feed FPRO etc) are assigned as an aggregate of their components.

These completions are useful as such but they also permit a balancing of (preliminary) total nutrient supply and demand in the animal sector that ultimately serves to adjust loss rates for fodder with the help of a number of include files: 

**Include files //‘feed_decl.gms’// and //‘req_or_man_fcn.gms’//**

These files are not only active in COCO2, but also in CAPREG, and in the baseline calibration of CAPMOD. This “reuse” of the same files in different modules is efficient and ensures consistency, but usually also requires some adaptations of set definitions: 

{{::code_p54.png?600|}}

The previous snippet from coco2_feed gives an example that some sets (RS, R_RAGG) are assigned specifically to ensure functionality in different modules (here COCO2).

As the name should signal file //‘feed_decl.gms’// mainly collects a number of declarations but it also specifies some bounds for process length DAYS and daily growth DAILY that are imposed throughout of CAPRI (example: maximum daily growth for male cattle = 1.5kg/day). The second include file (//‘req_or_man_fnc.gms’//) specifies the requirement functions (with the argument “req” passed on) for animal activities of CAPRI.

Requirement functions are specified that determine:

  * ENNE	Net energy for ruminants as sum of
    * NEL	net energy for lactation (cows, ewes, goats)
    * NEM	net energy for maintenance (cows, calves, bulls, heifers, ewes, goats)
    * NEA	net energy for activity (cows, calves, bulls, heifers, ewes, goats)
    * NEP	net energy for pregnancy (cows)
    * NEG	net energy for growth (calves, bulls, heifers)
  * ENMC	Net energy chicken
  * ENMP	Net energy pigs
  * CRPR   crude protein (all categories) and LISI lysine aminoacid (sows, poultry)
  * DRMA dry matter (all categories with min and max requirements)
  * Various fiber measures (irrelevant for COCO2) 
There are three main sources for these functions:
  * IPCC 2006 guidelines for the estimation of emissions ([[http://www.ipcc-nggip.iges.or.jp/public/2006gl/pdf/4_Volume4/V4_10_Ch10_Livestock.pdf]])
  * Kirchgessner Tierernährng, 7th edition,  1987
  * CAPRI working paper 97-12 ([[http://www.ilr.uni-bonn.de/agpo/publ/workpap/pap97-12.pdf]])

These functions are one the one hand quite complex. They are composed of various parts that finally give the requirements, for example for energy, as a function of various parameters that may be specific to the region (often the final weights, process length, daily growth) or uniform across regions (carcass ratio). In spite of several components these are typically linked in a straightforward fashion as will be illustrated with a relatively easy example (energy for maintenance of heifers for fattening).

As a starting point, the daily growth from COCO is forced into the range defined in //‘feed_decl.gms’//. At the same time regions with a stocking rate above the MS average are assumed to rely on more intensive technologies, such that their daily growth is also above average (but within the range [\(DAILY_{lo},DAILY_{up}\)]). This is irrelevant in COCO (r=MS, no subnational regions) but relevant for CAPREG and CAPMOD calling the same //‘req_or_man_fnc.gms’//: 

\begin{align}
\begin{split}
&dailyIncrease_r^{HEIF}\\
&=min \left[ DAILY_{up}^{HEIF},max\left(DAILY_{lo}^{HEIF},\frac {stockingrate_r} {stockingrate_{MS}} DAILY_{MS}^{HEIF}\right) \right]
\end{split}
\end{align}

The daily increase is then used to determine the process length (rearrangement of equation below with empty days EDAYS = 0)

\begin{align}
\begin{split}

&fatngday_r^{HEIF}\\
&= min \left[ DAYS_{up}^{HEIF},max \left\{ DAYS_{lo}^{HEIF},\frac{BEEF_r^{HEIF}/carcassSh_{HEIF}-startWgt_{HEIF}}{dailyIncrease_r^{HEIF}} \right\} \right]

\end{split}
\end{align}

The daily increase and process length may be conbined to estimate the mean live weight,

\begin{equation}
meanWgt_r^{HEIF}=startWgt_{HEIF}+\frac {dailyIncrease_r^{HEIF}\cdot fatngdays_r^{HEIF}} {2}
\end{equation}

which in turn is the last information to estimate energy requirements for maintenance according to the IPCC guidelines: 

\begin{equation}
NEM_r^{HEIF}=(meanWgt_{HEIF})^{0.75}\cdot 0.322 \cdot fatngdays_r^{HEIF}
\end{equation}

Other energy requirements (for growth and activity) are calculated in a similar fashion as well as those for other animals. Important aspects to note are

  * Fixed bounds for DAYS and DAILY ensure reasonable requirements, but require that the same constraints are anticipated in COCO and CAPREG to avoid inconsistencies. 
  * Regional coefficients are derived from the MS level information

**Include file //‘coco2_gras.gms’//**

With animal requirements specified the results of COCO1 for grass, other fodder and as a last resort cereals might be revised in terms of losses on farm to achieve an acceptable relationship of energy and protein requirements of total herds compared to the intake with feed. For gras and other fodder on arable land the contents may be adjusted in certain limits as well. The corrections do not eliminate the typical oversupply of nutrients compared to the requirements based on the literature, but they should give reasonable starting values for the feed allocation addressed in module CAPREG. 

===Compare COCO1 results with UNFCCC and compute correction factors in coco2_lulufc_carbon===

In COCO1, an assignment of LULUCF effects (totals and per ha) has taken place, mostly relying on IPCC coefficients. These assignments are compared in coco2_lulucf_carbon with the reportings from EU MS to UNFCCC. For forestry and any transitions involving forestry, the standard IPCC reporting appears rather coarse, as it implies, for example, that management of forest land remaining forest has zero carbon effects. By contrast most EU countries report that there is still a considerable gain in biomass from forest management because the forests have not yet achieved a stable state (as implied by IPCC standard methodology).

To pick up the detailed knowledge of management practices, disturbances, age and species structure embededed in the country level UNFCCC reporting the forest management coefficients per ha for the remaining class (FORFOR) have been already adopted in COCO1. Here we also compute correction factors for the default per ha effects from transitions involving forestry. These are ultimately stored on the data(.) array unloaded in the main result file to be used in LULUCF accounting of CAPMOD.    

===Complete prices for vegetable oil in coco2_oil_price===

The EU prices for vegetable oils relevant for biofuel processing functions are assigned using prices from a USDA source. These assignments refer to prices at the wholesale level (relevant for the processing industry), not to consumer prices which have been determined previously.

After this last include file the completions in module COCO2 are finished and the main output file (coco2_output.gdx) is unloaded. This file is loaded in subsequent modules (main use in CAPREG, but also in CAPTRD for nowcasting and in CAPMOD for update of LULUCF coefficients).