Recoding & Relabelling

Eurostat offers so-called correspondence tables to follow boundary changes, recoding and relabelling for all NUTS changes since the formalization of the NUTS typology. Unfortunately, these Excel tables do not conform with the requirements of tidy data, and their vocabulary for is not standardized, either. For example, recoding changes are often labelled as recoding, recoding and renaming, code change, Code change, etc.

The data-raw library contains these Excel tables and very long data wrangling code that unifies the relevant vocabulary of these Excel files and brings the tables into a single, tidy format , starting with the definition NUTS1999. The resulting data file nuts_changes is included in the regions package. It already contains the changes that will come into force in 2021.

Let’s review a few changes.

data(nuts_changes)

nuts_changes %>%
  mutate ( geo_16 = .data$code_2016, 
           geo_13 = .data$code_2013 ) %>%
  filter ( code_2016 %in% c("FRB", "HU11") | 
             code_2013 %in% c("FR7", "HU10", "FR24")) %>%
  select ( all_of(c("typology", "geo_16", "geo_13", "start_year",
           "code_2013", "change_2013",
           "code_2016", "change_2016")) 
           ) %>%
  pivot_longer ( cols = starts_with("code"), 
                 names_to = 'definition', 
                 values_to = 'code') %>%
  pivot_longer ( cols = starts_with("change"), 
                 names_to = 'change', 
                 values_to = 'description')  %>%
  filter (!is.na(.data$description), 
          !is.na(.data$code)) %>%
  select ( -.data$change ) %>%
  knitr::kable ()

typology	geo_16	geo_13	start_year	definition	code	description
nuts_level_1	FRB	NA	2016	code_2016	FRB	new nuts 1 region, identical to ex-nuts 2 region fr24
nuts_level_1	FRK	FR7	NA	code_2013	FR7	relabelled and recoded
nuts_level_1	FRK	FR7	NA	code_2016	FRK	relabelled and recoded
nuts_level_1	NA	FR7	NA	code_2013	FR7	discontinued
nuts_level_2	FRB0	FR24	NA	code_2013	FR24	recoded and relabelled
nuts_level_2	FRB0	FR24	NA	code_2016	FRB0	recoded and relabelled
nuts_level_2	HU11	NA	2016	code_2016	HU11	new region, equals ex-nuts 3 region hu101
nuts_level_2	NA	HU10	NA	code_2013	HU10	discontinued; split into new hu11 and hu12

You will not find the geo identifier FRB in any statistical data that was released before France changes its administrative boundaries and the NUTS2016 boundary definition came into force. However, as the description says, you may find historical data elsewhere, in a historical NUTS2-level product for the FRB CENTRE — VAL DE LOIRE NUTS1 region, because it is identical to the earlier NUTS2 level region FR24, i.e. Central France, which was known as Centre for many years before the transition to NUTS2016. The size and importance of this territorial unit is more similar to NUTS1 than NUTS2 units.

Because FRB contains only one FRB0, the earlier FR24, it is technically identified as a NUTS2-level region, too. You find the same data in the NUTS2 typology. With statistical products on NUTS2 level, you can simply recode historical FR24 data to FRB0, since the aggregation level and the boundaries are not changed. Furthermore, you can project this data to any NUTS1 level panel either under the earlier FR2 NUTS1 label, if you use the old definition, or the new FRB label, if you use the current NUTS2016 typology.

Let’s see a hypothetical data frame with random variables. (Usually a data frame has no so many issues, so a more detailed example can be constructed this way.)

example_df <- data.frame ( 
  geo  =  c("FR", "DEE32", "UKI3" ,
            "HU12", "DED", 
            "FRK"), 
  values = runif(6, 0, 100 ),
  stringsAsFactors = FALSE )

recode_nuts(dat = example_df, 
            nuts_year = 2013) %>%
  select ( geo, values, code_2013) %>%
  knitr::kable()

geo	values	code_2013
FR	28.919939	FR
UKI3	30.154436	UKI3
DED	47.977988	DED
FRK	78.965937	FR7
HU12	63.436662	NA
DEE32	1.522288	NA

In this hypothetical example we are creating backward compatibility with the NUTS2013 definition. There are three type of observations:

Observations about typologies that did not change. There is not further thing to do to make the data comparable across time.
Typologies which changed their geo codes, but are not affected by boundary changes, i.e. the data is comparable, it is only found at a different geographical label
Typologies that are not comparable, and we cannot compare them meaningfully with a NUTS2013 dataset.

recode_nuts(example_df, nuts_year = 2013) %>%
  select ( all_of(c("geo", "values", "typology_change", "code_2013")) ) %>%
  knitr::kable()

geo	values	typology_change	code_2013
FR	28.919939	unchanged	FR
UKI3	30.154436	unchanged	UKI3
DED	47.977988	unchanged	DED
FRK	78.965937	Recoded from FRK [used in NUTS 2016-2021]	FR7
HU12	63.436662	Used in NUTS 2016-2021	NA
DEE32	1.522288	Used in NUTS 1999-2003	NA

The first three observations are comparable with a NUTS2013 dataset. The fourth observation is comparable, too, but when joining with a NUTS2013 dataset or map, it is likely that FRK needs to be re-coded to FR7.

The following data can be joined with a NUTS2013 dataset or map:

recode_nuts(example_df, nuts_year = 2013) %>%
  select ( .data$code_2013, .data$values, .data$typology_change ) %>%
  rename ( geo = .data$code_2013 ) %>% 
  filter ( !is.na(.data$geo) ) %>%
  knitr::kable()

geo	values	typology_change
FR	28.91994	unchanged
UKI3	30.15444	unchanged
DED	47.97799	unchanged
FR7	78.96594	Recoded from FRK [used in NUTS 2016-2021]

And re-assuringly these data will be compatible with the next NUTS typology, too!

recode_nuts(example_df, nuts_year = 2021) %>%
  select ( .data$code_2021, .data$values, .data$typology_change ) %>%
  rename ( geo = .data$code_2021 ) %>% 
  filter ( !is.na(.data$geo) ) %>%
  knitr::kable()

geo	values	typology_change
FR	28.91994	unchanged
UKI3	30.15444	unchanged
HU12	63.43666	unchanged
DED	47.97799	unchanged
FRK	78.96594	unchanged

What about HU12?

data(nuts_changes) 
nuts_changes %>% 
  select( .data$code_2016, .data$geo_name_2016, .data$change_2016) %>%
  filter( code_2016 == "HU12") %>%
  filter( complete.cases(.) ) %>%
  knitr::kable()

code_2016	geo_name_2016	change_2016
HU12	Pest	new region, equals ex-nuts 3 region hu102

The description in the correspondence tables clarifies that in fact historical data may be assembled for HU12 (Pest county.)

It can be accessed from national LAU sources (as HU-PE) or for NUTS3 data (as HU102)
It can be calculated by deducting the Budapest data from the former Central Hungary NUTS1 region data.

That will be the topic of a later vignette on aggregation and re-aggregation.

Citing the data sources

Eurostat data: cite Eurostat.

Administrative boundaries: cite EuroGeographics.

Citing the regions R package

For main developer and contributors, see the package homepage.

This work can be freely used, modified and distributed under the GPL-3 license:

citation("regions")
#> 
#> To cite package 'regions' in publications use:
#> 
#>   Antal D (2021). _regions: Processing Regional Statistics_. R package
#>   version 0.1.8, <https://regions.dataobservatory.eu/>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {regions: Processing Regional Statistics},
#>     author = {Daniel Antal},
#>     year = {2021},
#>     note = {R package version 0.1.8},
#>     url = {https://regions.dataobservatory.eu/},
#>   }

Contact

For contact information, see the package homepage.

Citations and related work

Citing the data sources

Citing the regions R package

Contact