Eurostat offers so-called correspondence tables to follow boundary changes, recoding and relabelling for all NUTS changes since the formalization of the NUTS typology. Unfortunately, these Excel tables do not conform to the requirements of tidy data, and their vocabulary is not standardized either. For example, recoding changes are labelled variously as *recoding*, *recoding and renaming*, *code change*, *Code change*, etc.
The `data-raw` folder of the package source contains these Excel tables and very long data-wrangling code that unifies the relevant vocabulary of these Excel files and brings the tables into a single, tidy format, starting with the NUTS1999 definition. The resulting data file `nuts_changes` is included in the `regions` package. It already contains the changes that will come into force in 2021.
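To illustrate the kind of vocabulary unification described above, here is a minimal sketch in base R. The helper `normalise_change()` and its matching patterns are hypothetical and are not taken from the package's `data-raw` code; they only show how label variants such as *recoding* and *Code change* can be collapsed into one controlled vocabulary.

```r
# Hypothetical helper (not part of the regions package): map the free-text
# change labels of the Eurostat correspondence tables to a small,
# controlled vocabulary.
normalise_change <- function(x) {
  x <- tolower(trimws(x))                     # "Code change" -> "code change"
  is_recode <- grepl("recod|code change", x)  # recoding, recoded, code change
  is_rename <- grepl("renam|relabel", x)      # renaming, relabelled
  out <- rep(NA_character_, length(x))        # unmatched labels stay NA
  out[is_recode & is_rename]  <- "recoded and relabelled"
  out[is_recode & !is_rename] <- "recoded"
  out[!is_recode & is_rename] <- "relabelled"
  out
}

# Four spellings found in the Excel tables collapse to two standard labels
raw_labels <- c("recoding", "recoding and renaming", "code change", "Code change")
normalise_change(raw_labels)
```

A real implementation needs many more patterns (for boundary changes, discontinued regions and so on); the package ships the already unified result as `nuts_changes`.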
Let’s review a few changes.
```r
library(regions)
library(dplyr)
library(tidyr)

data(nuts_changes)

nuts_changes %>%
  mutate(geo_16 = code_2016,
         geo_13 = code_2013) %>%
  # keep the French and Hungarian regions discussed below
  filter(code_2016 %in% c("FRB", "HU11") |
           code_2013 %in% c("FR7", "HU10", "FR24")) %>%
  select(all_of(c("typology", "geo_16", "geo_13", "start_year",
                  "code_2013", "change_2013",
                  "code_2016", "change_2016"))) %>%
  # reshape the code columns into long format
  pivot_longer(cols = starts_with("code"),
               names_to = "definition",
               values_to = "code") %>%
  # reshape the change-description columns into long format
  pivot_longer(cols = starts_with("change"),
               names_to = "change",
               values_to = "description") %>%
  filter(!is.na(description),
         !is.na(code)) %>%
  select(-change) %>%
  knitr::kable()
```

| typology | geo_16 | geo_13 | start_year | definition | code | description |
|---|---|---|---|---|---|---|
| nuts_level_1 | FRB | NA | 2016 | code_2016 | FRB | new nuts 1 region, identical to ex-nuts 2 region fr24 |
| nuts_level_1 | FRK | FR7 | NA | code_2013 | FR7 | relabelled and recoded |
| nuts_level_1 | FRK | FR7 | NA | code_2016 | FRK | relabelled and recoded |
| nuts_level_1 | NA | FR7 | NA | code_2013 | FR7 | discontinued |
| nuts_level_2 | FRB0 | FR24 | NA | code_2013 | FR24 | recoded and relabelled |
| nuts_level_2 | FRB0 | FR24 | NA | code_2016 | FRB0 | recoded and relabelled |
| nuts_level_2 | HU11 | NA | 2016 | code_2016 | HU11 | new region, equals ex-nuts 3 region hu101 |
| nuts_level_2 | NA | HU10 | NA | code_2013 | HU10 | discontinued; split into new hu11 and hu12 |
You will not find the geo identifier FRB in any statistical data released before France changed its administrative boundaries and the NUTS2016 boundary definition came into force. However, as the description says, you may find historical data elsewhere: in a historical NUTS2-level product for the FRB CENTRE — VAL DE LOIRE NUTS1 region, because it is identical to the earlier NUTS2 region FR24, i.e. Central France, which was known as Centre for many years before the transition to NUTS2016. The size and importance of this territorial unit are closer to those of NUTS1 units than of NUTS2 units.
Because FRB contains only one NUTS2 region, FRB0 (the earlier FR24), it is also identified as a NUTS2-level region. You will find the same data in the NUTS2 typology.
With statistical products on NUTS2 level, you can simply recode historical FR24 data to FRB0, because neither the aggregation level nor the boundaries have changed. Furthermore, you can project these data to any NUTS1-level panel, either under the earlier FR2 NUTS1 label if you use the old definition, or under the new FRB label if you use the current NUTS2016 typology.
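The recoding step described above can be sketched in base R. The data frame `historical`, the lookup vector `nuts2_recode` and the helper `recode_geo()` are hypothetical, and the values are placeholders rather than real statistics. The sketch also relies on the hierarchical structure of NUTS codes: a NUTS2 code has four characters, and its first three characters are the code of the NUTS1 region that contains it.

```r
# Hypothetical NUTS2013-coded historical series; values are placeholders
historical <- data.frame(
  geo   = c("FR24", "FR24"),
  year  = c(2012, 2013),
  value = c(10, 20),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: NUTS2013 code -> NUTS2016 code, only for units whose
# boundaries did not change (here just the pair discussed in the text)
nuts2_recode <- c(FR24 = "FRB0")

# Replace codes found in the lookup, leave all other codes untouched
recode_geo <- function(geo, lookup) {
  hit <- geo %in% names(lookup)
  geo[hit] <- unname(lookup[geo[hit]])
  geo
}

historical$geo_2016 <- recode_geo(historical$geo, nuts2_recode)

# NUTS1 parent = first three characters of the NUTS2 code
historical$nuts1_2013 <- substr(historical$geo, 1, 3)       # "FR2"
historical$nuts1_2016 <- substr(historical$geo_2016, 1, 3)  # "FRB"
historical
```

Recoding by a lookup is only safe for units whose boundaries are unchanged; in practice you would derive the lookup from `nuts_changes` or use `recode_nuts()` rather than writing it by hand.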
Let’s look at a hypothetical data frame with random values. (A real data frame rarely has this many issues, but a synthetic one lets us show more cases at once.)
```r
# runif() draws random numbers, so your values will differ from the table below
example_df <- data.frame(
  geo = c("FR", "DEE32", "UKI3",
          "HU12", "DED",
          "FRK"),
  values = runif(6, 0, 100),
  stringsAsFactors = FALSE
)

recode_nuts(dat = example_df,
            nuts_year = 2013) %>%
  select(geo, values, code_2013) %>%
  knitr::kable()
```

| geo | values | code_2013 |
|---|---|---|
| FR | 28.919939 | FR |
| UKI3 | 30.154436 | UKI3 |
| DED | 47.977988 | DED |
| FRK | 78.965937 | FR7 |
| HU12 | 63.436662 | NA |
| DEE32 | 1.522288 | NA |
In this hypothetical example we are creating backward compatibility with the NUTS2013 definition. There are three types of observations: those whose codes are unchanged, those that were recoded, and those whose codes do not exist in a NUTS2013 dataset.
```r
recode_nuts(example_df, nuts_year = 2013) %>%
  select(all_of(c("geo", "values", "typology_change", "code_2013"))) %>%
  knitr::kable()
```

| geo | values | typology_change | code_2013 |
|---|---|---|---|
| FR | 28.919939 | unchanged | FR |
| UKI3 | 30.154436 | unchanged | UKI3 |
| DED | 47.977988 | unchanged | DED |
| FRK | 78.965937 | Recoded from FRK [used in NUTS 2016-2021] | FR7 |
| HU12 | 63.436662 | Used in NUTS 2016-2021 | NA |
| DEE32 | 1.522288 | Used in NUTS 1999-2003 | NA |
The first three observations are comparable with a NUTS2013 dataset. The fourth observation is comparable, too, but when joining with a NUTS2013 dataset or map, FRK has to be recoded to FR7, its code in the NUTS2013 definition. The last two observations, HU12 and DEE32, have no NUTS2013 code (`code_2013` is `NA`).
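Why the recoding matters can be shown with a base R `merge()`. In this sketch `map_2013` stands in for the attribute table of a NUTS2013 map and `obs` for the example observations; both are hypothetical, and the values are placeholders.

```r
# Hypothetical stand-in for the attribute table of a NUTS2013 map:
# only the geo codes matter here
map_2013 <- data.frame(geo = c("FR", "UKI3", "DED", "FR7"),
                       stringsAsFactors = FALSE)

# Placeholder observations, with FRK still under its NUTS2016 code
obs <- data.frame(geo = c("FR", "UKI3", "DED", "FRK"),
                  values = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)

# An inner join on the unrecoded code silently drops FRK
joined_before <- merge(obs, map_2013, by = "geo")
nrow(joined_before)   # 3

# After recoding FRK to its NUTS2013 code FR7, every observation matches
obs$geo[obs$geo == "FRK"] <- "FR7"
joined_after <- merge(obs, map_2013, by = "geo")
nrow(joined_after)    # 4
```

`merge()` performs an inner join by default (`all = FALSE`), which is why the unmatched FRK row disappears without any warning rather than raising an error.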
The following data can be joined with a NUTS2013 dataset
or map:
```r
recode_nuts(example_df, nuts_year = 2013) %>%
  select(code_2013, values, typology_change) %>%
  rename(geo = code_2013) %>%
  # drop observations that have no NUTS2013 code
  filter(!is.na(geo)) %>%
  knitr::kable()
```

| geo | values | typology_change |
|---|---|---|
| FR | 28.91994 | unchanged |
| UKI3 | 30.15444 | unchanged |
| DED | 47.97799 | unchanged |
| FR7 | 78.96594 | Recoded from FRK [used in NUTS 2016-2021] |
Reassuringly, the example data are also largely compatible with the next NUTS typology, NUTS2021; only DEE32 has no NUTS2021 code:
```r
recode_nuts(example_df, nuts_year = 2021) %>%
  select(code_2021, values, typology_change) %>%
  rename(geo = code_2021) %>%
  # drop observations that have no NUTS2021 code
  filter(!is.na(geo)) %>%
  knitr::kable()
```

| geo | values | typology_change |
|---|---|---|
| FR | 28.91994 | unchanged |
| UKI3 | 30.15444 | unchanged |
| HU12 | 63.43666 | unchanged |
| DED | 47.97799 | unchanged |
| FRK | 78.96594 | unchanged |
What about HU12?
```r
data(nuts_changes)

nuts_changes %>%
  select(code_2016, geo_name_2016, change_2016) %>%
  filter(code_2016 == "HU12") %>%
  # keep only rows without missing values
  filter(complete.cases(.)) %>%
  knitr::kable()
```

| code_2016 | geo_name_2016 | change_2016 |
|---|---|---|
| HU12 | Pest | new region, equals ex-nuts 3 region hu102 |
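The description in the correspondence tables clarifies that historical data may in fact be assembled for HU12 (Pest county), for example from county-level data (Pest county has the ISO 3166-2 code HU-PE) or from NUTS3 data (as HU102). Pest county data are also contained, in aggregated form, in NUTS1 region data. Working with such aggregates will be the topic of a later vignette on aggregation and re-aggregation.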
Eurostat data: cite Eurostat.
Administrative boundaries: cite EuroGeographics.
For main developer and contributors, see the package homepage.
This work can be freely used, modified and distributed under the GPL-3 license:
```r
citation("regions")
#>
#> To cite package 'regions' in publications use:
#>
#> Antal D (2021). _regions: Processing Regional Statistics_. R package
#> version 0.1.8, <https://regions.dataobservatory.eu/>.
#>
#> A BibTeX entry for LaTeX users is
#>
#> @Manual{,
#> title = {regions: Processing Regional Statistics},
#> author = {Daniel Antal},
#> year = {2021},
#> note = {R package version 0.1.8},
#> url = {https://regions.dataobservatory.eu/},
#> }
```

For contact information, see the package homepage.