openaddresses icon indicating copy to clipboard operation
openaddresses copied to clipboard

Update CZ countrywide

Open StephanGeorg opened this issue 2 years ago • 5 comments

Update outdated CZ countrywide source with:

  • Website: https://geoportal.cuzk.cz/(S(334vrbeodsqj5roboekmtijl))/Default.aspx?lng=EN&mode=TextMeta&side=dsady_RUIAN_vse&metadataID=CZ-00025712-CUZK_SERIES-MD_RUIAN-CSV-ADR-ST
  • Data: https://atom.cuzk.cz/RUIAN-CSV-ADR-ST/RUIAN-CSV-ADR-ST.xml
  • License: Creative Commons CC BY 4.0 License

ETL script for DuckDB:

LOAD spatial;

CREATE TEMP TABLE ruian AS FROM (
SELECT
  "Kód ADM"
  , "Kód obce"
  , "Název ulice"
  , "Typ SO"
  , "Číslo domovní"
  , "Číslo orientační"
  , "Znak čísla orientačního"
  , "PSČ"
  , "Název obce"
  , "Název MOMC"
  , "Název části obce"
  , coalesce("Název MOMC", "Název obce") as "city"
  , coalesce("Název ulice", "Název části obce") as "street"
  , '' as "number"
  , '' as "postal_code"
  , 0.0::double AS "lat"
  , 0.0::double AS "lon"
  ,  ST_FlipCoordinates(ST_Transform(ST_POINT("Souřadnice X", "Souřadnice Y"), 'EPSG:5513', 'EPSG:4326')) as "geometry"
FROM 
  read_csv('*_ADR.csv', AUTO_DETECT=TRUE)
);

-- Update coordinates
UPDATE ruian SET "lat" = ST_Y("geometry"), "lon" = ST_X("geometry");
-- Update postal code
UPDATE ruian SET "postal_code" = CONCAT(regexp_extract("PSČ", '(\d{3})(\d{2})', 1), ' ', regexp_extract("PSČ", '(\d{3})(\d{2})', 2)) WHERE "PSČ" IS NOT NULL;
-- Update house number
UPDATE ruian SET "number" = NULL;
UPDATE ruian SET "number" = "Číslo domovní" WHERE "Číslo orientační" IS NULL;
UPDATE ruian SET "number" = "Číslo domovní" || '/' || "Číslo orientační" WHERE "Číslo orientační" IS NOT NULL;
UPDATE ruian SET "number" = "Číslo domovní" || '/' || "Číslo orientační" || "Znak čísla orientačního" WHERE "Znak čísla orientačního" IS NOT NULL;
UPDATE ruian SET "number" = CONCAT("Typ SO", ' ', "Číslo domovní") WHERE "Číslo orientační" IS NULL AND "street" = "city" AND "Typ SO" = 'č.ev.';


-- Export data
COPY(SELECT "Kód ADM", street, "number", postal_code, city, lat, lon FROM ruian) TO 'CZ-addresses.csv';

Pls upload data from tmp Google Drive folder to S3 bucket.

StephanGeorg avatar Feb 19 '24 14:02 StephanGeorg

@sbma44 I don't know if you're still interested, but you added the source originally. Maybe you'd like to take a look at the update? If not, totally fine, I just wanted to ask.

UPDATE ruian SET "number" = "Číslo domovní" WHERE "Číslo orientační" IS NULL;
UPDATE ruian SET "number" = "Číslo domovní" || '/' || "Číslo orientační" WHERE "Číslo orientační" IS NOT NULL;
UPDATE ruian SET "number" = "Číslo domovní" || '/' || "Číslo orientační" || "Znak čísla orientačního" WHERE "Znak čísla orientačního" IS NOT NULL;
UPDATE ruian SET "number" = CONCAT("Typ SO", ' ', "Číslo domovní") WHERE "Číslo orientační" IS NULL AND "street" = "city" AND "Typ SO" = 'č.ev.';

StephanGeorg avatar Feb 20 '24 14:02 StephanGeorg