Revamp GitHub Actions workflow for SQL dump and GeoPackages upload #41

Merged
anthonyfok merged 8 commits into master from revamp-upload-workflows on Mar 29, 2022

@anthonyfok (Member) commented Mar 28, 2022

This PR replaces the gpkg-to-pgdump.yml GitHub Actions workflow ("Convert GeoPackage files to PostGIS archive") with upload-sqldump-and-geopackages.yml. Changes include:

  • opendrr-boundaries.dump, which was generated from GeoPackages with GitHub Actions, is no longer built. Instead, the equivalent and more complete (with pre-built indices) opendrr-boundaries.sql, which @wkhchow keeps up to date anyway, is uploaded directly.
  • opendrr-boundaries.sql is split into 2 GB chunks before being uploaded as release assets.
  • The workflow no longer uploads to Amazon S3.
  • Uploading to Backblaze B2 is kept for now, in case downloading opendrr-boundaries.sql as a GitHub artifact is too slow (probably due to ISP throttling or network congestion).
  • For uploading GeoPackages, softprops/action-gh-release@v1 currently fails to upload all of the *.gpkg files (over 6.0 GiB in total) at once, so xresloader/upload-to-github-release@v1 is now used for the task.
  • Bump actions/checkout and actions/upload-artifact from v2 to v3. actions/cache is held at v2 pending actions/cache#773 ("[Bug] Can't restore cache biger than 2G in actions/cache@v3").
  • Add a workflow_dispatch manual trigger for re-uploading to a specified release tag.
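
To illustrate the chunking step in the list above, here is a minimal Python sketch of splitting a large file into fixed-size parts. It is not the code used by the workflow, which may well rely on a shell tool instead; the function name `split_into_chunks` and the `.00`, `.01`, ... suffix scheme are assumptions made for this example only.

```python
from __future__ import annotations

from pathlib import Path


def split_into_chunks(src: Path, chunk_size: int,
                      buffer_size: int = 1024 * 1024) -> list[Path]:
    """Split src into numbered parts of at most chunk_size bytes each.

    Parts are written next to src as <name>.00, <name>.01, ...
    (a hypothetical naming scheme chosen for this sketch).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    parts: list[Path] = []
    with src.open("rb") as infile:
        index = 0
        while True:
            part_path = src.with_name(f"{src.name}.{index:02d}")
            written = 0
            with part_path.open("wb") as outfile:
                # Copy in small blocks so a multi-gigabyte file never
                # has to fit in memory at once.
                while written < chunk_size:
                    block = infile.read(min(buffer_size, chunk_size - written))
                    if not block:
                        break
                    outfile.write(block)
                    written += len(block)
            if written == 0:
                # Nothing left to copy: drop the empty part and stop.
                part_path.unlink()
                break
            parts.append(part_path)
            index += 1
            if written < chunk_size:
                # A short part means end of file was reached.
                break
    return parts
```

Concatenating the parts in order reproduces the original file, so a downloader can reassemble opendrr-boundaries.sql from the release assets before loading it.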

Also, a new refresh-cache.yml workflow is added:

  • It accesses the cache every Sunday at 00:38 or 01:38 Pacific Time, so that the checked-out Git LFS files in the workflow cache do not go stale and get removed.
  • It is deliberately kept as a separate workflow file, so that the main upload-sqldump-and-geopackages.yml workflow is not affected if GitHub disables the scheduled refresh-cache.yml workflow.
  • A GitHub workflow concurrency group (which I named prevent-race-condition) is added to prevent the upload-sqldump-and-geopackages.yml and refresh-cache.yml workflows from trying to write to the cache at the same time.
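
The "00:38 or 01:38 Pacific Time" wording follows from GitHub Actions evaluating cron schedules in UTC, so the local run time shifts by an hour with daylight saving time. The sketch below works through that arithmetic. The cron expression `38 8 * * 0` is an assumption inferred from the stated times, not copied from refresh-cache.yml, and the helper name `pacific_times` is made up for this example.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Assumed schedule (inferred, not taken from the workflow file):
# minute 38, hour 8 UTC, every Sunday.
ASSUMED_CRON = "38 8 * * 0"

# Fixed offsets for the two Pacific Time variants.
PST = timezone(timedelta(hours=-8), "PST")  # standard time
PDT = timezone(timedelta(hours=-7), "PDT")  # daylight saving time


def pacific_times(cron: str) -> dict[str, str]:
    """Return the local HH:MM of a UTC cron schedule under PST and PDT."""
    minute, hour, *_ = cron.split()
    # Any Sunday works for the conversion; 2022-03-27 is one.
    utc_run = datetime(2022, 3, 27, int(hour), int(minute), tzinfo=timezone.utc)
    return {
        tz.tzname(None): utc_run.astimezone(tz).strftime("%H:%M")
        for tz in (PST, PDT)
    }


print(pacific_times(ASSUMED_CRON))
```

Under this assumption the same UTC schedule lands at 00:38 during standard time and 01:38 during daylight saving time, matching the two times given above.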

Tested at https://github.com/anthonyfok/boundaries/actions

Fixes #34, fixes #36

anthonyfok and others added 8 commits March 25, 2022 12:18
To be replaced with more lightweight workflows, see #36
This replaces gpkg-to-pgdump.yml with the following changes:

- opendrr-boundaries.sql in the git repo is used directly
- opendrr-boundaries.dump is no longer built in the workflow
- opendrr-boundaries.sql is split into 2GB-chunks before uploading as
  release assets (#34)
- No longer uploads to Amazon S3 and Backblaze B2 buckets
- GeoPackages (*.gpkg) are no longer uploaded by this workflow;
  they will be handled by a new upload-geopackages.yml instead

Fixes #34
See also #36
Downloading a large GitHub artifact can sometimes be very slow,
with a 3GB artifact taking over 45 minutes, so keeping a copy on a
Backblaze B2 bucket allows faster download if necessary.
This uploads *.gpkg files as release assets as the old
gpkg-to-pgdump.yml workflow used to do.

As softprops/action-gh-release@v1 currently fails to upload all 6.0 GiB
of GeoPackages at once, xresloader/upload-to-github-release@v1 is now
used for the task.

See #36
This workflow prevents the checked-out Git LFS files in the
workflow cache from getting stale and removed by accessing the cache
every Sunday at 00:38 or 01:38 Pacific Time.
Upstream has updated from Node 12 to Node 16 in these GitHub Actions.

Note: actions/cache v3.0.0 has trouble restoring cache over
2 GiB in size:

    Warning: The value of "length" is out of range.
    It must be >= 0 && <= 2147483647. Received 2419791709

so the upgrade to actions/cache@v3 will have to wait until the
resolution of Issue actions/cache#773 in PR actions/cache#775.
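
The numbers in that warning can be checked directly: 2147483647 is the largest signed 32-bit integer, and the cache size in the message exceeds it. The snippet below only verifies this arithmetic; it makes no claim about how actions/cache is implemented, and the helper name `fits_in_int32_length` is hypothetical.

```python
INT32_MAX = 2**31 - 1  # 2147483647, the upper bound quoted in the warning


def fits_in_int32_length(size_in_bytes: int) -> bool:
    """Return True if the size lies within the range named in the warning."""
    return 0 <= size_in_bytes <= INT32_MAX


cache_size = 2419791709  # the "Received" value from the warning

print(fits_in_int32_length(cache_size))  # False: the cache is too large
print(cache_size - INT32_MAX)            # bytes over the limit
print(round(cache_size / 2**30, 2))      # cache size in GiB
```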
into new file upload-sqldump-and-geopackages.yml.

The 'prevent-race-condition' concurrency is moved from job level
to workflow level in order to prevent workflow cancellation:

    Canceling since a higher priority waiting request
    for 'prevent-race-condition' exists

Proper workflow_dispatch is also implemented in case there is a need to
manually (re)upload the release assets.

Together with the previous commits, the refactoring of the previous
gpkg-to-pgdump.yml GitHub Actions workflow is now complete.

Fixes #36
@anthonyfok added the Bug and Enhancement labels Mar 28, 2022
@anthonyfok added this to the Sprint 55 milestone Mar 28, 2022
@anthonyfok self-assigned this Mar 28, 2022
@wkhchow (Collaborator) left a comment

Looks good to me! Will update asset release after PR merged. Will need to rebase changes to current working branch (test_hexbin_unclipped) and PR/merge for another release.

@anthonyfok (Member, Author) commented

Thank you @wkhchow for your review! I will merge this into the master branch now. Hope all goes well! 🤞

@anthonyfok merged commit fb9026d into master Mar 29, 2022
@anthonyfok modified the milestones: Sprint 55, Sprint 54 Mar 30, 2022

Development

Successfully merging this pull request may close these issues.

  • Refactor and fix problems in gpkg-to-pgdump.yml
  • Split opendrr-boundaries.{sql,dump} into 2GB chunks before uploading as release assets

2 participants