
ci: use artifacts for e2e prep so job retries don't fail on cache eviction #16310

Merged
GermanJablo merged 3 commits into main from ci/e2e-prep-artifacts on Apr 17, 2026


@GermanJablo GermanJablo commented Apr 17, 2026

Contributor

Switches the e2e-prep handoff from actions/cache to actions/upload-artifact + actions/download-artifact, so that re-running an individual failed E2E shard works without having to re-run the entire workflow (or push an empty commit to re-trigger CI, which is what we've been doing).

This is also the pattern recommended by GitHub (https://docs.github.com/en/actions/tutorials/store-and-share-data). Cache is for dependencies reused across runs, and artifacts are for passing data between jobs within a run. e2e-prep falls squarely into the second category.

Why

The Actions cache for this repo sits permanently at its 10 GB limit, so the e2e-prep-<sha> entry is evicted (least recently used first) within a few hours of being saved. When that happens, any retry of a failed shard fails at the "Restore prepared test environment" step with "Failed to restore cache entry", and the only recovery path is re-running the full workflow. Artifacts are scoped to the workflow run and are not subject to the 10 GB cache budget, so they survive across attempts.
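The Python sketch below is a toy model of the two handoff mechanisms and is not GitHub's implementation. It makes the eviction argument concrete:

- The cache is a single repo-wide store with a 10 GB budget that evicts the least recently used entries.
- Artifacts are stored per workflow run.

The run ID, commit SHA, entry sizes, and the concurrent "other-*" cache saves are all made-up values chosen for illustration.

```python
from collections import OrderedDict

GB = 1024 ** 3


class RepoCache:
    """Toy model: one cache shared by every run, with a size budget and LRU eviction."""

    def __init__(self, budget_bytes: int) -> None:
        self.budget = budget_bytes
        self.entries: "OrderedDict[str, int]" = OrderedDict()  # key -> size, oldest first

    def save(self, key: str, size: int) -> None:
        self.entries[key] = size
        self.entries.move_to_end(key)
        # Evict least recently used entries until the total is back under budget.
        while sum(self.entries.values()) > self.budget:
            self.entries.popitem(last=False)

    def restore(self, key: str) -> bool:
        if key in self.entries:
            self.entries.move_to_end(key)  # a hit refreshes recency
            return True
        return False


class RunArtifacts:
    """Toy model: artifacts belong to one workflow run and are visible to all its attempts."""

    def __init__(self) -> None:
        self.by_run: dict = {}

    def upload(self, run_id: int, name: str) -> None:
        self.by_run.setdefault(run_id, set()).add(name)

    def download(self, run_id: int, name: str) -> bool:
        return name in self.by_run.get(run_id, set())


cache = RepoCache(budget_bytes=10 * GB)
artifacts = RunArtifacts()
run_id, sha = 1001, "abc123"  # hypothetical values

# Attempt 1: e2e-prep hands off its output (hypothetical 2 GB) through both mechanisms.
cache.save(f"e2e-prep-{sha}", 2 * GB)
artifacts.upload(run_id, "e2e-prep")

# Other workflows keep saving caches in the meantime (sizes are made up).
for i in range(6):
    cache.save(f"other-{i}", 2 * GB)

# Attempt 2: a single failed shard is re-run within the same run.
print(cache.restore(f"e2e-prep-{sha}"))        # False: evicted by later saves
print(artifacts.download(run_id, "e2e-prep"))  # True: still attached to the run
```

The artifact lookup is keyed by run ID and does not compete for a shared budget. A re-run attempt in the same run therefore finds it no matter what other workflows save in the meantime.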

Real example that motivated this: https://github.com/payloadcms/payload/actions/runs/24524398342/job/71809444756

Notes for reviewers

  • include-hidden-files: true is required because test/node_modules contains .bin, .pnpm, etc. Without that flag, the upload silently produces a broken artifact.
  • Added an explicit "Verify prepared test environment" step with an actionable error, since download-artifact has no fail-on-cache-miss equivalent.
  • Left the default 90-day retention. This repo is public, so artifact storage is free and there's no reason to be aggressive.
  • Upload/download is slightly slower than cache (zip vs zstd), but the difference is on the order of tens of seconds on a job that takes minutes. Worth it for the reliability win.
  • The e2e-prep-<sha> cache entries left behind by old runs will age out on their own, so no cleanup is needed.
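The "Verify prepared test environment" step could look roughly like the Python sketch below. The real step in .github/workflows/main.yml may be a shell script with different checks. The REQUIRED_ENTRIES list, the error wording, and the function names are all hypothetical. The sketch checks for .bin and .pnpm because their absence is the symptom of uploading without include-hidden-files: true.

```python
import sys
from pathlib import Path

# Hypothetical: entries a usable test/node_modules is expected to contain.
# These hidden directories are what go missing without include-hidden-files: true.
REQUIRED_ENTRIES = (".bin", ".pnpm")


def missing_entries(node_modules: Path) -> list:
    """Return the required entries that are absent, or the directory itself if it is missing."""
    if not node_modules.is_dir():
        return [str(node_modules)]
    return [name for name in REQUIRED_ENTRIES if not (node_modules / name).exists()]


def verify(node_modules: Path) -> int:
    """Print an actionable error to stderr and return 1 if the environment is incomplete."""
    missing = missing_entries(node_modules)
    if missing:
        print(
            f"Prepared test environment is incomplete, missing: {', '.join(missing)}.\n"
            "Check that e2e-prep uploaded its artifact with include-hidden-files: true, "
            "then re-run the full workflow so a fresh artifact is produced.",
            file=sys.stderr,
        )
        return 1
    print("Prepared test environment looks complete.")
    return 0
```

A workflow step could call it as sys.exit(verify(Path("test/node_modules"))). A non-zero return then fails the job, with the message on stderr.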

This does not touch the separate restore-build cache used by .github/actions/setup. That one still has its own polling + full-build fallback, which is orthogonal to this change.

github-actions Bot commented Apr 17, 2026

Contributor

📦 esbuild Bundle Analysis for payload

This analysis was generated by esbuild-bundle-analyzer. 🤖
This PR introduced no changes to the esbuild bundle! 🙌

@denolfe denolfe left a comment

Member

Just some comment cleanup needed, looks good otherwise.

Comment thread on .github/workflows/main.yml (outdated)
@GermanJablo GermanJablo merged commit bc590bd into main Apr 17, 2026
329 of 332 checks passed
@GermanJablo GermanJablo deleted the ci/e2e-prep-artifacts branch April 17, 2026 19:42
milamer pushed a commit to milamer/payload that referenced this pull request Apr 20, 2026
…ction (payloadcms#16310)

Co-authored-by: German Jablonski <GermanJablo@users.noreply.github.com>
@github-actions

Contributor

🚀 This is included in version v3.84.0

