[main] ci: add update-disk-sizes workflow so workflow_run trigger fires #21100
Pull request overview
Adds the update-disk-sizes.yml GitHub Actions workflow to main so the workflow_run trigger can be evaluated on the default branch and fire when the QA sync workflows complete on release/3.4.
Changes:
- Introduces `.github/workflows/update-disk-sizes.yml` on `main`.
- Workflow downloads disk-usage artifacts from a triggering run (or a manually provided run ID), updates the docs JSON, and force-pushes an auto-update branch.
- Creates (or reuses) a draft PR targeting the measured branch.
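A minimal sketch of the "create or reuse a draft PR" step, assuming the GitHub CLI (`gh`) is available on the runner. The function names `auto_branch_for` and `ensure_draft_pr` are hypothetical, as are the PR title and body. The `docs/auto/disk-sizes-<base>` branch naming follows the example given in #21030.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a base branch such as release/3.4 to its
# per-base auto-update branch, e.g. docs/auto/disk-sizes-release-3.4.
auto_branch_for() {
  local base="$1"
  printf 'docs/auto/disk-sizes-%s\n' "${base//\//-}"
}

# Hypothetical helper: reuse an open PR from the auto branch into the base
# branch if one exists; otherwise open a new draft PR.
# Prints the existing PR number, or whatever `gh pr create` prints
# (typically the new PR's URL).
ensure_draft_pr() {
  local base="$1"
  local head existing
  head="$(auto_branch_for "$base")"

  existing="$(gh pr list --base "$base" --head "$head" --state open \
    --json number --jq '.[0].number // empty')"

  if [[ -n "$existing" ]]; then
    echo "$existing"
    return 0
  fi

  # Title and body are placeholders (assumed), not the workflow's real text.
  gh pr create --draft --base "$base" --head "$head" \
    --title "chore(docs): auto-update measured disk sizes" \
    --body "Automated update of measured disk sizes."
}
```

Querying for an open PR first keeps repeated runs from opening duplicate PRs. Each run then only moves the auto branch, and the existing draft PR picks up the new commit.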
Mirrored the workflow hardening from #21030 in bb30fc9 so this default-branch trigger copy remains byte-for-byte identical to the release PR copy.
…nc CI (#21030)

## Summary

Introduces an automated pipeline to keep the disk size figures on the Docusaurus [Hardware Requirements](https://docs.erigon.tech/get-started/hardware-requirements) page up to date, removing the need for manual edits.

### How it works

1. **Measure**: `qa-sync-from-scratch` and `qa-sync-from-scratch (minimal node)` each run a `du -sb` step just before cleanup, uploading a `disk-usage-<chain>-<mode>.txt` artifact.
2. **Collect**: `update-disk-sizes.yml` triggers via `workflow_run` after successful sync workflows on `release/3.*`, downloads matching artifacts, verifies that at least one artifact exists, and runs `docs/site/scripts/update-disk-sizes.py`.
3. **Publish**: the workflow commits the updated JSON to a per-base auto branch such as `docs/auto/disk-sizes-release-3.4` and opens or updates a **draft PR** against that same release branch for human review.
4. **Render**: `hardware-requirements.mdx` imports `disk-sizes.json` and renders the *Current Disk Usage* column dynamically, so no manual MDX edit is needed after CI measurements update the JSON.

### Companion default-branch workflow

GitHub evaluates `workflow_run` triggers from the workflow file on the default branch. Companion PR #21100 adds the same `update-disk-sizes.yml` file to `main`; both copies are intentionally kept byte-for-byte identical. The trigger itself remains scoped to `release/3.*`, because the Docusaurus docs data being updated lives on release docs branches.

### What is automated

| Network | Mode | Source |
|---------|------|--------|
| Ethereum mainnet | Full | `qa-sync-from-scratch` (weekly) |
| Ethereum mainnet | Minimal | `qa-sync-from-scratch (minimal node)` (nightly) |
| Gnosis Chain | Full | `qa-sync-from-scratch` (weekly) |
| Gnosis Chain | Minimal | `qa-sync-from-scratch (minimal node)` (nightly) |

### What is not automated yet

Archive mode disk sizes require measurement from always-on snapshot machines rather than ephemeral CI runners. A follow-up PR can add this once snapshot machine access/outbound push is confirmed with DevOps.

Manual `workflow_dispatch` accepts `run_id`, `prune_mode`, and `base_branch` inputs; `base_branch` defaults to `release/3.4`.

### Files changed

| File | Change |
|------|--------|
| `.github/workflows/qa-sync-from-scratch.yml` | Add measure + upload steps before cleanup |
| `.github/workflows/qa-sync-from-scratch-minimal-node.yml` | Same |
| `.github/workflows/update-disk-sizes.yml` | New collector workflow; mirrored by #21100 on `main` |
| `docs/site/src/data/disk-sizes.json` | New data file seeded with current Sept 2025 values |
| `docs/site/scripts/update-disk-sizes.py` | New Python helper to parse artifact bytes, format SI units, and update JSON |
| `docs/site/scripts/test_update_disk_sizes.py` | Unit tests for `update-disk-sizes.py` |
| `docs/site/scripts/generate-llms.py` | Emit `—` fallback when stripping JSX `??` expressions |
| `docs/site/docs/get-started/hardware-requirements.mdx` | Import JSON, render disk usage cells dynamically, remove static dated note |
| `docs/site/static/llms-full.txt` / `llms-full.txt` | Regenerated to reflect dynamic MDX changes |

### LLM artifact note

Because `generate-llms.py` strips JSX at generation time, dynamic *Current Disk Usage* cells render as `—` in `llms-full.txt`. Resolving the imported `disk-sizes.json` values during LLM artifact generation remains a follow-up.

🤖 Generated with Claude Code

---------

Co-authored-by: Bloxster <bloxster@proton.me>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Gianni Morselli <gianni.morselli@erigon.tech>
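The *Measure* step and the SI formatting done by `update-disk-sizes.py` can be sketched in shell as shown here. This is an illustration, not the workflow's actual code. Only the `du -sb` command and the `disk-usage-<chain>-<mode>.txt` naming come from the description. The function names, the one-decimal precision, and the unit labels are assumptions. `du -sb` is the GNU coreutils form, as found on Linux runners.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical measure step: record the apparent size in bytes of a datadir
# into disk-usage-<chain>-<mode>.txt, mirroring the `du -sb` step in #21030.
measure_disk_usage() {
  local datadir="$1" chain="$2" mode="$3" outdir="$4"
  local bytes
  # `du -sb` prints "<bytes><TAB><path>"; keep only the byte count.
  bytes="$(du -sb "$datadir" | cut -f1)"
  printf '%s\n' "$bytes" > "$outdir/disk-usage-${chain}-${mode}.txt"
}

# Hypothetical SI formatter (powers of 1000, one decimal place above bytes).
# The real update-disk-sizes.py may round or label units differently.
format_si() {
  local bytes="$1"
  awk -v b="$bytes" 'BEGIN {
    split("B kB MB GB TB PB", units, " ")
    i = 1
    while (b >= 1000 && i < 6) { b /= 1000; i++ }
    if (i == 1) printf "%d %s\n", b, units[i]
    else printf "%.1f %s\n", b, units[i]
  }'
}
```

Storing raw bytes in the artifact and formatting them only at JSON-update time keeps the measurement lossless. Rounding happens in one place.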
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
.github/workflows/update-disk-sizes.yml:66
`workflow_run.branches` accepts any `release/3.*` run, but this step always sets `BASE_BRANCH=release/3.4`. A successful sync on another release branch would update/open the disk-size PR against `release/3.4` instead of the branch that was measured; either derive this from `github.event.workflow_run.head_branch` after validating it is a release branch, or narrow the trigger to `release/3.4`.
```shell
  echo "BASE_BRANCH=release/3.4" >> $GITHUB_ENV
else
  echo "PRUNE_MODE=full" >> $GITHUB_ENV
  echo "SOURCE_RUN_ID=${{ github.event.workflow_run.id }}" >> $GITHUB_ENV
  echo "BASE_BRANCH=release/3.4" >> $GITHUB_ENV
```
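Copilot's first option, deriving the base branch from the triggering run instead of hard-coding it, could look roughly like the following sketch. The sketch is hypothetical. The thread resolved the comment differently, pinning `BASE_BRANCH` to `release/3.4` for `workflow_run` events. The function name and the `release/3.<digits>` pattern are assumptions. In a workflow, `HEAD_BRANCH` would be passed in through a step `env:` block from `github.event.workflow_run.head_branch` rather than templated into the script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical: accept only release/3.<number> branch names and echo them back
# as the base branch; reject anything else with a non-zero exit.
resolve_base_branch() {
  local head_branch="$1"
  if [[ "$head_branch" =~ ^release/3\.[0-9]+$ ]]; then
    printf '%s\n' "$head_branch"
  else
    echo "refusing to target unexpected branch: $head_branch" >&2
    return 1
  fi
}

# Example wiring inside a step (HEAD_BRANCH supplied via env:):
#   base="$(resolve_base_branch "$HEAD_BRANCH")"
#   echo "base_branch=$base" >> "$GITHUB_OUTPUT"
```

If `main` is also in the trigger's `branches:` filter, as it is in this copy, runs from `main` would need their own mapping, because this function rejects them.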
@copilot apply changes based on the comments in this thread
Agent-Logs-Url: https://github.com/erigontech/erigon/sessions/7d0f47fa-05c9-4ddf-b95f-58da1e72efe4
Co-authored-by: bloxster <40316187+bloxster@users.noreply.github.com>
Applied the requested thread updates in 690c692: switched to …
- Switch from `GITHUB_ENV` to `GITHUB_OUTPUT` for `prune_mode` / `source_run_id` / `base_branch` propagation (avoids zizmor's `github-env` rule, which flags every write to `GITHUB_ENV` as potential code execution).
- Pass values to downstream steps via per-step `env:` blocks instead of template substitution (resolves the template-injection warning on the python invocation).
- Add `zizmor: ignore[dangerous-triggers]` on the `workflow_run` trigger with justification: the workflow never checks out the triggering run's head, and `BASE_BRANCH` is pinned to `release/3.4` for `workflow_run` events.
- Set `persist-credentials: true` explicitly on checkout (we need them for the later `git push`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
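The `GITHUB_ENV` → `GITHUB_OUTPUT` switch amounts to writing `key=value` lines to the file named by `$GITHUB_OUTPUT` instead of `$GITHUB_ENV`. Later steps read them as `steps.<id>.outputs.<key>` and receive them through their own `env:` block. Here is a minimal sketch of the writing side. The function name is hypothetical, and the three keys are the ones named above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical: emit the three values as step outputs. GitHub Actions reads
# $GITHUB_OUTPUT after the step finishes; a later step would consume them as
#   env:
#     PRUNE_MODE: ${{ steps.<id>.outputs.prune_mode }}
# rather than templating them into its script body.
emit_outputs() {
  local prune_mode="$1" source_run_id="$2" base_branch="$3"
  {
    echo "prune_mode=${prune_mode}"
    echo "source_run_id=${source_run_id}"
    echo "base_branch=${base_branch}"
  } >> "$GITHUB_OUTPUT"
}
```

Unlike a `GITHUB_ENV` write, these lines do not silently become environment variables for every later step. Each consumer has to opt in explicitly. Values that might contain newlines would need the `name<<DELIMITER` multiline form instead.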
…m main (#21271)

## Summary

Brings the `release/3.4` copy of `.github/workflows/update-disk-sizes.yml` in line with the `main` copy (PR #21100), which received several iterations of Copilot review plus a recent zizmor-driven security pass. Net behavior is identical; the changes are all internal hardening / lint cleanup.

### What changed

- **`GITHUB_OUTPUT`** instead of `GITHUB_ENV` for `prune_mode` / `source_run_id` / `base_branch` propagation. Step outputs aren't environment variables, so zizmor's `github-env` rule doesn't fire.
- **Input validation** on `workflow_dispatch` inputs (numeric `run_id`, allowed-char `base_branch`, `full|minimal` `prune_mode`) before any value is written downstream.
- **Per-step `env:` blocks** instead of template substitution into shell scripts, so there are no template-injection findings.
- **"Verify artifacts are present"** step that fails fast if the download yielded an empty directory.
- **Action bumps**: `actions/download-artifact@v7 → @v8`, `actions/setup-python@v5 → @v6`, aligning with the rest of the repo.
- **`persist-credentials: true`** explicit on the checkout (the later `git push` needs them).
- **`# zizmor: ignore[dangerous-triggers]`** on the `workflow_run` trigger with justification: this workflow never checks out the triggering run's head; `BASE_BRANCH` is always pinned to `release/3.4` for `workflow_run` events.

### What did **not** change

- The `branches:` filter still lists only `release/3.*`. The `main` copy lists both `main` and `release/3.*` because it exists specifically to fire for default-branch QA runs; the release-branch copy intentionally stays narrower.

### Why now

Zizmor was added to `main`'s lint workflow in #21127 (merged 2026-05-13, ~9h after #21030 merged here). The release/3.4 copy is fine under release/3.4's current lint config, but if/when zizmor gets backported, the existing file would fail the same checks that #21100 just fixed on `main`. This keeps the two copies aligned ahead of that.

### Test plan

- [ ] CI passes on this PR
- [ ] No functional change vs. current release/3.4 copy: `branches:` filter unchanged, same trigger, same outputs
- [ ] Diff against `main`'s copy after this and #21100 both merge: single-line difference on the `branches:` filter only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Bloxster <bloxster@proton.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
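The input validation and the "Verify artifacts are present" step described in #21271 can be sketched as follows. The exact patterns are assumptions. "Allowed-char" is read here as letters, digits, `.`, `_`, `-` and `/`. The function names and the artifact-directory argument are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical validation of workflow_dispatch inputs before anything is
# written to $GITHUB_OUTPUT. Returns non-zero on the first bad value.
validate_dispatch_inputs() {
  local run_id="$1" prune_mode="$2" base_branch="$3"
  if [[ ! "$run_id" =~ ^[0-9]+$ ]]; then
    echo "run_id must be numeric: $run_id" >&2; return 1
  fi
  case "$prune_mode" in
    full|minimal) ;;
    *) echo "prune_mode must be full or minimal: $prune_mode" >&2; return 1 ;;
  esac
  if [[ ! "$base_branch" =~ ^[A-Za-z0-9._/-]+$ ]]; then
    echo "base_branch contains disallowed characters: $base_branch" >&2; return 1
  fi
}

# Hypothetical fail-fast check: the download step must have produced at least
# one disk-usage-*.txt file somewhere under the given directory.
verify_artifacts_present() {
  local dir="$1" count
  if [[ ! -d "$dir" ]]; then
    echo "artifact directory missing: $dir" >&2; return 1
  fi
  # Search recursively: download-artifact may extract each artifact into its
  # own subdirectory named after the artifact.
  count="$(find "$dir" -type f -name 'disk-usage-*.txt' | wc -l)"
  if (( count == 0 )); then
    echo "no disk-usage artifacts found in $dir" >&2; return 1
  fi
  echo "found ${count} artifact file(s)"
}
```

Rejecting bad inputs before they reach `$GITHUB_OUTPUT` means no later step ever sees an unvalidated value. The empty-download check turns a silent no-op run into a visible failure.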
…21278)

## Summary

Addresses [lystopad's review comment](erigontech#21100 (comment)) on erigontech#21100, which merged before it could be addressed. The `Configure git` step in `update-disk-sizes.yml` was using the generic `github-actions[bot]` author. Lystopad's feedback:

> Please, use another name here which would clearly points to this workflow. For example something like:
> ```
> git config user.name "github-workflow-update-disk-sizes-run-${RUN_ID}"
> ```
> Otherwise it would be hard to understand source of the change in target repo.

## What changed

`.github/workflows/update-disk-sizes.yml`, `Configure git` step:

```diff
       - name: Configure git
         if: steps.diff.outputs.changed == 'true'
+        env:
+          RUN_ID: ${{ github.run_id }}
         run: |
           set -euo pipefail
-          git config user.name "github-actions[bot]"
+          git config user.name "github-workflow-update-disk-sizes-run-${RUN_ID}"
           git config user.email "github-actions[bot]@users.noreply.github.com"
```

- **Name change**: `github-actions[bot]` → `github-workflow-update-disk-sizes-run-<run-id>`. The auto-update commit now names the specific workflow and the exact run, so anyone investigating the commit in the target repo can jump straight to the run that produced it.
- **`RUN_ID` propagated via `env:` block**, consistent with the rest of the workflow's anti-template-injection pattern (no template substitution into shell).
- **`user.email` unchanged**: `github-actions[bot]@users.noreply.github.com` still keeps the commit attributed to the bot. Lystopad's suggestion only addressed `user.name`.

## Test plan

- [x] YAML syntax check (`yamllint`-style by visual inspection; single `env:` insertion)
- [ ] First `update-disk-sizes` workflow run after merge: confirm the produced `chore(docs): auto-update measured disk sizes` commit has author `github-workflow-update-disk-sizes-run-<id>` (where `<id>` matches the workflow run ID linked from the commit annotation)

Related: erigontech#21100, erigontech#21271

Co-authored-by: Bloxster <bloxster@proton.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
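The identity change from #21278 can be sketched together with the `steps.diff.outputs.changed` gate it sits behind. The two `git config` lines follow the diff in #21278. The change-detection function, its JSON path argument, and both function names are hypothetical. One detail matters under `set -e`: `git diff --quiet` exits 1 when there are changes, so it has to be tested inside a conditional rather than run bare.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical change detection: write changed=true/false to $GITHUB_OUTPUT.
# `git diff --quiet` exits 1 when the tracked file differs, so it sits inside
# `if` to keep `set -e` from aborting the step.
detect_change() {
  local path="$1"
  if git diff --quiet -- "$path"; then
    echo "changed=false" >> "$GITHUB_OUTPUT"
  else
    echo "changed=true" >> "$GITHUB_OUTPUT"
  fi
}

# Identity from #21278: the author name embeds the run ID (passed via env:),
# while the email stays the github-actions[bot] noreply address.
configure_git_identity() {
  : "${RUN_ID:?RUN_ID must be set}"
  git config user.name "github-workflow-update-disk-sizes-run-${RUN_ID}"
  git config user.email "github-actions[bot]@users.noreply.github.com"
}
```

The `${RUN_ID:?...}` guard makes a missing `env:` entry fail loudly instead of producing a commit author ending in `-run-`.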
Context
The `update-disk-sizes.yml` workflow uses a `workflow_run` trigger. GitHub evaluates `workflow_run` events against the workflow file on the default branch (`main`). Without this file on `main`, successful sync runs on `release/3.*` would not invoke the collector workflow added by #21030.

This PR adds the file to `main` so the trigger fires, then iterates on hardening to keep this copy aligned with the release-branch copy.

What this PR does
Adds `.github/workflows/update-disk-sizes.yml` to `main`. Behavior is functionally identical to #21030's copy on `release/3.4`; the differences below are pure hardening / lint cleanup, mostly driven by Copilot review and a zizmor security pass.

Trigger scope
- `workflow_run` listens for `release/3.*` runs of the QA sync workflows, plus `main` for the scheduled QA runs (so they also feed the collector).
- The release-branch copy keeps the narrower filter (`release/3.*` only). This is the single-line difference between the two files after both land.

Input safety
- `workflow_dispatch` inputs are validated before any value is propagated downstream: numeric `run_id`, allowed-char `base_branch`, and `full|minimal` `prune_mode`. Anything else exits non-zero.
- `GITHUB_OUTPUT` is used instead of `GITHUB_ENV` for `prune_mode` / `source_run_id` / `base_branch` propagation. Step outputs are not environment variables, so zizmor's `github-env` rule, which flags writes to `GITHUB_ENV` as potential code-execution paths, does not fire.
- Per-step `env:` blocks are used instead of template substitution into shell scripts, so there are no template-injection findings.

Robustness
- `persist-credentials: true` explicit on the checkout (the later `git push` needs them).
- `set -euo pipefail` on multi-line shell blocks.
- `git push` instead of `git push --force` for the auto-update branch.

Tooling and review
- `actions/download-artifact@v7 → @v8`, `actions/setup-python@v5 → @v6`, aligning with the rest of the repo.
- `# zizmor: ignore[dangerous-triggers]` on the `workflow_run` trigger with justification: this workflow never checks out the triggering run's head, and `BASE_BRANCH` is pinned per-event (not derived from untrusted input).
- … `690c6927`: `download-artifact@v8`, env-based handling for `workflow_dispatch` inputs, `workflow_run.branches` includes `main`.

Test plan
- … `release/3.4` copy beyond the `branches:` filter widening
- … `branches:` only

Related: #21030, #21271