
[main] ci: add update-disk-sizes workflow so workflow_run trigger fires #21100

Merged: bloxster merged 11 commits into main from docs/auto-disk-sizes-main on May 19, 2026

bloxster (Collaborator) commented May 11, 2026

Context

The update-disk-sizes.yml workflow uses a workflow_run trigger. GitHub evaluates workflow_run events against the workflow file on the default branch (main). Without this file on main, successful sync runs on release/3.* would not invoke the collector workflow added by #21030.

This PR adds the file to main so the trigger fires, then iterates on hardening to keep this copy aligned with the release-branch copy.

What this PR does

Adds .github/workflows/update-disk-sizes.yml to main. Behavior is functionally identical to #21030's copy on release/3.4; the differences below are pure hardening / lint cleanup, mostly driven by Copilot review and a zizmor security pass.

Trigger scope

  • workflow_run listens for release/3.* runs of the QA sync workflows, plus main for the scheduled QA runs (so they also feed the collector).
  • The release-branch copy intentionally stays narrower (release/3.* only). This is the single-line difference between the two files once both PRs land.

Input safety

  • workflow_dispatch inputs validated before any value is propagated downstream: numeric run_id, allowed-char base_branch, full|minimal prune_mode. Anything else exits non-zero.
  • GITHUB_OUTPUT used instead of GITHUB_ENV for prune_mode / source_run_id / base_branch propagation. Step outputs are not environment variables, so zizmor's github-env rule doesn't fire on potential code execution paths.
  • Per-step env: blocks instead of template substitution into shell scripts — no template-injection findings.
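
The validation rules listed above can be sketched roughly as follows. This is an illustration in Python, not the workflow's actual shell step: the function names, the allowed-character set for base_branch, and the output key names are assumptions made for the example.

```python
import re

# Assumed allowed characters for a branch name; the workflow's real pattern may differ.
_BRANCH_RE = re.compile(r"[A-Za-z0-9._/-]+")


def validate_dispatch_inputs(run_id: str, base_branch: str, prune_mode: str) -> dict:
    """Return the validated values, or raise ValueError (the workflow would exit non-zero)."""
    if not (run_id.isascii() and run_id.isdigit()):
        raise ValueError(f"run_id must be numeric: {run_id!r}")
    if not _BRANCH_RE.fullmatch(base_branch):
        raise ValueError(f"base_branch has disallowed characters: {base_branch!r}")
    if prune_mode not in ("full", "minimal"):
        raise ValueError(f"prune_mode must be 'full' or 'minimal': {prune_mode!r}")
    # Hypothetical output key names, chosen to mirror the values named in the PR text.
    return {"source_run_id": run_id, "base_branch": base_branch, "prune_mode": prune_mode}


def format_step_outputs(values: dict) -> str:
    """Render key=value lines in the form a step appends to the file named by GITHUB_OUTPUT."""
    return "".join(f"{key}={value}\n" for key, value in values.items())
```

Because every value is validated before it is rendered, none of the emitted lines can contain a newline or shell metacharacter, which is the property the hardening relies on.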

Robustness

  • "Verify artifacts are present" step that fails fast if the download yielded an empty directory (instead of silently producing an empty JSON update).
  • persist-credentials: true explicit on the checkout (the later git push needs them).
  • Quoted branch variables throughout the push / PR-lookup paths.
  • set -euo pipefail on multi-line shell blocks.
  • Normal git push instead of git push --force for the auto-update branch.
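
As a rough illustration of the fail-fast check, the sketch below refuses to continue when no artifact files were downloaded. It is not the workflow's actual step: the function name is hypothetical, and matching on the `disk-usage-*.txt` filename (rather than only testing for an empty directory) is an assumption based on the artifact naming described in #21030.

```python
from pathlib import Path


def find_artifacts(download_dir: Path) -> list:
    """Return the downloaded disk-usage files, exiting with an error if there are none."""
    files = sorted(p for p in download_dir.rglob("disk-usage-*.txt") if p.is_file())
    if not files:
        # SystemExit with a message exits with a non-zero status, failing the step.
        raise SystemExit(f"no disk-usage artifacts found under {download_dir}")
    return files
```

Failing here keeps an empty download from turning into a commit that changes nothing or wipes existing values.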

Tooling and review

  • Action bumps: actions/download-artifact@v7 → @v8, actions/setup-python@v5 → @v6, aligning with the rest of the repo.
  • # zizmor: ignore[dangerous-triggers] on the workflow_run trigger with justification — this workflow never checks out the triggering run's head, and BASE_BRANCH is pinned per-event (not derived from untrusted input).
  • Addressed all Copilot review threads across the iteration cycle (690c6927: download-artifact@v8, env-based handling for workflow_dispatch inputs, workflow_run.branches includes main).

Test plan

Related: #21030, #21271

Copilot AI (Contributor) left a comment

Pull request overview

Adds the update-disk-sizes.yml GitHub Actions workflow to main so the workflow_run trigger can be evaluated on the default branch and fire when the QA sync workflows complete on release/3.4.

Changes:

  • Introduces .github/workflows/update-disk-sizes.yml on main.
  • Workflow downloads disk-usage artifacts from a triggering run (or a manually provided run ID), updates the docs JSON, and force-pushes an auto-update branch.
  • Creates (or reuses) a draft PR targeting the measured branch.



Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

bloxster (Collaborator, Author) commented:

Mirrored the workflow hardening from #21030 in bb30fc9 so this default-branch trigger copy remains byte-for-byte identical to the release PR copy:

  • Added manual base_branch input with release/3.4 default.
  • Dropped git push --force in favor of a normal push.
  • Quoted branch variables in push/PR lookup paths.
  • Added set -euo pipefail to multi-line shell blocks.

The workflow_run.branches filter remains release/3.*, so this default-branch workflow exists for GitHub trigger evaluation but only reacts to the Docusaurus docs release branches.

bloxster added a commit that referenced this pull request May 13, 2026
…nc CI (#21030)

## Summary

Introduces an automated pipeline to keep the disk size figures on the
Docusaurus [Hardware
Requirements](https://docs.erigon.tech/get-started/hardware-requirements)
page up to date, removing the need for manual edits.

### How it works

1. **Measure** — `qa-sync-from-scratch` and `qa-sync-from-scratch
(minimal node)` each run a `du -sb` step just before cleanup, uploading
a `disk-usage-<chain>-<mode>.txt` artifact.
2. **Collect** — `update-disk-sizes.yml` triggers via `workflow_run`
after successful sync workflows on `release/3.*`, downloads matching
artifacts, verifies that at least one artifact exists, and runs
`docs/site/scripts/update-disk-sizes.py`.
3. **Publish** — the workflow commits the updated JSON to a per-base
auto branch such as `docs/auto/disk-sizes-release-3.4` and opens or
updates a **draft PR** against that same release branch for human
review.
4. **Render** — `hardware-requirements.mdx` imports `disk-sizes.json`
and renders the *Current Disk Usage* column dynamically, so no manual
MDX edit is needed after CI measurements update the JSON.
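
A minimal sketch of the parsing and formatting that the Collect step implies is shown below. It is not the contents of `update-disk-sizes.py`: the function names, the one-decimal rounding, and the unit labels are assumptions. The only facts taken from the description are that the artifact holds `du -sb` output and that sizes are formatted with SI units.

```python
def parse_du_bytes(text: str) -> int:
    """Extract the byte count from `du -sb` output of the form '<bytes><TAB><path>'."""
    fields = text.split()
    if not fields or not fields[0].isdigit():
        raise ValueError(f"unexpected du output: {text!r}")
    return int(fields[0])


def format_si(num_bytes: int) -> str:
    """Format a byte count with decimal (SI) prefixes, where 1 TB is 10**12 bytes."""
    if num_bytes < 1000:
        return f"{num_bytes} B"
    value = float(num_bytes)
    for unit in ("kB", "MB", "GB", "TB"):
        value /= 1000
        # Stop at the first unit that keeps the value below 1000, or at TB as the largest.
        if value < 1000 or unit == "TB":
            return f"{value:.1f} {unit}"
```

For example, `format_si(parse_du_bytes("1500000000000\t/erigon-data\n"))` yields `1.5 TB` under these assumptions.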

### Companion default-branch workflow

GitHub evaluates `workflow_run` triggers from the workflow file on the
default branch. Companion PR #21100 adds the same
`update-disk-sizes.yml` file to `main`; both copies are intentionally
kept byte-for-byte identical. The trigger itself remains scoped to
`release/3.*`, because the Docusaurus docs data being updated lives on
release docs branches.

### What is automated

| Network | Mode | Source |
|---------|------|--------|
| Ethereum mainnet | Full | `qa-sync-from-scratch` (weekly) |
| Ethereum mainnet | Minimal | `qa-sync-from-scratch (minimal node)` (nightly) |
| Gnosis Chain | Full | `qa-sync-from-scratch` (weekly) |
| Gnosis Chain | Minimal | `qa-sync-from-scratch (minimal node)` (nightly) |

### What is not automated yet

Archive mode disk sizes require measurement from always-on snapshot
machines rather than ephemeral CI runners. A follow-up PR can add this
once snapshot machine access/outbound push is confirmed with DevOps.

Manual `workflow_dispatch` accepts `run_id`, `prune_mode`, and
`base_branch` inputs; `base_branch` defaults to `release/3.4`.

### Files changed

| File | Change |
|------|--------|
| `.github/workflows/qa-sync-from-scratch.yml` | Add measure + upload steps before cleanup |
| `.github/workflows/qa-sync-from-scratch-minimal-node.yml` | Same |
| `.github/workflows/update-disk-sizes.yml` | New collector workflow; mirrored by #21100 on `main` |
| `docs/site/src/data/disk-sizes.json` | New data file seeded with current Sept 2025 values |
| `docs/site/scripts/update-disk-sizes.py` | New Python helper to parse artifact bytes, format SI units, and update JSON |
| `docs/site/scripts/test_update_disk_sizes.py` | Unit tests for `update-disk-sizes.py` |
| `docs/site/scripts/generate-llms.py` | Emit `—` fallback when stripping JSX `??` expressions |
| `docs/site/docs/get-started/hardware-requirements.mdx` | Import JSON, render disk usage cells dynamically, remove static dated note |
| `docs/site/static/llms-full.txt` / `llms-full.txt` | Regenerated to reflect dynamic MDX changes |

### LLM artifact note

Because `generate-llms.py` strips JSX at generation time, dynamic
*Current Disk Usage* cells render as `—` in `llms-full.txt`. Resolving
the imported `disk-sizes.json` values during LLM artifact generation
remains a follow-up.

🤖 Generated with Claude Code

---------

Co-authored-by: Bloxster <bloxster@proton.me>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Gianni Morselli <gianni.morselli@erigon.tech>
bloxster requested review from Copilot and yperbasis May 13, 2026 15:51

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

.github/workflows/update-disk-sizes.yml:66

  • workflow_run.branches accepts any release/3.* run, but this step always sets BASE_BRANCH=release/3.4. A successful sync on another release branch would update/open the disk-size PR against release/3.4 instead of the branch that was measured; either derive this from github.event.workflow_run.head_branch after validating it is a release branch, or narrow the trigger to release/3.4.
            echo "BASE_BRANCH=release/3.4" >> $GITHUB_ENV
          else
            echo "PRUNE_MODE=full" >> $GITHUB_ENV
            echo "SOURCE_RUN_ID=${{ github.event.workflow_run.id }}" >> $GITHUB_ENV
            echo "BASE_BRANCH=release/3.4" >> $GITHUB_ENV
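
The first option in the suppressed comment (derive the base branch from the triggering run's head branch, but only after validating it) could look roughly like the sketch below. The function name and the exact release-branch pattern are assumptions. Commits later in this thread describe BASE_BRANCH as pinned for workflow_run events rather than derived, so this illustrates the reviewer's suggestion only, not the merged behavior.

```python
import re

# Assumed shape of a release branch name; adjust to the repository's real convention.
_RELEASE_RE = re.compile(r"release/3\.[0-9]+")


def base_branch_from_head(head_branch: str) -> str:
    """Return head_branch if it looks like a release branch, otherwise raise ValueError."""
    if not _RELEASE_RE.fullmatch(head_branch):
        raise ValueError(f"not a recognised release branch: {head_branch!r}")
    return head_branch
```

Validating with a full match matters because the head branch name of a triggering run is attacker-influenced input in the general case.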

bloxster (Collaborator, Author) commented:

@copilot apply changes based on the comments in this thread

Copilot AI requested a review from lystopad as a code owner May 15, 2026 05:01

Copilot AI (Contributor) commented May 15, 2026

> @copilot apply changes based on the comments in this thread

Applied the requested thread updates in 690c692: switched to actions/download-artifact@v8, added safe env-based/validated handling for manual-dispatch inputs before writing GITHUB_ENV, and expanded workflow_run.branches to include main so scheduled QA runs can trigger this collector.

bloxster and others added 2 commits May 19, 2026 09:35
- Switch from GITHUB_ENV to GITHUB_OUTPUT for prune_mode / source_run_id /
  base_branch propagation (avoids zizmor github-env rule, which flags every
  write to GITHUB_ENV as potential code execution).
- Pass values to downstream steps via per-step env: blocks instead of
  template substitution (resolves template-injection warning on the
  python invocation).
- Add zizmor: ignore[dangerous-triggers] on the workflow_run trigger with
  justification — the workflow never checks out the triggering run's head
  and BASE_BRANCH is pinned to release/3.4 for workflow_run events.
- Set persist-credentials: true explicitly on checkout (we need them for
  the later git push).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bloxster added a commit that referenced this pull request May 19, 2026
…m main (#21271)

## Summary

Brings the `release/3.4` copy of
`.github/workflows/update-disk-sizes.yml` in line with the `main` copy
(PR #21100), which received several iterations of Copilot review plus a
recent zizmor-driven security pass.

Net behavior is identical — the changes are all internal hardening /
lint cleanup.

### What changed

- **`GITHUB_OUTPUT`** instead of `GITHUB_ENV` for `prune_mode` /
`source_run_id` / `base_branch` propagation. Step outputs aren't
environment variables, so zizmor's `github-env` rule doesn't fire.
- **Input validation** on `workflow_dispatch` inputs (numeric `run_id`,
allowed-char `base_branch`, `full|minimal` `prune_mode`) before any
value is written downstream.
- **Per-step `env:` blocks** instead of template substitution into shell
scripts — no template-injection findings.
- **"Verify artifacts are present"** step that fails fast if the
download yielded an empty directory.
- **Action bumps**: `actions/download-artifact@v7 → @v8`,
`actions/setup-python@v5 → @v6`, aligning with the rest of the repo.
- **`persist-credentials: true`** explicit on the checkout (the later
`git push` needs them).
- **`# zizmor: ignore[dangerous-triggers]`** on the `workflow_run`
trigger with justification — this workflow never checks out the
triggering run's head; `BASE_BRANCH` is always pinned to `release/3.4`
for `workflow_run` events.

### What did **not** change

- The `branches:` filter still lists only `release/3.*`. The `main` copy
lists both `main` and `release/3.*` because it exists specifically to
fire for default-branch QA runs; the release-branch copy intentionally
stays narrower.

### Why now

Zizmor was added to `main`'s lint workflow in #21127 (merged 2026-05-13,
~9h after #21030 merged here). The release/3.4 copy is fine under
release/3.4's current lint config — but if/when zizmor gets backported,
the existing file would fail the same checks that #21100 just fixed on
`main`. This keeps the two copies aligned ahead of that.

### Test plan
- [ ] CI passes on this PR
- [ ] No functional change vs. current release/3.4 copy — `branches:`
filter unchanged, same trigger, same outputs
- [ ] Diff against `main`'s copy after this and #21100 both merge:
single-line difference on the `branches:` filter only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Bloxster <bloxster@proton.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bloxster added this pull request to the merge queue May 19, 2026
Merged via the queue into main with commit d0c8dd4 May 19, 2026
68 checks passed
bloxster deleted the docs/auto-disk-sizes-main branch May 19, 2026 12:09
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request May 20, 2026
…21278)

## Summary

Follows up on lystopad's review comment on erigontech#21100, which
merged before the comment could be addressed.

The `Configure git` step in `update-disk-sizes.yml` was using the
generic `github-actions[bot]` author. Lystopad's feedback:

> Please, use another name here which would clearly points to this
workflow. For example something like:
> ```
> git config user.name "github-workflow-update-disk-sizes-run-${RUN_ID}"
> ```
> Otherwise it would be hard to understand source of the change in
target repo.

## What changed

`.github/workflows/update-disk-sizes.yml` — `Configure git` step:

```diff
       - name: Configure git
         if: steps.diff.outputs.changed == 'true'
+        env:
+          RUN_ID: ${{ github.run_id }}
         run: |
           set -euo pipefail
-          git config user.name "github-actions[bot]"
+          git config user.name "github-workflow-update-disk-sizes-run-${RUN_ID}"
           git config user.email "github-actions[bot]@users.noreply.github.com"
```

- **Name change**: `github-actions[bot]` →
`github-workflow-update-disk-sizes-run-<run-id>`. The auto-update commit
now names the specific workflow and the exact run, so anyone
investigating the commit in the target repo can jump straight to the run
that produced it.
- **`RUN_ID` propagated via `env:` block**, consistent with the rest of
the workflow's anti-template-injection pattern (no template substitution
into shell).
- **`user.email` unchanged** —
`github-actions[bot]@users.noreply.github.com` still keeps the commit
attributed to the bot. Lystopad's suggestion only addressed `user.name`.

## Test plan

- [x] YAML syntax check (`yamllint`-style by visual inspection — single
`env:` insertion)
- [ ] First `update-disk-sizes` workflow run after merge: confirm the
produced `chore(docs): auto-update measured disk sizes` commit has
author `github-workflow-update-disk-sizes-run-<id>` (where `<id>`
matches the workflow run ID linked from the commit annotation)

Related: erigontech#21100, erigontech#21271

Co-authored-by: Bloxster <bloxster@proton.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 participants