[r3.4] ci(docs): auto-update hardware requirements disk sizes from sync CI #21030

Merged: bloxster merged 28 commits into release/3.4 from docs/auto-disk-sizes on May 13, 2026
@bloxster (Collaborator) commented May 7, 2026

Summary

Introduces an automated pipeline to keep the disk size figures on the Docusaurus Hardware Requirements page up to date, removing the need for manual edits.

How it works

  1. Measure: qa-sync-from-scratch and qa-sync-from-scratch (minimal node) each run a du -sb step just before cleanup and upload a disk-usage-<chain>-<mode>.txt artifact.
  2. Collect: update-disk-sizes.yml triggers via workflow_run after successful sync workflows on release/3.*, downloads matching artifacts, verifies that at least one artifact exists, and runs docs/site/scripts/update-disk-sizes.py.
  3. Publish: the workflow commits the updated JSON to a per-base auto branch such as docs/auto/disk-sizes-release-3.4 and opens or updates a draft PR against that same release branch for human review.
  4. Render: hardware-requirements.mdx imports disk-sizes.json and renders the Current Disk Usage column dynamically, so no manual MDX edit is needed after CI measurements update the JSON.
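
The Collect step can be pictured with a minimal Python sketch. It is not the contents of update-disk-sizes.py: the function names, the `bytes` field, and the `"ci"` source value are assumptions made for illustration. Only the disk-usage-<chain>-<mode>.txt naming and the networks → chain → mode layout come from this PR.

```python
import re
from pathlib import Path

# Assumed filename contract, following the disk-usage-<chain>-<mode>.txt pattern.
ARTIFACT_RE = re.compile(r"^disk-usage-(?P<chain>[a-z0-9]+)-(?P<mode>full|minimal)\.txt$")


def read_measurements(artifact_dir: Path) -> dict:
    """Return {(chain, mode): bytes} for every artifact file that parses cleanly."""
    results = {}
    for path in sorted(artifact_dir.rglob("disk-usage-*.txt")):
        match = ARTIFACT_RE.match(path.name)
        if not match:
            continue
        text = path.read_text().strip()
        # `du -sb` prints "<bytes>\t<path>"; a bare integer is accepted as well.
        first = text.split()[0] if text else ""
        if first.isdigit():
            results[(match["chain"], match["mode"])] = int(first)
    return results


def merge(data: dict, measurements: dict) -> bool:
    """Update only entries that already exist in the JSON; report whether any changed."""
    updated = False
    for (chain, mode), size in measurements.items():
        entry = data.get("networks", {}).get(chain, {}).get(mode)
        if entry is not None:
            entry["bytes"] = size    # hypothetical field name
            entry["source"] = "ci"   # hypothetical marker value
            updated = True
    return updated
```

Returning a boolean from `merge` lets a caller decide whether there is anything to commit, which matches the draft-PR-only-on-change flow described above.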

Companion default-branch workflow

GitHub evaluates workflow_run triggers from the workflow file on the default branch. Companion PR #21100 adds the same update-disk-sizes.yml file to main; both copies are intentionally kept byte-for-byte identical. The trigger itself remains scoped to release/3.*, because the Docusaurus docs data being updated lives on release docs branches.

What is automated

| Network | Mode | Source |
|---------|------|--------|
| Ethereum mainnet | Full | qa-sync-from-scratch (weekly) |
| Ethereum mainnet | Minimal | qa-sync-from-scratch (minimal node) (nightly) |
| Gnosis Chain | Full | qa-sync-from-scratch (weekly) |
| Gnosis Chain | Minimal | qa-sync-from-scratch (minimal node) (nightly) |

What is not automated yet

Archive mode disk sizes require measurement from always-on snapshot machines rather than ephemeral CI runners. A follow-up PR can add this once snapshot machine access/outbound push is confirmed with DevOps.

Manual workflow_dispatch accepts run_id, prune_mode, and base_branch inputs; base_branch defaults to release/3.4.

Files changed

| File | Change |
|------|--------|
| .github/workflows/qa-sync-from-scratch.yml | Add measure + upload steps before cleanup |
| .github/workflows/qa-sync-from-scratch-minimal-node.yml | Same |
| .github/workflows/update-disk-sizes.yml | New collector workflow; mirrored by #21100 on main |
| docs/site/src/data/disk-sizes.json | New data file seeded with current Sept 2025 values |
| docs/site/scripts/update-disk-sizes.py | New Python helper to parse artifact bytes, format SI units, and update JSON |
| docs/site/scripts/test_update_disk_sizes.py | Unit tests for update-disk-sizes.py |
| docs/site/scripts/generate-llms.py | Emit fallback when stripping JSX ?? expressions |
| docs/site/docs/get-started/hardware-requirements.mdx | Import JSON, render disk usage cells dynamically, remove static dated note |
| docs/site/static/llms-full.txt / llms-full.txt | Regenerated to reflect dynamic MDX changes |

LLM artifact note

Because generate-llms.py strips JSX at generation time, dynamic Current Disk Usage cells render as the fallback "—" in llms-full.txt. Resolving the imported disk-sizes.json values during LLM artifact generation remains a follow-up.

🤖 Generated with Claude Code

Adds a CI pipeline that automatically measures and publishes up-to-date
disk usage figures for the hardware requirements docs page.

- qa-sync-from-scratch: measure `du -sb` before data dir cleanup,
  upload per-chain artifact (disk-usage-<chain>-full.txt)
- qa-sync-from-scratch (minimal node): same, artifact named -minimal
- update-disk-sizes.yml: new workflow triggered by workflow_run on both
  sync workflows; downloads artifacts, runs update-disk-sizes.py,
  pushes to docs/auto/disk-sizes branch and opens a draft PR
- docs/site/src/data/disk-sizes.json: single source of truth for disk
  usage values (full/minimal updated by CI; archive remains manual)
- docs/site/scripts/update-disk-sizes.py: merges artifact bytes into
  the JSON, formatting SI units (GB/TB)
- hardware-requirements.mdx: imports disk-sizes.json and renders values
  dynamically; removes the static dated :::info note

Archive node measurements are out of scope for this PR and will be
added in a follow-up once snapshot machine access is confirmed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@bloxster bloxster marked this pull request as draft May 7, 2026 10:03
bloxster and others added 2 commits May 7, 2026 12:05
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces an automated CI-to-docs pipeline that measures Erigon sync disk usage in QA workflows, writes the results into a JSON data file, and renders the “Current Disk Usage” column on the hardware requirements docs page from that data.

Changes:

  • Add disk-usage measurement + artifact upload steps to the full and minimal “sync from scratch” QA workflows.
  • Add a new update-disk-sizes workflow plus a Python script to collect artifacts and update docs/site/src/data/disk-sizes.json.
  • Update the hardware requirements MDX to render disk usage from disk-sizes.json instead of hard-coded values.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| .github/workflows/qa-sync-from-scratch.yml | Uploads measured disk usage artifacts for full runs. |
| .github/workflows/qa-sync-from-scratch-minimal-node.yml | Uploads measured disk usage artifacts for minimal runs. |
| .github/workflows/update-disk-sizes.yml | New collector workflow to download artifacts, update JSON, and open/update a draft PR. |
| docs/site/scripts/update-disk-sizes.py | New helper to parse artifact byte counts and update disk-sizes.json. |
| docs/site/src/data/disk-sizes.json | New data source for disk usage values (seeded with manual values). |
| docs/site/docs/get-started/hardware-requirements.mdx | Consumes disk-sizes.json and renders "Current Disk Usage" dynamically. |

Comment thread .github/workflows/update-disk-sizes.yml Outdated
Comment thread .github/workflows/update-disk-sizes.yml Outdated
Comment thread docs/site/docs/get-started/hardware-requirements.mdx Outdated
@bloxster bloxster marked this pull request as ready for review May 7, 2026 16:34
@bloxster bloxster added the docs label May 8, 2026
@bloxster (Collaborator, Author) commented:

The docs-site / build CI failure here is caused by a pre-existing TypeScript error in the release/3.4 base branch (SidebarsConfig was moved from @docusaurus/types to @docusaurus/plugin-content-docs in Docusaurus 3.x).

This is fixed in #21074, which is currently green and awaiting review. Once #21074 merges into release/3.4, this PR's CI should pass automatically — no changes needed here.

@yperbasis changed the title from "ci(docs): auto-update hardware requirements disk sizes from sync CI" to "[r3.4] ci(docs): auto-update hardware requirements disk sizes from sync CI" on May 11, 2026

@yperbasis (Member) left a comment

Overall: nice direction. Removes manual edits, leans on existing QA sync runs, draft-PR human gate before publishing. The earlier Copilot rounds already fixed the bigger ones (base-branch derivation, accumulating history on the auto branch). A few issues remain — flagging the two that will actually misbehave in production first.

Blocking — these will cause runtime failures

1. workflow_run won't fire — workflow file isn't on the default branch

GitHub evaluates workflow_run triggers against the workflow file on the default branch (here, main). I checked and update-disk-sizes.yml does not exist on origin/main, only on this PR branch. After this merges to release/3.4, neither the weekly full sync nor the nightly minimal-node sync will invoke the new workflow. Only the workflow_dispatch manual path will work.

Fix: also land the workflow file on main (either retarget the base, or open a parallel PR to main ahead of merge). The data file + MDX changes should remain release/3.4-only since that's where the site deploys from.

2. Heredoc terminator is indented — auto-PR body will be mangled

At .github/workflows/update-disk-sizes.yml:161 the EOF line has 16 spaces of indentation. After YAML's | block strips the 10-space common indent for the run: block, bash sees EOF (6 leading spaces). <<'EOF' requires the terminator at column 0, so bash never terminates the heredoc and consumes through the closing ) of $(…). I reproduced this locally with the same indentation pattern:

----- BEGIN BODY -----
      ## Automated disk size update
      …
      EOF
      
----- END BODY -----

Two visible consequences in the auto-created PR:

  • Every body line gets 6 leading spaces, so GFM renders the whole description as an indented code block.
  • A literal EOF line appears in the body.

The gh pr create call still exits 0, so this doesn't hard-fail the job — it just produces a broken PR description.

Fix: dedent both the heredoc body and the EOF terminator to the leftmost shell column. With the surrounding run: | at 8 spaces and content baseline at 10 spaces, the EOF line needs to start at 10 spaces in the YAML so it lands at column 0 in bash. Or sidestep heredocs with a single --body $'…\n…\n…'. Note: <<-EOF only strips leading tabs, not spaces — don't rely on it as a fix.
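
The indentation arithmetic above can be checked with a small Python sketch. It approximates the YAML `|` block with `textwrap.dedent` (an assumption that holds here because the first line carries the common indent) and models only the rule that a `<<'EOF'` heredoc ends at a line that is exactly `EOF`. The `broken` and `fixed` strings are illustrative, not copied from the workflow file.

```python
import textwrap


def shell_view(yaml_block: str) -> list:
    """Approximate what bash receives from a YAML `|` block: common indent removed."""
    return textwrap.dedent(yaml_block).splitlines()


def heredoc_terminates(lines: list, tag: str = "EOF") -> bool:
    """A quoted heredoc ends only at a line that is exactly the tag, at column 0."""
    return any(line == tag for line in lines)


# Terminator indented 16 spaces inside a block whose baseline is 10 spaces.
broken = (
    "          BODY=$(cat <<'EOF'\n"
    "                ## Automated disk size update\n"
    "                EOF\n"
    "          )\n"
)

# Body and terminator both dedented to the 10-space baseline.
fixed = (
    "          BODY=$(cat <<'EOF'\n"
    "          ## Automated disk size update\n"
    "          EOF\n"
    "          )\n"
)

print(heredoc_terminates(shell_view(broken)))  # terminator keeps 6 leading spaces
print(heredoc_terminates(shell_view(fixed)))   # terminator lands at column 0
```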

Notable

3. Update branch name doesn't include the base branch

.github/workflows/update-disk-sizes.yml:79,98 uses docs/auto/disk-sizes regardless of BASE_BRANCH. Once the workflow is on main, scheduled Sunday runs (ref=main) and release-branch push events (ref=release/3.4) will both produce branches with this name. Each force-pushes over the other. The first opened PR's base is pinned at creation; a later run from a different base ends up pushing content rooted at a different base branch. The concurrency.group keying on head_branch only protects against simultaneous runs, not sequential ones.

Suggest: BRANCH="docs/auto/disk-sizes-${BASE_BRANCH//\//-}" so you get docs/auto/disk-sizes-release-3.4 and docs/auto/disk-sizes-main as separate streams.
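
For readers less familiar with bash pattern substitution, the suggested expansion replaces every `/` in the base branch with `-`. A Python equivalent, with a hypothetical function name, might look like this:

```python
def auto_branch(base_branch: str, prefix: str = "docs/auto/disk-sizes") -> str:
    # Same effect as the bash expansion that swaps each "/" for "-".
    return f"{prefix}-{base_branch.replace('/', '-')}"


print(auto_branch("release/3.4"))
print(auto_branch("main"))
```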

4. MDX accessor will hard-fail the docs build if a JSON key goes missing

docs/site/docs/get-started/hardware-requirements.mdx:31-35,42-44 uses diskSizes.networks.mainnet.archive.display directly. Today the seed JSON has every key, and the Python updater only mutates existing keys, so this is fine. But if anyone ever drops a chain from disk-sizes.json (e.g. when Polygon support is fully removed), the next docs build dies with an undefined-property error. Cheap defense: a small helper or optional chaining + fallback.

Minor / nitpicks

5. Action version inconsistency

New steps use actions/upload-artifact@v6 vs @v7 already used in the same file (qa-sync-from-scratch-minimal-node.yml uses @v7 at lines 78/100/109/116). The collector uses actions/download-artifact@v4 vs @v8 elsewhere in the repo. Either align or pin via SHA like release.yml does.

6. du -sb quoting + set -euo pipefail

In the new measure step, an unset/missing ERIGON_DATA_DIR would silently produce an empty BYTES var, an empty artifact file, and a Python "could not parse" log on the next run. Cheap insurance: set -euo pipefail at the top of the run block and quote the variable as du -sb "$ERIGON_DATA_DIR".

7. format_bytes boundary precision

docs/site/scripts/update-disk-sizes.py:24-28 — anything ≥ 1 GB but < 1 TB renders as integer-rounded GB. 999.5 GB → "1000 GB" instead of "1.00 TB". Either lower the TB threshold to ~950 GB or use .2f for the GB branch too. Not user-visible until volumes cross the boundary, which mainnet will do within a year.
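
A minimal sketch of the boundary behavior, assuming SI thresholds and function shapes that may differ from the real update-disk-sizes.py. `format_bytes_old` models the integer-rounded GB branch described above; `format_bytes_new` models the `.2f` option.

```python
TB = 1_000_000_000_000
GB = 1_000_000_000


def format_bytes_old(b: int) -> str:
    """Assumed original behavior: integer-rounded GB below 1 TB."""
    if b >= TB:
        return f"{b / TB:.2f} TB"
    return f"{round(b / GB)} GB"


def format_bytes_new(b: int) -> str:
    """One of the two suggested fixes: two decimals in the GB branch as well."""
    if b >= TB:
        return f"{b / TB:.2f} TB"
    return f"{b / GB:.2f} GB"


print(format_bytes_old(999_600_000_000))
print(format_bytes_new(999_600_000_000))
```

With two decimals the value just below the threshold stays visibly below 1000 GB; values within 0.005 GB of the threshold would still round up, so lowering the TB threshold remains the alternative.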

8. gh pr list --head $BRANCH could match across base branches

If item 3 is fixed by including the base in the branch name, this becomes moot. Otherwise the existence check could match a PR opened against the wrong base.

@bloxster (Collaborator, Author) commented:

Post-review fixes summary

After yperbasis's review and the subsequent Copilot pass, three more issues were addressed in this branch:


1. JSX string literal key fix (7fda376e)

hardware-requirements.mdx was using bare identifiers in optional-chain bracket notation (e.g. ?.networks?.[mainnet]), which Docusaurus SSG evaluated as variable references, not property keys. This caused a ReferenceError: mainnet is not defined build failure.

Fixed by using explicit string literals: ?.networks?.['mainnet'], ?.['archive'], ?.['full'], ?.['minimal'], ?.['gnosis'].


2. Unit tests for update-disk-sizes.py (83b9de29)

Added docs/site/scripts/test_update_disk_sizes.py, which covers the parser, formatter, JSON merge logic, and edge cases (missing artifact, malformed input).


3. generate-llms.py: emit "—" fallback for JSX ?? expressions (c5fe7d8a)

llms-full.txt is generated by stripping JSX from MDX files. The dynamic disk-size cells use a nullish-coalescing pattern like:

{diskSizes?.networks?.['mainnet']?.['archive']?.display ?? '—'}

The old stripper dropped the entire {...} block, leaving blank table cells in the LLM artifact. The updated _strip_jsx_expr callback now detects ?? 'value' / ?? "value" patterns and emits the fallback string instead:

| Archive | — | 4 TB | 32 GB | 64 GB |

This is Option A (keep the fallback literal). Option B (resolve actual JSON values at generation time) will be addressed in a separate draft PR.
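
A rough sketch of how such a fallback-preserving callback could work. Only the name `_strip_jsx_expr` comes from this PR; the regular expressions and the `strip_jsx` wrapper are assumptions, and the sketch ignores nested braces.

```python
import re

# Matches one {...} JSX expression without nested braces (a simplification).
JSX_EXPR = re.compile(r"\{([^{}]*)\}")
# Captures the string literal after a trailing `??`, in single or double quotes.
FALLBACK = re.compile(r"""\?\?\s*(['"])(.*?)\1\s*$""")


def _strip_jsx_expr(match: re.Match) -> str:
    """Emit the `??` fallback literal if there is one; otherwise drop the expression."""
    fallback = FALLBACK.search(match.group(1))
    return fallback.group(2) if fallback else ""


def strip_jsx(line: str) -> str:
    """Hypothetical wrapper: apply the callback to every expression on the line."""
    return JSX_EXPR.sub(_strip_jsx_expr, line)


row = "| Archive | {diskSizes?.networks?.['mainnet']?.['archive']?.display ?? '—'} | 4 TB |"
print(strip_jsx(row))
```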

Both docs/site/static/llms-full.txt and root llms-full.txt were regenerated after each of the above fixes (2040cd72, 359eda10, 17cdcc56, 8555f641).


@yperbasis: all previously requested changes remain intact; the above are additive fixes for build correctness and LLM artifact quality.

@bloxster (Collaborator, Author) commented:

@copilot review

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comment thread docs/site/scripts/update-disk-sizes.py Outdated
Comment thread docs/site/scripts/update-disk-sizes.py Outdated

@yperbasis (Member) left a comment

Re-review (post-fix state)

All 8 issues from my prior review — verified fixed

| # | Issue | Status |
|---|-------|--------|
| 1 | workflow_run not on default branch | Addressed via companion #21100 (still REVIEW_REQUIRED, not yet merged). Approach is valid. |
| 2 | Heredoc EOF indentation | Fixed. update-disk-sizes.yml:161: EOF lands at column 0 in shell after the YAML \| block strips the 10-space common indent. Verified against the raw file bytes. |
| 3 | Update-branch name collision across bases | Fixed. BRANCH="docs/auto/disk-sizes-${BASE_BRANCH//\//-}". |
| 4 | MDX accessor hard-failing on missing keys | Fixed. All cells use ?. + ?? '—' with string-literal keys (['mainnet']). The string-literal-key follow-up (7fda376) was correct: bare identifiers would have been ReferenceErrors. |
| 5 | Action version inconsistency | Aligned: my prior claim that the existing files used @v7 was wrong; both qa-sync-from-scratch.yml and the minimal-node variant use @v6 throughout. New steps at @v6 match correctly. |
| 6 | du -sb quoting + set -euo pipefail | Fixed in both QA workflows. |
| 7 | format_bytes GB precision | Fixed (.2f for GB). Unit tests cover the 999.9 GB regression case. |
| 8 | gh pr list missing --base filter | Fixed. |

The added unit tests (test_update_disk_sizes.py) and the _strip_jsx_expr fallback handler for ?? 'val' in generate-llms.py are nice additions; both look correct.


New — Notable

A. #21100's workflow file is not identical to this PR's, despite #21100's body claiming otherwise

Diffing the two update-disk-sizes.yml versions:

< on release/3.4 (this PR)              > on main (#21100)
---
  branches:                               branches:
    - release/3.*                           - release/3.*
-   - main                                  (omitted)
  uses: actions/download-artifact@v7    > uses: actions/download-artifact@v8
                                        > - name: Verify artifacts are present
                                        >   run: |
                                        >     if [ -z "$(ls -A ./artifacts ...)" ]; then
                                        >       echo "::error::No disk-usage artifacts..."
                                        >       exit 1
                                        >     fi
  uses: actions/setup-python@v5         > uses: actions/setup-python@v6

Implications:

  • Since workflow_run is evaluated against the file on main, the version that actually fires is #21100's. The release/3.4 copy is dead weight for the trigger path (only reachable via workflow_dispatch from that ref).
  • The branches: filter on main excludes main itself. If QA - Sync from scratch ever runs on main (scheduled cron / dispatch), the auto-update won't fire. Probably intentional (docs deploy from release/3.4) but worth confirming.
  • Two divergent copies is a maintenance trap. Suggest one of:
    • Drop update-disk-sizes.yml from this PR entirely, keep only the QA-workflow + docs changes here. Land the workflow file via #21100 only.
    • Or sync the two so they're actually identical (v8 download-artifact, verify step, setup-python v6, matching branches filter).
  • The "identical to the version in #21030 after all review fixes" claim in #21100's body should be corrected either way.

B. git push --force could clobber human edits to the auto-branch

update-disk-sizes.yml:127 force-pushes the auto-branch on every run. Because concurrency.group serializes runs, the only realistic case where origin advances between fetch and push is a human pushing a manual fix to the open auto-PR (e.g. to nudge a value before merge). The next nightly run will silently wipe that commit.

A plain git push origin "$BRANCH" would be sufficient: each run is exactly one commit ahead of origin/$BRANCH after the fetch+commit, so it'll always be a fast-forward in the happy path, and bail out (rather than clobber) if a human pushed concurrently. Recommend dropping --force.


Minor / nits

C. Manual workflow_dispatch hardcodes BASE_BRANCH=release/3.4

update-disk-sizes.yml:112. Will need a code change when release/3.5 ships. Cheap fix: add a base_branch workflow_dispatch input alongside run_id and prune_mode, default release/3.4.

D. gh pr list --head $BRANCH — variable unquoted

update-disk-sizes.yml:136. ${BASE_BRANCH} is properly quoted on the same line; $BRANCH should be too. Same on line 127 (git push --force origin $BRANCH).

E. set -euo pipefail not added to all multi-line run: blocks

The "Measure disk usage" steps have it (good). The auto-branch-prep, commit-and-push, and create-or-update-PR blocks do not. Worth aligning since you've already adopted the pattern in this PR.

F. Seed JSON's ci_last_updated: "2025-09-01" with all source: "manual"

Cosmetic — the top-level ci_last_updated reads as a CI marker but no CI run has actually occurred at that date. Either drop the field from the seed or rename to last_updated. Not blocking.

G. llms-full.txt will show "—" until manually regenerated

The PR acknowledges this (Option A → follow-up Option B). Fine.

H. Polygon table left static with September 2025 note

Reasonable given Erigon 3.1.* is the last Polygon-supported series.


Verdict

The blocking items from the first round are all genuinely fixed. The companion-PR strategy works.

Before merge:

  1. The #21100-vs-this divergence resolved one way or the other (item A) — the only thing I'd block on.
  2. --force dropped from the push (item B) — strong recommend, not block.

C–H are nits that can land in a follow-up. CI is green on the latest commit.

pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request May 13, 2026
## Summary

Brings the full `docs/site/` Docusaurus tree into `main`, incorporating
all documentation improvements developed against `release/3.4`.

**Scope:** `docs/site/**` + root `llms.txt` / `llms-full.txt` + removal
of superseded `docs/gitbook/` and `docs/gitbook-help/`. No changes to
Go, proto, or any non-docs source files.

### Included — merged to release/3.4

| PR | What |
|----|------|
| [erigontech#20883](erigontech#20883) | Docusaurus v3 migration: full `docs/site/` tree, Docusaurus config, versioned v3.3 snapshot |
| [erigontech#20263](erigontech#20263) / [erigontech#20264](erigontech#20264) | All v3.3 docs ported; branch/versioning convention established |
| [erigontech#20978](erigontech#20978) | Mobile footer fix, SEO meta tags, OG image |
| [erigontech#20991](erigontech#20991) | Self-host brand fonts (remove Google Fonts / CDN) |
| [erigontech#21000](erigontech#21000) | `llms.txt` / `llms-full.txt` generator script + root artifacts |
| [erigontech#21018](erigontech#21018) | May 2026 w19 maintenance: stale flags, broken links, accuracy fixes |
| [erigontech#21045](erigontech#21045) | CI: docs-only path filter (skip Go jobs, run docs-site build) |
| [erigontech#21063](erigontech#21063) | `trace` response fields reference + sync-monitoring guidance |
| [erigontech#21074](erigontech#21074) | Regenerate `llms.txt` after sync-modes update |
| [erigontech#20997](erigontech#20997) | Brand font consistency fix, installation page UX |

### Included — pending review on release/3.4

| PR | What |
|----|------|
| [erigontech#21030](erigontech#21030) | Automated disk size pipeline: `update-disk-sizes.py`, `disk-sizes.json`, `hardware-requirements.mdx` JSX fix, `generate-llms.py` `—` fallback, unit tests |
| [erigontech#21129](erigontech#21129) | May 2026 w20 maintenance: `--caplin.nat`, `--caplin.columns-keep-slots`, RPC subscription defaults, `nat.md` Caplin section, log.dir.verbosity default |

### What changes on `main`

- `docs/site/` added (full Docusaurus tree, current v3.4 + frozen v3.3
snapshot)
- `docs/gitbook/` and `docs/gitbook-help/` removed (superseded by
Docusaurus)
- Root `llms.txt` and `llms-full.txt` updated to Docusaurus-generated
versions

> ⚠️ This PR does **not** remove `docs/gitbook/` yet — that cleanup will
be a separate commit once this PR is reviewed and approved.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Bloxster <bloxster@proton.me>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: lupin012 <58134934+lupin012@users.noreply.github.com>
@bloxster (Collaborator, Author) commented:

Addressed the latest review points in 6e779bc:

  • Synced .github/workflows/update-disk-sizes.yml with companion PR #21100 ("[main] ci: add update-disk-sizes workflow so workflow_run trigger fires"); the two workflow files are now byte-for-byte identical.
  • Kept the workflow_run.branches filter scoped to release/3.*, matching the Docusaurus docs update path rather than running measurements for main.
  • Dropped git push --force and quoted the branch in both git push and gh pr list.
  • Added workflow_dispatch.inputs.base_branch with release/3.4 default instead of hardcoding it in the shell.
  • Added set -euo pipefail to the multi-line shell blocks.

I also mirrored the workflow hardening into #21100 at bb30fc9 so the main-branch trigger copy stays aligned.

Local checks:

  • python3 -c "import yaml, pathlib; yaml.safe_load(pathlib.Path(\".github/workflows/update-disk-sizes.yml\").read_text())"
  • python3 docs/site/scripts/test_update_disk_sizes.py

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Comment thread .github/workflows/update-disk-sizes.yml Outdated
Comment thread .github/workflows/update-disk-sizes.yml
Comment thread docs/site/scripts/update-disk-sizes.py Outdated
Comment thread docs/site/scripts/generate-llms.py

@yperbasis (Member) left a comment

Re-review (post 6e779bc7)

Prior items — all fixed

| # | Issue | Verification |
|---|-------|--------------|
| A | #21100 divergence | diff -u between this PR's and #21100's update-disk-sizes.yml returns empty: byte-identical. |
| B | git push --force could clobber manual edits to the auto-PR | Line 145 is now git push origin "$BRANCH". |
| C | BASE_BRANCH=release/3.4 hardcoded for manual dispatch | Added workflow_dispatch.inputs.base_branch with release/3.4 default. |
| D | $BRANCH unquoted in git push / gh pr list | Quoted everywhere. |
| E | set -euo pipefail missing in several multi-line run: blocks | Added to all multi-line shell blocks. |

branches: is kept on release/3.* only (no main) consistently across this PR and #21100 — matches the docs-deploy-from-release-branch model. The heredoc fix from the first round survives — EOF at line 180 lands at column 0 in shell after YAML strips the 10-space common indent. CI green.

Two non-blocking nits (follow-up fine)

  1. format_bytes 1000-MB boundary: format_bytes(999_900_000) returns "1000 MB" because round(999.9) == 1000, instead of promoting to GB. Not user-visible for Erigon datadirs (always TB-or-GB scale) but worth a one-liner guard, e.g.:
    mb = round(b / 1_000_000)
    if mb >= 1000:
        return f"{b / 1_000_000_000:.2f} GB"
    return f"{mb} MB"
  2. Script exits 0 on no-match: update-disk-sizes.py returns success when no artifact filenames match disk-usage-*-<mode>.txt. The workflow's Verify artifacts are present step catches the empty-directory case before the script runs, so this only fires if artifacts exist but none match the producer/consumer label contract, i.e. a silent divergence. sys.exit(1) on not updated would surface that loudly. Not critical.
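
The second nit could be addressed along these lines. This is a sketch under assumptions: the function names are hypothetical, and the glob simply mirrors the disk-usage-*-<mode>.txt pattern quoted above. A real script would pass the result to `sys.exit`.

```python
import fnmatch


def matching_artifacts(names: list, mode: str) -> list:
    """Filter artifact filenames against the disk-usage-*-<mode>.txt contract."""
    return fnmatch.filter(names, f"disk-usage-*-{mode}.txt")


def exit_code(names: list, mode: str) -> int:
    """Return 1 when nothing matches, so a label mismatch fails the job visibly."""
    return 0 if matching_artifacts(names, mode) else 1


print(exit_code(["disk-usage-mainnet-full.txt"], "full"))
print(exit_code(["disk-usage-mainnet-full.txt"], "minimal"))
```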

Companion PR

#21100 needs to land alongside this for workflow_run to actually fire (still REVIEW_REQUIRED, CI green).

LGTM — approving.

@bloxster bloxster merged commit f0f6606 into release/3.4 May 13, 2026
23 checks passed
@bloxster bloxster deleted the docs/auto-disk-sizes branch May 13, 2026 14:22
bloxster added a commit that referenced this pull request May 19, 2026
…m main (#21271)

## Summary

Brings the `release/3.4` copy of
`.github/workflows/update-disk-sizes.yml` in line with the `main` copy
(PR #21100), which received several iterations of Copilot review plus a
recent zizmor-driven security pass.

Net behavior is identical — the changes are all internal hardening /
lint cleanup.

### What changed

- **`GITHUB_OUTPUT`** instead of `GITHUB_ENV` for `prune_mode` /
`source_run_id` / `base_branch` propagation. Step outputs aren't
environment variables, so zizmor's `github-env` rule doesn't fire.
- **Input validation** on `workflow_dispatch` inputs (numeric `run_id`,
allowed-char `base_branch`, `full|minimal` `prune_mode`) before any
value is written downstream.
- **Per-step `env:` blocks** instead of template substitution into shell
scripts — no template-injection findings.
- **"Verify artifacts are present"** step that fails fast if the
download yielded an empty directory.
- **Action bumps**: `actions/download-artifact@v7 → @v8`,
`actions/setup-python@v5 → @v6`, aligning with the rest of the repo.
- **`persist-credentials: true`** explicit on the checkout (the later
`git push` needs them).
- **`# zizmor: ignore[dangerous-triggers]`** on the `workflow_run`
trigger with justification — this workflow never checks out the
triggering run's head; `BASE_BRANCH` is always pinned to `release/3.4`
for `workflow_run` events.
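
As an illustration of the input-validation bullet, the checks might be expressed as below. This is a Python sketch, not the workflow's shell code; the allowed character set for `base_branch` is an assumption because the actual set is not shown here.

```python
import re

RUN_ID_RE = re.compile(r"[0-9]+")
# Assumption: a conservative allow-list for branch names.
BASE_BRANCH_RE = re.compile(r"[A-Za-z0-9._/-]+")
PRUNE_MODES = ("full", "minimal")


def validate_inputs(run_id: str, base_branch: str, prune_mode: str) -> list:
    """Return a list of problems; an empty list means the inputs may be propagated."""
    errors = []
    if not RUN_ID_RE.fullmatch(run_id):
        errors.append("run_id must be numeric")
    if not BASE_BRANCH_RE.fullmatch(base_branch):
        errors.append("base_branch contains disallowed characters")
    if prune_mode not in PRUNE_MODES:
        errors.append("prune_mode must be 'full' or 'minimal'")
    return errors


print(validate_inputs("12345", "release/3.4", "full"))
```

Validating with `fullmatch` rather than `search` matters here: a value such as `release/3.4; rm -rf /` contains an allowed prefix but is rejected as a whole.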

### What did **not** change

- The `branches:` filter still lists only `release/3.*`. The `main` copy
lists both `main` and `release/3.*` because it exists specifically to
fire for default-branch QA runs; the release-branch copy intentionally
stays narrower.

### Why now

Zizmor was added to `main`'s lint workflow in #21127 (merged 2026-05-13,
~9h after #21030 merged here). The release/3.4 copy is fine under
release/3.4's current lint config — but if/when zizmor gets backported,
the existing file would fail the same checks that #21100 just fixed on
`main`. This keeps the two copies aligned ahead of that.

### Test plan
- [ ] CI passes on this PR
- [ ] No functional change vs. current release/3.4 copy — `branches:`
filter unchanged, same trigger, same outputs
- [ ] Diff against `main`'s copy after this and #21100 both merge:
single-line difference on the `branches:` filter only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Bloxster <bloxster@proton.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sahil-4555 pushed a commit to Sahil-4555/erigon that referenced this pull request May 19, 2026
…es (erigontech#21100)

## Context

The `update-disk-sizes.yml` workflow uses a `workflow_run` trigger.
GitHub evaluates `workflow_run` events against the workflow file on the
**default branch** (`main`). Without this file on `main`, successful
sync runs on `release/3.*` would not invoke the collector workflow added
by erigontech#21030.

This PR adds the file to `main` so the trigger fires, then iterates on
hardening to keep this copy aligned with the release-branch copy.

## What this PR does

Adds `.github/workflows/update-disk-sizes.yml` to `main`. Behavior is
functionally identical to erigontech#21030's copy on `release/3.4`; the
differences below are pure hardening / lint cleanup, mostly driven by
Copilot review and a zizmor security pass.

### Trigger scope

- `workflow_run` listens for `release/3.*` runs of the QA sync
workflows, **plus** `main` for the scheduled QA runs (so they also feed
the collector).
- The release-branch copy intentionally stays narrower (`release/3.*`
only). This is the single-line difference between the two files after both land.

### Input safety

- **`workflow_dispatch` inputs validated** before any value is
propagated downstream: numeric `run_id`, allowed-char `base_branch`,
`full|minimal` `prune_mode`. Anything else exits non-zero.
- **`GITHUB_OUTPUT`** used instead of `GITHUB_ENV` for `prune_mode` /
`source_run_id` / `base_branch` propagation. Step outputs are not
environment variables, so zizmor's `github-env` rule doesn't fire on
potential code execution paths.
- **Per-step `env:` blocks** instead of template substitution into shell
scripts — no template-injection findings.

### Robustness

- **"Verify artifacts are present"** step that fails fast if the
download yielded an empty directory (instead of silently producing an
empty JSON update).
- **`persist-credentials: true`** explicit on the checkout (the later
`git push` needs them).
- **Quoted branch variables** throughout the push / PR-lookup paths.
- **`set -euo pipefail`** on multi-line shell blocks.
- **Normal `git push`** instead of `git push --force` for the
auto-update branch.

### Tooling and review

- **Action bumps**: `actions/download-artifact@v7 → @v8`,
`actions/setup-python@v5 → @v6`, aligning with the rest of the repo.
- **`# zizmor: ignore[dangerous-triggers]`** on the `workflow_run`
trigger with justification — this workflow never checks out the
triggering run's head, and `BASE_BRANCH` is pinned per-event (not
derived from untrusted input).
- Addressed all Copilot review threads across the iteration cycle
(`690c6927` — `download-artifact@v8`, env-based handling for
`workflow_dispatch` inputs, `workflow_run.branches` includes `main`).

## Test plan

- [x] CI passes on this PR
- [x] No functional change vs. the existing `release/3.4` copy beyond
the `branches:` filter widening
- [ ] After this and erigontech#21271 both merge: diff the two files — single-line
difference on `branches:` only

Related: erigontech#21030, erigontech#21271

---------

Co-authored-by: Gianni Morselli <gianni.morselli@erigon.tech>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Bloxster <bloxster@proton.me>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>