genesis: allow tests pass GenesisConfig to eth.New() - avoid double read of large genesis#21470
genesis: allow tests pass GenesisConfig to eth.New() - avoid double read of large genesis#21470AskAlexSharov wants to merge 1 commit into
eth.New() - avoid double read of large genesis#21470Conversation
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM \u2014 obviously small/trivial change (22 lines).
eth.New() - avoid double read of large genesis
eth.New() - avoid double read of large genesiseth.New() - avoid double read of large genesis
eth.New() - avoid double read of large genesiseth.New() - avoid double read of large genesis
…ckly (erigontech#21483) ## Problem When a merge-queue run has a hive-eest shard fail, the failing job calls `gh run cancel ${{ github.run_id }}` (added in erigontech#21445). That sends SIGTERM to all in-flight matrix siblings, but the Docker-bound hive simulators take ~20 minutes to actually drain. `ci-gate` is `if: always()` and waits for every `needs` job to reach a terminal state, so the broken PR sits at `AWAITING_CHECKS` for the full drain time — blocking the head of the merge queue. Concrete example from today (PR erigontech#21470 at position #1): - 08:29:57 — `hive-eest / test-hive-eest (paris+shanghai, serial)` fails, calls `gh run cancel 26562610423`, emits the "Merge-queue root-cause failure" annotation from erigontech#21445. - 08:48 (~19 min later) — paris+shanghai-parallel, prague-serial/parallel, cancun-serial/parallel, osaka-parallel, rlp-serial/parallel, and glamsterdam-devnet-parallel were all still `in_progress`. Every other ci-gate child (tests, race-tests, eest-spec-tests, kurtosis, hive, lint, bench, repro, sonar, caplin) had already completed. The bottleneck was specifically the hive-eest matrix siblings. ## Fix ```yaml strategy: fail-fast: ${{ github.event_name == 'merge_group' }} ``` - **In `merge_group`**: first failed shard immediately cancels all siblings at the GitHub API layer — much faster than the `gh run cancel` → SIGTERM → runner-drain path. ci-gate's `needs` reach terminal state in seconds, ci-gate fails, the broken PR is evicted. - **In PR runs**: stays `false`, so authors still see the full failure breakdown across every shard. No regression in PR feedback. ## What's left in place and why The per-job `gh run cancel` step (test-hive-eest.yml lines 311-317) stays. Two reasons: - Matrix `fail-fast` only cancels siblings **within the same matrix** — it doesn't cancel sibling reusable workflows. If a future failure pattern leaks across workflows, `gh run cancel` still covers it. - ci-gate.yml's root-cause annotator (line 188) keys off "the leaf that ran `gh run cancel` successfully" to single out the true root cause among collateral cancellations. Removing the step would silently regress erigontech#21445's attribution. ## Scope choice Only `test-hive-eest.yml` is changed. Other matrix-bearing reusable workflows (`test-all-erigon.yml`, `test-all-erigon-race.yml`, `test-eest-spec.yml`, `test-kurtosis-assertoor.yml`, `test-hive.yml`, `test-bench.yml`) all use `fail-fast: false` too, but none of them were the queue-blocking long pole in this incident. Keeping the patch minimal; we can generalize if another workflow becomes the bottleneck. ## Tradeoff to be aware of Queue runs will now show siblings as `cancelled` instead of `failed` whenever any one shard fails. That's the correct tradeoff in `merge_group` — the goal is fast eviction, not detailed diagnostics; full per-shard breakdown remains available on the PR run. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
taratorio
left a comment
There was a problem hiding this comment.
this PR fails in the merge queue because it introduces a regression
https://github.com/erigontech/erigon/actions/runs/26588728733/job/78341615734
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — trivial, obviously correct cleanup with a small diff.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — small, obviously correct cleanup plus plumbing an already-available Genesis config into eth.New() to avoid an extra genesis read in tests.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — small, obvious fix: thread the provided GenesisConfig through eth.New() for tests and remove an unused GenesisBlock field/read path.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — small, obvious cleanup: pass the provided GenesisConfig into eth.New() to avoid a redundant DB read in tests, plus minor dead-field/lint cleanup.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — small, mechanical cleanup to reuse the provided GenesisConfig instead of re-reading it, plus removal of an unused field and a narrow lint annotation fix.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — small plumbing cleanup: reuse the provided GenesisConfig in tests to avoid the extra DB read, plus a harmless nolint annotation tidy-up.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — small, obvious plumbing fix to pass GenesisConfig through eth.New() and avoid redundant genesis reads in tests.
…ead of large genesis
05a936a to
b42f674
Compare
NewEngineXTestRunnerto allow passconfig.Genesisobject (because it already has it) - to avoid 2nd unmarshal of200 MBgenesis8.86s -> 6.34s