txnprovider/txpool: drop and kick peers on blob KZG verify failure#21421
Merged
Conversation
…G verify failure GHSA-f4rj-p5qm-7m36. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Mitigates a CPU DoS vector (GHSA-f4rj-p5qm-7m36) by short-circuiting remote batch validation after the first blob KZG verification failure, while preserving per-tx validation semantics for local submissions.
Changes:
- Short-circuit
validateTxnsfor remote batches after the firstUnmatchedBlobTxExt. - Ensure
goodTxnsexcludes trailing (unchecked) remote slots after short-circuit. - Add a regression test covering remote short-circuiting vs local per-tx behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| txnprovider/txpool/pool.go | Adds remote-batch short-circuit logic and limits the “goodTxns” collection loop to checked transactions. |
| txnprovider/txpool/pool_blob_kzg_shortcircuit_test.go | Adds a focused test to assert remote short-circuit behavior and local per-tx semantics. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Plumb (peerID, sentry) through AddRemoteTxnsFromPeer so processRemoteTxns can call PenalizePeer(Kick) on each UnmatchedBlobTxExt slot, matching the across-batch half of geth fdfd1235ac8ed9e1ed95a8db3e7ad0c54f4eb45f. GHSA-f4rj-p5qm-7m36. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clarify that trailing reasons[i]=NotSet after a remote-batch UnmatchedBlobTxExt short-circuit means "not validated" rather than "accepted". Caller must also consult goodTxns to disambiguate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
processRemoteTxns runs under p.lock; calling sentry.PenalizePeer inline would block every other pool operation on sentry I/O. Collect the deduped offenders while still under the lock (using a [64]byte map key to avoid string allocation per peer) and fire PenalizePeer from a goroutine — the kick is best-effort and no caller depends on its completion. Test syncs via a channel signaled from the mock's DoAndReturn. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fold AddRemoteTxnsFromPeer back into AddRemoteTxns now that fetch.go is the only caller — no need for the two-name split. Drop the [64]byte-keyed dedup map in kickKZGOffenders: the validateTxns short-circuit guarantees at most one UnmatchedBlobTxExt per pass, so the dedup was dead code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous regen invoked mockgen with the full import path, leaving the header out of sync with pool.go's //go:generate directive (which uses `. Pool`). Re-regenerate via `go generate` so the header matches the directive and future regenerations are stable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
taratorio
approved these changes
May 27, 2026
pull Bot
pushed a commit
to Dustin4444/erigon
that referenced
this pull request
May 28, 2026
…phemeral merge-queue ref (erigontech#21447) ## Problem In the merge queue, hive / hive-eest jobs intermittently fail with **all clients failed to build**: ``` Cloning: erigontech/erigon - gh-readonly-queue/main/pr-XXXXX-<sha> fatal: Remote branch gh-readonly-queue/main/pr-XXXXX-<sha> not found in upstream origin → image build failed (128) → "too few tests" → ci-gate failure ``` `test-hive.yml` / `test-hive-eest.yml` build the hive erigon client via `clients/erigon/Dockerfile.git`, which runs `git clone --depth 1 --branch $tag https://github.com/$github`. For `merge_group` events `GITHUB_HEAD_REF` is empty, so `$tag` falls back to `${GITHUB_REF#refs/heads/}` = the **ephemeral** `gh-readonly-queue/...` ref. When the queue re-forms the group or a runner is slow to start, GitHub deletes that ref before the in-container clone runs → the clone fails. This evicts PRs from the merge queue (e.g. erigontech#21421), independent of the failing PR's code. `actions/checkout` doesn't hit this — GitHub reliably provides the `merge_group` commit to the run; only the *independent* `git clone --branch <ephemeral-ref>` inside hive's builder races the ref's deletion. ## Fix Build the erigon image from the **checked-out commit** and wrap it with hive's default *prebuilt-image* `Dockerfile` (`FROM $baseimage:$tag`), in both hive workflows: - check out the full erigon source into a fresh **`erigon-full`** path. The old sparse `erigon-src` path leaves a stale `.git/info/sparse-checkout` on the **reused** self-hosted hive runners; a full checkout into that same path doesn't clear it, so the tree stays sparse, `cmd/erigon` is missing, and the build fails (`cd ./cmd/erigon: No such file or directory`). A fresh path no sparse run has touched avoids that. - plain **`docker build -t hive/erigon:cilocal`** on the host daemon (persistent layer cache on the reused runners) — *not* a shared `type=gha` cache, so no concurrent-writer 504s and no cross-workflow scope contention (the failure mode `test-kurtosis-assertoor.yml` centralizes its build to avoid). - point hive's default client Dockerfile at that local image (`baseimage=hive/erigon`, `tag=cilocal`) instead of `mv`-ing in `Dockerfile.git`. - drop the now-unused `SOURCE_REPO` / `branch_name` / clone-arg / builder-Go-version plumbing. `-docker.pull` defaults to false (CI doesn't pass it), so `FROM hive/erigon:cilocal` resolves from the local daemon, never force-pulled. erigon's image is `debian:13-slim` with `erigon` on PATH, so hive's default Dockerfile (`apt-get`, `erigon --version`) layers on top cleanly — on both `ethereum/hive` and the `erigontech/hive` fork used by hive-eest. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: info@weblogix.biz <admin@10gbps.weblogix.it>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes
GHSA-f4rj-p5qm-7m36. Two pieces, one commit each:1. Within-batch break in
validateTxns— once one remote (IsLocal=false) blob txn fails KZG verify, the per-tx loop stops; trailing slots are excluded fromgoodTxnsvia acheckedCountcap. Caps per-batch attacker CPU at one verify regardless of N. Local txns keep per-tx semantics, so aneth_sendRawTransactionbatch with one bad tx still validates the rest.2. Peer-kick in
processRemoteTxns—AddRemoteTxnsFromPeer(ctx, slots, peerID, sentry)retains the source peer per slot inunprocessedRemotePeers. AftervalidateTxns,kickKZGOffenderscallsPenalizePeer(Kick)on eachUnmatchedBlobTxExtslot's peer (deduped, at most once per pass). Mirrors the across-batch half of gethfdfd1235ac8ed9e1ed95a8db3e7ad0c54f4eb45f. Together with (1) this eliminates both the per-batch amplification and the cross-batch flood.Scope kept to KZG verify failure (
UnmatchedBlobTxExt) only — fetch.go already kicks for the cheap protocol violations (BlobHashCheckFail,UnequalBlobTxExt) viacheckBlobSidecar, and the remainingvalidateTxnsreasons (NonceTooLow,InsufficientFunds,UnderPriced, fork-skew, …) aren't protocol violations worth kicking for.Test plan
TestValidateTxnsBlobKZGShortCircuit— drivesvalidateTxnsdirectly with a 2-txn batch where the first txn's blob byte is mutated post-proof (mirrors advisory PoC: cheap commit→versioned-hash check passes, KZG pairing fails). Remote subtest:goodTxnsempty. Local subtest: valid txn[1] reachesgoodTxns.TestProcessRemoteTxnsKicksKZGOffender— end-to-end viaAddRemoteTxnsFromPeer+processRemoteTxnsagainst aMockSentryServer; assertsPenalizePeer(Kick)called once with the attacker'sPeerId.goodTxnsnon-empty;kickKZGOffenderscall removed → gomock reports missingPenalizePeercall).TestPenalizePeerForMalformedMessages,TestDropRemoteAtNoGossip,TestBlobSlots,TestFetch.make lintclean.🤖 Generated with Claude Code