(feat): relay improvements#2205
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
There was a problem hiding this comment.
Pull request overview
This PR improves the preconf/private-tx relay by making private-tx store cleanup bounded (TTL + txpool-aware eviction), adding aggregated BP rejection/error reporting, and refining metrics for relay↔BP interactions.
Changes:
- Add periodic sweeping to
PrivateTxStorewith a hard TTL and optional txpool-presence-based eviction (wired viaSetTxPoolCheckerfrometh/backend.go). - Introduce an in-memory
rejectionTrackerand a background reporter to periodically log an aggregated rejection/error distribution. - Rename/expand relay metrics to provide more granular latency/success/failure tracking for preconf/private-tx/checkstatus RPC calls.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
eth/relay/service.go |
Renames/organizes relay timers and meters under relay/... metric paths. |
eth/relay/relay.go |
Adds wiring to inject a txpool checker into the private-tx store; expands docs around private-tx purge behavior. |
eth/relay/rejection_tracker.go |
Adds the rejection aggregation + formatting utilities (new file). |
eth/relay/private_tx_store.go |
Adds sweep loop (TTL + txpool-aware eviction), new gauge + metric, and txpool checker plumbing. |
eth/relay/private_tx_store_test.go |
Adds tests for TTL eviction, txpool-check eviction, and setter wiring. |
eth/relay/multiclient.go |
Splits RPC metrics by operation, adds rejection tracking + periodic reporting goroutine, adds retry success metric. |
eth/relay/multiclient_test.go |
Adds unit tests for rejection tracker behavior and verifies submission paths wire into the tracker. |
eth/backend.go |
Wires eth.txPool.Has into the relay’s private-tx store for sweep-based cleanup. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
@claude review this PR. Check for any breaking / risky changes from an implementation / usage point of view. |
There was a problem hiding this comment.
An organization admin can view or raise the cap at claude.ai/admin-settings/claude-code. The cap resets at the start of the next billing period.
Once the cap resets or is raised, comment @claude review on this pull request to trigger a review.
|
Claude finished @manav2401's task in 2m 4s —— View job PR Review: Relay Improvements
SummarySolid improvement to private tx lifecycle management. The sweep mechanism with TTL + txpool-awareness directly addresses the unbounded growth problem. The rejection tracker is well-designed with proper cardinality capping. Good test coverage across all new code paths. I found one concurrency issue worth fixing before merge, a few items worth discussing, and a handful of minor observations. Findings1. [HIGH]
|
| Change | Breaking? | Risk Level | Notes |
|---|---|---|---|
| Metric names renamed | Yes (operational) | Low | Dashboards need updating |
totalPrivateTxsMeter (Meter) → privateTxStoreSizeGauge (Gauge) |
Yes (operational) | Low | Type change — queries expecting rate-based metric will break |
New sweep goroutine in NewPrivateTxStore |
No | Medium | See finding #1 re: lock holding |
New rejectionTracker + reporter goroutine |
No | Low | Clean lifecycle management |
SetTxPoolChecker new API on RelayService |
No | Low | Additive, well-guarded |
| Private tx no longer purged on BP rejection | Behavioral | Low | Intentional — well-documented rationale in comments |
multiClient.close() now idempotent via closeOnce |
No | Positive | Fixes potential double-close panic |
Overall: No breaking API changes. The behavioral change (not purging on rejection, relying on TTL/sweep instead) is well-reasoned and properly documented. The main item to fix before merge is the lock-holding pattern in sweepOnce.
Code review1 issue found. Checked for bugs and CLAUDE.md compliance. Bug in
This contradicts the The tests in The same ordering issue exists at these two other locations: Suggested fix (for the first call site at lines 143-148): // If the tx is already known, treat it as preconfirmed for this node
if isAlreadyKnownError(err) {
preconfRpcAlreadyKnownMeter.Mark(1)
preconfOfferedCount.Add(1)
return
}
mc.rejectionTracker.record(err)Accepting the suggestion above fixes only this first call site. The same fix (moving |
|
@claude
|
|
Claude finished @manav2401's task in 45s —— View job Acknowledging Review Feedback
Thanks for the follow-up. Verified the current state: Finding #1 — sweepOnce lock: Confirmed fixed. The three-phase pattern (RLock snapshot → filter without lock → Lock to delete) at Finding #2 — "already known" in rejection tracker: Understood, intentionally tracked. The tests at lines 1558-1604 now explicitly assert Findings #3-#6: Acknowledged as intentional. LGTM — no remaining concerns. |
Codecov Report❌ Patch coverage is
❌ Your patch check has failed because the patch coverage (88.27%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## master #2205 +/- ##
==========================================
+ Coverage 52.21% 52.23% +0.01%
==========================================
Files 884 885 +1
Lines 155479 155616 +137
==========================================
+ Hits 81190 81281 +91
- Misses 69060 69099 +39
- Partials 5229 5236 +7
... and 19 files with indirect coverage changes
🚀 New features to boost your workflow:
|
|
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| log.Info("[tx-relay] BP rejection summary", | ||
| "total", total, | ||
| "errors", formatRejectionCounts(counts), | ||
| ) |
There was a problem hiding this comment.
reportRejections logs an Info-level "BP rejection summary" on every rejectionReportInterval tick, even when total is 0 and counts is nil/empty. This will add steady log noise on healthy nodes. Consider logging only when total > 0 (or when the formatted error string is non-empty), or lowering this to Debug level.
| log.Info("[tx-relay] BP rejection summary", | |
| "total", total, | |
| "errors", formatRejectionCounts(counts), | |
| ) | |
| if total > 0 { | |
| log.Info("[tx-relay] BP rejection summary", | |
| "total", total, | |
| "errors", formatRejectionCounts(counts), | |
| ) | |
| } |
| now := time.Now() | ||
| toDelete := make([]entry, 0) | ||
| for _, entry := range entries { | ||
| age := now.Sub(entry.addedAt) | ||
| if age > privateTxGracePeriod && s.txPoolChecker != nil && !s.txPoolChecker(entry.hash) { | ||
| toDelete = append(toDelete, entry) | ||
| } |
There was a problem hiding this comment.
s.txPoolChecker is read and invoked without holding any lock in sweepOnce(). Since SetTxPoolChecker writes the function pointer under s.mu, this can race (and can even panic if the checker becomes nil between the nil-check and the call). Capture txPoolChecker into a local variable under RLock (e.g., alongside the entries snapshot), then use that local for the rest of the sweep pass.
| txsAdded atomic.Uint64 | ||
| txsPurged atomic.Uint64 // deleted by an explicit call | ||
| txsDeleted atomic.Uint64 // deleted because tx got included | ||
| txsExpired atomic.Uint64 // deleted by sweep (txpool eviction or TTL) |
There was a problem hiding this comment.
The txsExpired metric comment says entries are deleted by sweep due to "txpool eviction or TTL", but the sweep implementation only deletes based on txpool absence after the grace period (no TTL-based expiry is implemented here). Please update the comment (or implement the TTL path if that was intended) to avoid misleading future readers.
| txsExpired atomic.Uint64 // deleted by sweep (txpool eviction or TTL) | |
| txsExpired atomic.Uint64 // deleted by sweep after txpool absence following the grace period |
| // logged but as they can't be fixed immediately by the relay, they'll be pruned | ||
| // after the default private tx store timeout. | ||
| // - Transient rejection due to reorg or nonce race. The relay's local pool will | ||
| // evict the tx eventually once it catches up with the txpool-aware sweep. | ||
| // - Transient rejection due to BPs current mempool state (e.g. pool full). This | ||
| // will be handled by local re-submissions in BPs. If the tx still remains invalid | ||
| // it will be evicted after the default private tx store timeout. |
There was a problem hiding this comment.
The doc comment mentions pruning "after the default private tx store timeout", but PrivateTxStore currently doesn’t have a time-based TTL eviction—only block-inclusion cleanup, explicit purge, and the txpool-aware sweep. Please either point to the actual eviction mechanism/constant, or reword this to match the implemented behavior.
| // logged but as they can't be fixed immediately by the relay, they'll be pruned | |
| // after the default private tx store timeout. | |
| // - Transient rejection due to reorg or nonce race. The relay's local pool will | |
| // evict the tx eventually once it catches up with the txpool-aware sweep. | |
| // - Transient rejection due to BPs current mempool state (e.g. pool full). This | |
| // will be handled by local re-submissions in BPs. If the tx still remains invalid | |
| // it will be evicted after the default private tx store timeout. | |
| // logged; the relay does not immediately purge them from the private tx store, | |
| // and they remain until removed by the store's existing cleanup paths. | |
| // - Transient rejection due to reorg or nonce race. The relay's local pool will | |
| // evict the tx eventually once it catches up with the txpool-aware sweep. | |
| // - Transient rejection due to BPs current mempool state (e.g. pool full). This | |
| // will be handled by local re-submissions in BPs; if the tx later drops out of | |
| // the relay's local pool, it can then be removed by the txpool-aware sweep. |
| // All BPs accepted the tx but at least one of them didn't offer a preconf | ||
| if firstErr == nil { | ||
| belowThresholdPreconfMeter.Mark(1) | ||
| } |
There was a problem hiding this comment.
In checkTxStatus, the comment says "All BPs accepted the tx" / "didn't offer a preconf", but this method is issuing txpool_txStatus queries (not submitting a tx for acceptance). Please adjust the comment to describe the actual condition being measured (e.g., all BPs report pending vs mixed/unknown).



Description
This PR does some improvements in the preconf / private tx relay.
Changes
Breaking changes
Please complete this section if any breaking changes have been made, otherwise delete it
Nodes audience
In case this PR includes changes that must be applied only to a subset of nodes, please specify how you handled it (e.g. by adding a flag with a default value...)
Checklist
Cross repository changes
Testing
Manual tests
Please complete this section with the steps you performed if you ran manual tests for this functionality, otherwise delete it
Additional comments
Please post additional comments in this section if you have them, otherwise delete it