consensus/bor, internal/cli: full grpc implementation by marcello33 · Pull Request #2194 · 0xPolygon/bor

marcello33 · 2026-04-23T14:07:22Z

Summary

In this PR we implement the server-side half of the heimdall-bor gRPC migration started by polyproto's GetBlockInfoInBatch, GetTdByHash, GetTdByNumber, GetAuthor messages (see this PR). It provides:

Results

All 7 methods are faster or at parity on gRPC.
Biggest win: GetBorChainBlockInfoInBatch with 77% latency reduction — the call used in the consensus-critical ExtendVoteHandler path.
50–67% fewer bytes allocated per RPC across the board — less GC pressure.

Latency (ns per op)

Method                          HTTP ns/op    gRPC ns/op    Improvement
-------------------------------------------------------------------------
GetBorChainBlock                  584 873       406 353       -30.5%
GetBorChainBlockAuthor            476 365       417 626       -12.3%
GetBorChainBlockTd                466 027       404 187       -13.3%
GetBorChainBlockInfoInBatch     7 819 395     1 771 033       -77.3%
GetRootHash                       452 269       482 345        +6.7%
CheckIfBlocksExist                540 378       416 757       -22.9%
GetBlockByNumber                  644 355       430 615       -33.2%

Allocations (bytes and count per op)

Method                        HTTP B/op    gRPC B/op      Δ       HTTP allocs  gRPC allocs
-------------------------------------------------------------------------------------------
GetBorChainBlock                22 020        9 197    -58.2%          143          129
GetBorChainBlockAuthor          12 083        5 827    -51.8%          100           98
GetBorChainBlockTd              13 040        5 843    -55.2%          114           96
GetBorChainBlockInfoInBatch  1 164 435      386 226    -66.8%         6615         3473
GetRootHash                     12 171        5 811    -52.3%           98           95
CheckIfBlocksExist              22 088        9 205    -58.3%          147          131
GetBlockByNumber                24 853        9 596    -61.4%          163          132

Ambient system-level resource usage (per phase, 5 s interval across 1000 blocks)

Metric                       HTTP phase    gRPC phase       Δ
-------------------------------------------------------------
bor CPU avg                      2.60%         2.77%      +6.5%
bor CPU p95                      5.75%         7.18%     +24.9%
bor mem avg                  107.3 MiB     127.9 MiB     +19.2%
bor mem p95                  343.4 MiB     346.7 MiB      ~flat

heimdall CPU avg                 8.03%         7.94%      -1.1%
heimdall CPU p95                10.75%        11.30%      +5.1%
heimdall mem avg             121.5 MiB     127.9 MiB      +5.3%
heimdall mem p95             148.1 MiB     150.9 MiB      +1.9%

samples per phase (n=)           592           576

CPU samples during bench window (pprof, 120 s each, aligned to bench run)

Service                HTTP phase         gRPC phase          Δ
------------------------------------------------------------------
bor total CPU          35.25 s (29.4%)    34.93 s (29.1%)    -0.9%
heimdall total CPU      5.41 s  (4.5%)     6.34 s  (5.3%)   +17.2%

Heimdall CPU is ~17% higher on gRPC — consistent with the client doing proto decode per call instead of JSON-RPC decode.

Details

New gRPC handlers: GetBlockInfoInBatch, GetTdByHash, GetTdByNumber, GetAuthor.
Full proto Header fidelity: headerToProtoBorHeader now emits every 21 header fields.
Bearer-token authentication on the gRPC server: --grpc.token flag (or BOR_GRPC_TOKEN env var) + unary/stream interceptors. If the token is empty, auth is disabled. This matches the RFC 6750 §2.1 case-insensitive scheme.
Bor.APIs() caches a single *API instance via sync.Once, so the gRPC backend shares the same rootHashCache as the JSON-RPC registration path. Without this, I detected a 2.1× GetRootHash regression.
GetBlockInfoInBatch safety: size gate (≤ 256 blocks), end > MaxInt64 guard, count-based loop (wrap-safe), per-iteration ctx.Err() check. fetchBlockInfo returns (info, ok, err) so real failures propagate instead of silently truncating batches.
Proper gRPC status codes on every handler: InvalidArgument for malformed requests, NotFound for missing blocks/TD, OutOfRange for numeric-overflow cases. Callers can branch on canonical codes instead of everything collapsing to codes.Unknown.

What this enables on the heimdall side

Every heimdall-facing RPC now has a gRPC path (previously only GetRootHash, GetVoteOnHash, HeaderByNumber, BlockByNumber, TransactionReceipt, BorBlockReceipt).
Milestone proposition generation in heimdall (x/milestone/abci) can issue a single GetBlockInfoInBatch round-trip instead of 3×N JSON-RPC batch elements — up to 4.4× faster per call and 66% fewer bytes allocated per request on the kurtosis devnet benchmark.
Token-gated deployments are supported for same-host validator pairs; loopback-only safety check on the heimdall client refuses to transmit a token over plaintext to any non-loopback peer.

Info about the gRPC authentication

gRPC authentication via bearer token is not strictly needed when bor and heimdall run on the same host with bor bound to 127.0.0.1:3131 (the post-PR-#2078 default). This is because:

Loopback is the access control. Only processes on the same host (and in the same network namespace, so same container / same k8s pod) can reach 127.0.0.1:3131. Theoretically, remote attackers can't touch it.
Bor's gRPC exposes only read-only, public chain data (nothing private, no signing keys, no state-changing methods). Even an unauthenticated caller can only read things that are already public via the JSON-RPC endpoint, on-chain data, or a block explorer.
Bor accepts empty token to mean "auth disabled" — consistent with how bor treats loopback for TLS (plaintext is allowed on loopback without token).

The token becomes meaningful in three cases:

Cross-host: bor on a separate box (0.0.0.0:3131 + https://). The token is mandatory here, and heimdall refuses to transmit it over plaintext to a non-loopback peer.
Shared-host with multiple untrusted tenants: another process on the same machine could dial the loopback port. Rare for a validator box, but auth closes the gap.
Defense-in-depth against misconfiguration: if an operator later flips bor to 0.0.0.0 without noticing, a configured token prevents the port from being trivially open to the network.

Enabling it

gRPC is opt-in. Default config has bor listening on 127.0.0.1:3131 and heimdall dialing over HTTP JSON-RPC — no behavior change for existing operators.

Step 1: Bor side

Upgrade bor to a version that includes this PR and PR security: harden gRPC server — disable by default, bind to localhost #2078 (the upstream [grpc] loopback default). Required — heimdall's clients assume both.

Edit config.toml to opt into gRPC on the address heimdall can reach:

[grpc]
addr = "127.0.0.1:3131"    # same-host validator pair; no TLS required
# addr = "0.0.0.0:3131"    # cross-host; must pair with TLS / operator firewall
token = "the_token"         # prefer setting this via BOR_GRPC_TOKEN env var instead (see below)

For authenticated deployments, prefer setting the token via env var (cli leaks into ps aux / shell history):
```
export BOR_GRPC_TOKEN="$(openssl rand -hex 32)"
```
bor server reads it at startup; the flag stays as a fallback.

Step 2: Heimdall side

Flip the config in app.toml:

bor_grpc_flag = "true"
bor_grpc_url  = "http://127.0.0.1:3131"   # same-host
# bor_grpc_url = "https://bor.example.net:3131"  # TLS cross-host
bor_grpc_token = "<match-bor-side>"        # empty if bor has empty token

Restart heimdall; the startup parity check runs one HeaderByNumber over both transports and logs a Warning / Fatal if the hashes diverge (guard against a stale bor that doesn't populate the full proto header).

Rollout recommendation

Validators: same-host bor+heimdall, addr=127.0.0.1:3131, token empty or set via BOR_GRPC_TOKEN. No TLS needed on loopback.
Cross-host (e.g., sentry + shared bor): require addr=0.0.0.0:3131 + https:// URL + matching token. Heimdall refuses to send a token over plaintext to a non-loopback peer.
Bridge nodes / RPC providers: can stay on the HTTP path indefinitely. The flag only affects the heimdall→bor direction.

Downgrade path

Set bor_grpc_flag = "false" in heimdall and restart. Bor doesn't need to be downgraded — it keeps serving both transports regardless of the flag.

Tests

Extensively unit-tested.
Ran on a kurtosis-pos devnet by switching on-off the gRPC flag. Waited for 1000-block window each: producer rotation, milestone formation, state sync — all match between transports.
HTTP vs gRPC benchmark against the same devnet: GetBlockInfoInBatch 4.4× faster on gRPC, 50–67% allocation reduction across all methods, GetRootHash regression closed via the cached API.

Dependencies

Requires heimdall PR fix : TestRemoteMultiNotifyFull #576 for the client-side counterpart. Neither side is functional alone — ship together or bor-first (bor stays backward-compatible with HTTP-only heimdall).
Requires polyproto ≥ v0.0.8 (see this PR) for the new GetBlockInfoInBatch / GetTdByHash / GetTdByNumber / GetAuthor RPCs and the expanded Header message.
Requires kurtosis-pos changes (small: expose bor's [grpc] port, thread cl_bor_grpc_flag + el_grpc_url through the CL launcher template) if you want to run the end-to-end bench on a devnet. The changes are here

Out of scope

Receipt Bloom and Logs are intentionally not populated yet. No heimdall consumer reads them today; will be a small follow-up when needed.
GetTdByNumber tag resolution (latest/pending) via HeaderByNumber → GetTd(hash) is scoped to this gRPC handler; the same defect in upstream eth/api_backend.go:GetTdByNumber affects the JSON-RPC path and is tracked separately.

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

_{Tip: disable this comment in your organization's Code Review settings.}

marcello33 · 2026-04-23T14:08:08Z

@claude review.

claude · 2026-04-23T14:08:27Z

Claude finished @marcello33's task in 3m 47s —— View job

Code Review: gRPC Implementation

Gather context and understand changes
Review gRPC server and auth changes (internal/cli/server/)
Review consensus changes (consensus/bor/)
Review configuration and flag changes
Review test files
Review dependency changes
Post final review feedback

Overall this is a solid WIP. The auth interceptor, API caching, and header expansion are well-designed. I found one security-sensitive bug and a few smaller items.

Finding 1 — uint64 overflow in `GetBlockInfoInBatch` loop (HIGH)

File: internal/cli/server/api_service.go:212

for i := req.StartBlockNumber; i <= req.EndBlockNumber; i++ {

When EndBlockNumber == math.MaxUint64, i++ wraps to 0 after the last iteration, and 0 <= MaxUint64 is true — the loop restarts from block 0 and scans the entire chain.

The batch-size check at line 201 doesn't catch this. Example: start = MaxUint64 - 100, end = MaxUint64 passes validation (100 < 256) but the loop wraps.

This is RPC-triggerable (gRPC endpoint). Each iteration does 3 backend calls (header, TD, author), so the loop can burn significant resources before hitting a missing block.

Suggested fix — use a count-based loop:

count := req.EndBlockNumber - req.StartBlockNumber + 1
for j := uint64(0); j < count; j++ {
    info, ok := s.fetchBlockInfo(ctx, req.StartBlockNumber+j)
    if !ok {
        break
    }
    out.Blocks = append(out.Blocks, info)
}

Fix this →

Note: the existing test TestGetBlockInfoInBatch_RangeOverflow tests start=0, end=MaxUint64 which is caught by the batch-size check. You'd need a test for start=MaxUint64-100, end=MaxUint64 to cover the wrap-around case.

Finding 2 — gRPC auth token visible in process listing (MEDIUM)

File: internal/cli/server/flags.go:1229-1234

The --grpc.token CLI flag means the token appears in ps aux, /proc/<pid>/cmdline, and shell history. Consider supporting an environment variable (e.g., BOR_GRPC_TOKEN) or file-path-based token reading (--grpc.token-file) as the primary mechanism, with the flag as a fallback.

Finding 3 — Shared `*uint64` pointers in `headerToProtoBorHeader` (LOW)

File: internal/cli/server/api_service.go:67-68

out.BlobGasUsed = h.BlobGasUsed
out.ExcessBlobGas = h.ExcessBlobGas

These share the pointer from the source header to the proto output. After proto serialization (the production path) this is harmless, but for any in-process usage the proto message and the types.Header alias the same *uint64. A defensive copy would be consistent with how ExtraData is handled on line 52 (append([]byte(nil), h.Extra...)):

if h.BlobGasUsed != nil {
    v := *h.BlobGasUsed
    out.BlobGasUsed = &v
}

Finding 4 — Missing context cancellation check in batch loop (LOW)

File: internal/cli/server/api_service.go:212-221

The loop doesn't check ctx.Done(). With up to 256 iterations, a cancelled context keeps calling backend methods until they individually fail or all blocks are fetched. A select on ctx.Done() at the top of the loop would give faster cancellation:

select {
case <-ctx.Done():
    return nil, ctx.Err()
default:
}

Finding 5 — Redundant hash conversion in `GetTdByHash` (LOW)

File: internal/cli/server/api_service.go:169-170

hashBytes := protoutil.ConvertH256ToHash(req.Hash)
hash := common.BytesToHash(hashBytes[:])

ConvertH256ToHash already returns a [32]byte which can be assigned directly to common.Hash (same underlying type). The second conversion is redundant.

Finding 6 — `data.csv` row reordering (INFO)

File: eth/tracers/data.csv

The diff shows pure row reordering within each transaction group (same data, different order). This looks like non-deterministic map iteration in the test data generator. Worth verifying this isn't a concurrency issue — if this file is committed to the repo as test fixture, non-deterministic ordering causes noisy diffs.

Positive observations

Auth interceptor: Good use of subtle.ConstantTimeCompare. Properly gates both unary and stream RPCs. Error messages are appropriately vague ("invalid token" vs leaking details).
API caching via sync.Once (consensus/bor/bor.go:1532-1535): Correctly solves the per-call cache invalidation issue. GetRootHash and GetVoteOnHash in eth/bor_api_backend.go call APIs() on every invocation, so without caching the rootHashCache LRU inside *API would be thrown away each time.
headerToProtoBorHeader expansion: Complete field mapping with proper nil guards for post-Shanghai/Cancun optional fields. The round-trip tests (including the zero-blob-gas edge case) are well-motivated.
Batch endpoint design: Sensible maxBlockInfoBatchSize limit, gap-stop semantics, and genesis-aware author handling.
Test coverage: Comprehensive unit tests for auth (all error paths + constant-time check), batch boundary conditions, overflow, and header round-trips across fork eras.

Copilot

Pull request overview

This WIP PR extends the gRPC implementation by adding bearer-token authentication and expanding the Bor gRPC API surface, while aligning Bor’s RPC API lifecycle with gRPC’s per-call access pattern.

Changes:

Add optional gRPC bearer-token auth + unified unary/stream interceptors (also gating reflection).
Expand BorApi gRPC handlers and improve header→proto conversion + batch endpoint size gating.
Cache Bor’s *API instance across APIs() calls to preserve per-API state; update deps/tests accordingly.

Reviewed changes

Copilot reviewed 11 out of 13 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
internal/cli/server/server.go	Add combined unary/stream interceptors and metadata-based bearer auth.
internal/cli/server/config.go	Add `GRPC.Token` config field.
internal/cli/server/flags.go	Add `--grpc.token` CLI flag.
internal/cli/server/grpc_auth_test.go	Add unit tests for auth + interceptors.
internal/cli/server/api_service.go	Add/extend BorApi gRPC handlers, header conversion, and batch endpoint with size gate.
internal/cli/server/api_service_test.go	Add tests for header round-trip and batch-range guards.
consensus/bor/bor.go	Cache the Bor `*API` returned by `APIs()` across calls.
consensus/bor/bor_test.go	Add test asserting `APIs()` returns the same service instance across calls.
go.mod / go.sum	Bump `github.com/0xPolygon/polyproto` to a newer pseudo-version.
cmd/keeper/go.mod / cmd/keeper/go.sum	Bump indirect `golang.org/x/sys`.
eth/tracers/data.csv	Reorder/update CSV rows.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

claude · 2026-04-23T14:19:32Z

Code Review

Bug: uint64 overflow causes infinite loop in GetBlockInfoInBatch (DoS)

File: internal/cli/server/api_service.go L197-L218

The for loop for i := req.StartBlockNumber; i <= req.EndBlockNumber; i++ at line 212 will never terminate when EndBlockNumber == math.MaxUint64. After i reaches MaxUint64, i++ wraps to 0 (unsigned integer overflow), and 0 <= MaxUint64 is always true.

The size gate does not prevent this. For example, StartBlockNumber = MaxUint64, EndBlockNumber = MaxUint64 has a difference of 0, which passes the >= 256 check.

Why fetchBlockInfo doesn't save us: rpc.BlockNumber(math.MaxUint64) casts to int64(-1) which is PendingBlockNumber — a valid block on mining/validator nodes. After wrapping to 0, the loop fetches from genesis through every block in the chain, causing unbounded memory allocation and CPU consumption.

Suggested fix: Restructure the loop to count iterations instead of iterating on the block number directly:

count := req.EndBlockNumber - req.StartBlockNumber + 1
for j := uint64(0); j < count; j++ {
    info, ok := s.fetchBlockInfo(ctx, req.StartBlockNumber+j)
    if !ok {
        break
    }
    out.Blocks = append(out.Blocks, info)
}

This avoids the i <= MaxUint64 trap since j < count where count is bounded by the size gate.

Additional concern: fetchBlockInfo casts uint64 block numbers to rpc.BlockNumber (int64) without checking for overflow. Large uint64 values silently map to negative int64 values corresponding to special block numbers (-1 = Pending, -2 = Latest), which could return unexpected blocks.

codecov · 2026-04-23T14:37:03Z

Codecov Report

❌ Patch coverage is 77.30496% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 52.59%. Comparing base (3350958) to head (280ab3b).
⚠️ Report is 3 commits behind head on develop.

Files with missing lines	Patch %	Lines
internal/cli/server/api_service.go	72.65%	54 Missing and 16 partials ⚠️
internal/cli/server/server.go	84.94%	11 Missing and 3 partials ⚠️
internal/cli/server/command.go	30.00%	6 Missing and 1 partial ⚠️
consensus/bor/bor.go	44.44%	4 Missing and 1 partial ⚠️

❌ Your patch status has failed because the patch coverage (77.30%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #2194      +/-   ##
===========================================
+ Coverage    52.33%   52.59%   +0.25%     
===========================================
  Files          885      885              
  Lines       155904   156286     +382     
===========================================
+ Hits         81587    82192     +605     
+ Misses       69080    68835     -245     
- Partials      5237     5259      +22

Files with missing lines	Coverage Δ
consensus/bor/api.go	`86.12% <ø> (ø)`
internal/cli/command.go	`20.07% <100.00%> (+8.91%)`	⬆️
internal/cli/server/config.go	`63.87% <ø> (ø)`
internal/cli/server/flags.go	`100.00% <100.00%> (ø)`
internal/cli/server/service.go	`13.52% <ø> (ø)`
internal/cli/server/utils.go	`100.00% <100.00%> (+100.00%)`	⬆️
consensus/bor/bor.go	`85.04% <44.44%> (-0.30%)`	⬇️
internal/cli/server/command.go	`37.70% <30.00%> (-0.45%)`	⬇️
internal/cli/server/server.go	`57.14% <84.94%> (+26.61%)`	⬆️
internal/cli/server/api_service.go	`72.92% <72.65%> (+72.92%)`	⬆️

... and 28 files with indirect coverage changes

Files with missing lines	Coverage Δ
consensus/bor/api.go	`86.12% <ø> (ø)`
internal/cli/command.go	`20.07% <100.00%> (+8.91%)`	⬆️
internal/cli/server/config.go	`63.87% <ø> (ø)`
internal/cli/server/flags.go	`100.00% <100.00%> (ø)`
internal/cli/server/service.go	`13.52% <ø> (ø)`
internal/cli/server/utils.go	`100.00% <100.00%> (+100.00%)`	⬆️
consensus/bor/bor.go	`85.04% <44.44%> (-0.30%)`	⬇️
internal/cli/server/command.go	`37.70% <30.00%> (-0.45%)`	⬇️
internal/cli/server/server.go	`57.14% <84.94%> (+26.61%)`	⬆️
internal/cli/server/api_service.go	`72.92% <72.65%> (+72.92%)`	⬆️

... and 28 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

marcello33 · 2026-04-23T16:21:41Z

@claude addressed your points. Review again.

claude · 2026-04-23T16:22:11Z

Claude finished @marcello33's task in 2m 4s —— View job

Re-review: gRPC Implementation (post `fe62ec8`)

Gather latest changes and compare against previous findings
Review fix for uint64 overflow in GetBlockInfoInBatch (Finding 1)
Review fix for shared pointer / slice aliasing issues (Finding 3 + Copilot)
Review fix for fetchBlockInfo uint64→int64 cast (Copilot)
Review fix for flag help text (Copilot)
Review fix for initializeRootHashCache race (Copilot)
Review fix for genesis author comment (Finding from claude review)
Check for any remaining issues
Post final review feedback

All the review comments from the first round have been addressed in fe62ec8. Here's how each fix landed:

Verified fixes

1. uint64 overflow in GetBlockInfoInBatch — Fixed (api_service.go:204-234)

The loop is now count-based (for j := uint64(0); j < count; j++) which cannot wrap. An explicit req.EndBlockNumber > math.MaxInt64 guard at line 211 rejects near-MaxUint64 ranges before they reach the loop. A new test TestGetBlockInfoInBatch_NearMaxUint64 covers the previously-missing boundary (MaxUint64-3, MaxUint64). Context cancellation check (ctx.Err()) was also added inside the loop. All good.

2. Shared *uint64 pointers — Fixed (api_service.go:69-76)

BlobGasUsed and ExcessBlobGas are now copied through a fresh variable, consistent with the rest of the conversion.

3. Bloom/Nonce slice aliasing — Fixed (api_service.go:49,54)

Both now use append([]byte(nil), ...) to create independent copies.

4. fetchBlockInfo int64 overflow guard — Fixed (api_service.go:242-244)

Added if blockNum > math.MaxInt64 { return nil, false } before the cast. Belt-and-suspenders with the GetBlockInfoInBatch guard.

5. Flag help text — Fixed (flags.go:1229)

Clear explanation of Bearer prefix stripping and recommendation to use BOR_GRPC_TOKEN env var.

6. BOR_GRPC_TOKEN env var — Added (command.go:137-142)

Good fallback that addresses the token-in-process-listing concern from Finding 2.

7. initializeRootHashCache race — Fixed (bor.go:1537-1542)

Cache is now initialized eagerly inside apiOnce.Do before c.api is assigned.

8. Genesis author comment — Fixed (api_service.go:239-240)

Comment now accurately says "nil *H160" and instructs callers to nil-check.

9. Redundant hash conversion in GetTdByHash — Fixed (api_service.go:177)

Now uses a direct type conversion: common.Hash(protoutil.ConvertH256ToHash(req.Hash)).

New observations on `fe62ec8`

1. panic in APIs() on cache init failure — Worth discussing (consensus/bor/bor.go:1539-1541)

if err := a.initializeRootHashCache(); err != nil {
    panic(fmt.Errorf("bor: failed to initialize rootHashCache: %w", err))
}

The initializeRootHashCache method creates an ARC LRU cache of size 10 — in practice, lru.NewARC(10) only fails if size <= 0, which can't happen here with a literal 10. So this panic is unreachable, and the defensive choice is fine. However, if the caching library ever changes failure modes, a panic in APIs() (called during node startup) would crash the node with a stack trace instead of a clean error message. Since APIs() returns []rpc.API with no error channel, this is about the only option — just noting it for awareness.

2. initializeRootHashCache is now called in two places (bor.go:1539 and api.go:282)

The eager call in apiOnce.Do guarantees the cache exists before any handler runs. The existing call in GetRootHash (api.go:282) is now redundant — it will always find rootHashCache != nil and be a no-op. Not a bug (the nil-check makes it idempotent), but you could clean it up in a follow-up to avoid confusion about which call actually initializes the cache.

3. getRpcBlockNumberFromString overflow check — Good addition (api_service.go:290-292)

The blckNum > math.MaxInt64 guard in the string parser ensures HeaderByNumber, GetAuthor, and GetTdByNumber (all of which take string block numbers) are also protected from the uint64→int64 reinterpretation. This was not flagged in the first review but is a nice improvement.

Summary

All HIGH and MEDIUM findings from the first review are resolved. The code is in good shape — the uint64 overflow was the most important fix and it's landed correctly with both the count-based loop and the MaxInt64 boundary check. The remaining items above are minor observations, not blockers.

View job

Copilot

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

marcello33 · 2026-04-23T17:25:01Z

@claude addressed your points from the previous run.
Review once more.

claude · 2026-04-23T17:25:23Z

Claude finished @marcello33's task in 1m 53s —— View job

Re-review: gRPC Implementation (3rd pass — post `edafc70`)

Gather latest changes since last review (fe62ec8 → edafc70)
Review fixes for nil guard on GetTdByHash (claude finding)
Review fixes for fetchBlockInfo error handling (claude finding)
Review fixes for GetTdByNumber tag resolution (claude finding)
Review fixes for gRPC status codes (copilot finding)
Review fixes for auth interceptor issues (copilot findings)
Check for any new or remaining issues
Post final review feedback

All findings from the second review round have been addressed in edafc70. Here's the verification:

Verified fixes

1. Nil guard on GetTdByHash / TransactionReceipt / BorBlockReceipt — Fixed (api_service.go:22-27, 138-142, 161-165, 196-200)

A new protoHashToCommon helper centralizes the nil check for H256 and its Hi/Lo sub-messages, returning codes.InvalidArgument on malformed input. All three hash-consuming handlers now use it. This matches the guard the test helper already had (convH at line 224-230) and closes the panic-on-nil-hash path cleanly. Good factoring — one guard, three call sites.

2. fetchBlockInfo error handling — Fixed (api_service.go:272-316)

The function now returns (info, bool, error) with clear three-way semantics documented in the comment:

(info, true, nil) → success
(nil, false, nil) → legitimate gap (header missing / TD not indexed)
(nil, false, err) → real failure (backend error, ecrecover, overflow)

The critical fix: Author() failure is now propagated as an error instead of being silently swallowed as a gap. GetBlockInfoInBatch correctly short-circuits on real errors at line 257-259. Well done.

3. GetTdByNumber tag resolution — Fixed (api_service.go:212-233)

The handler now resolves the block number tag to a concrete header via HeaderByNumber first, then calls GetTd(ctx, header.Hash()) — avoiding the pre-existing upstream bug where GetTdByNumber passed the raw negative tag cast to uint64 as the block number key. This makes the gRPC endpoint deterministic regardless of TD cache state.

4. initializeRootHashCache dual-call documentation — Addressed (consensus/bor/api.go:280-282)

The GetRootHash comment now explains why the lazy init is kept: "fallback for direct-API paths (e.g., tests) that don't go through Bor.APIs." Good — this makes the intentional redundancy explicit.

Remaining observations (non-blocking)

1. GetBlockInfoInBatch validation errors use plain errors.New instead of status.Error(codes.InvalidArgument, ...) — LOW

File: api_service.go:238-245

The three validation checks (end < start, exceeds max batch size, exceeds max int64) return errors.New(...). In gRPC, plain errors become codes.Unknown on the wire. The fetchBlockInfo function already correctly uses status.Error(codes.InvalidArgument, ...) at line 281, and protoHashToCommon does the same at line 24. Making the validation errors consistent would let clients distinguish bad input from internal failures. Copilot flagged this in the latest review as well.

Fix this →

2. Auth interceptor: rejected requests are not logged — LOW

File: server.go:466-469

The comment says the interceptor "always logs the request duration", but when auth fails the function returns before start := time.Now() and the log.Trace call (line 471-473). Rejected auth attempts are therefore invisible in logs. This was flagged by Copilot in the latest round. Not a bug, but a monitoring gap — consider either logging rejected attempts (at a different level/message) or updating the comment.

3. Case-sensitive Bearer prefix check — INFO

File: server.go:508

Per RFC 6750, the auth scheme name is case-insensitive (bearer, BEARER, Bearer should all work). The current strings.HasPrefix(h, "Bearer ") rejects non-canonical casing. For an internal gRPC endpoint consumed by heimdall (which you control), this is fine — just document that only Bearer (capitalized) is accepted, or switch to strings.EqualFold on the prefix if broader client compatibility is desired.

4. subtle.ConstantTimeCompare timing note — INFO

File: server.go:512

The function comment says "constant-time comparison", which is accurate for equal-length inputs. subtle.ConstantTimeCompare does return immediately when lengths differ. Since the token is compared after stripping the Bearer prefix, an attacker could probe for the token length via timing. In practice this is irrelevant for a gRPC token on a local/internal endpoint, but if you want full constant-time guarantees, compare sha256.Sum256 of both sides instead.

Summary

The PR is in good shape. All HIGH and MEDIUM findings from the first two review rounds are resolved. The fetchBlockInfo three-way return is well-designed, the protoHashToCommon helper is a clean factoring, and the GetTdByNumber tag resolution fix sidesteps a real upstream bug. The remaining items are minor consistency and documentation nits — none are blockers.

View job

marcello33 · 2026-04-23T18:28:55Z

@claude addressed latest comments. Review.

claude · 2026-04-23T18:29:14Z

Claude finished @marcello33's task in 1m 16s —— View job

Re-review: gRPC Implementation (4th pass — post `43b6efc`)

Gather latest changes since last review (edafc70 → 43b6efc)
Verify fix for GetBlockInfoInBatch validation errors (gRPC status codes)
Verify fix for auth interceptor logging on rejected requests
Verify fix for subtle.ConstantTimeCompare timing concerns
Verify fix for case-insensitive Bearer prefix
Check for any new or remaining issues
Post final review feedback

All findings from the 3rd review round have been addressed in 43b6efc. Here's the verification:

Verified fixes

1. GetBlockInfoInBatch validation errors now use gRPC status codes — Fixed (api_service.go:239-247)

All three validation errors (end < start, exceeds max batch size, end exceeds max int64) now return status.Error(codes.InvalidArgument, ...) instead of plain errors.New(...). This is consistent with fetchBlockInfo (line 283) and protoHashToCommon (line 24). Clients can now reliably distinguish bad input from internal failures.

2. Auth interceptor now logs rejected requests — Fixed (server.go:470, 490)

Both combinedUnaryInterceptor and combinedStreamInterceptor now log at Debug level when auth fails, before returning the error. The comments have been updated to accurately describe the logging behavior (Debug for rejections, Trace for successful calls). Good distinction between the two levels.

3. subtle.ConstantTimeCompare comment updated — Fixed (server.go:501-505)

The function comment now honestly states that ConstantTimeCompare is constant-time for equal-length inputs and that length-mismatched inputs short-circuit. No overstated guarantees.

4. Case-insensitive Bearer prefix — Fixed (server.go:517)

strings.HasPrefix(h, "Bearer ") replaced with len(h) < len(prefix) || !strings.EqualFold(h[:len(prefix)], prefix). This correctly handles bearer, BEARER, and other casings per RFC 6750 §2.1 while still extracting the token bytes after the 7-character prefix.

Remaining observations (non-blocking)

1. No test for case-insensitive Bearer prefix — INFO

The auth tests all use "Bearer secret" (canonical casing). Since this was a notable behavioral change, a test case with "bearer secret" or "BEARER secret" would document the RFC 6750 compliance and prevent regression. One-liner addition to the existing test suite.

2. errors.New vs status.Error inconsistency in other handlers — INFO

The GetBlockInfoInBatch validation now correctly uses status.Error(codes.InvalidArgument, ...), but several other handlers still return errors.New(...) for user-facing errors:

HeaderByNumber line 109: errors.New("header not found") → would be codes.NotFound
BlockByNumber line 126: errors.New("block not found") → would be codes.NotFound
TransactionReceipt lines 151, 155: errors.New("no receipts found"), errors.New("transaction index out of bounds") → codes.NotFound / codes.InvalidArgument
GetTdByHash line 204: errors.New("total difficulty not found") → codes.NotFound
GetTdByNumber lines 224, 228: same pattern

These all surface as codes.Unknown on the wire today. Not a bug — heimdall likely treats any non-nil error as a failure — but converting them to proper gRPC status codes in a follow-up would make the API more standard. Low priority.

Summary

All findings from the 3rd review round are resolved. The PR is in good shape — the gRPC status codes, auth logging, RFC 6750 compliance, and timing-guarantee documentation are all properly landed. The only remaining items are minor consistency nits (test coverage for case-insensitive Bearer, and gRPC status codes in other handlers), neither of which are blockers. LGTM for the scope of this WIP.

View job

claude · 2026-04-23T18:38:42Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

Copilot

Pull request overview

Copilot reviewed 13 out of 15 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

claude · 2026-05-08T13:38:17Z

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

claude

⚠️ Code review skipped — your organization has reached its monthly code review spending cap.

An organization admin can view or raise the cap at claude.ai/admin-settings/claude-code. The cap resets at the start of the next billing period.

Once the cap resets or is raised, reopen this pull request to trigger a review.

Copilot

Pull request overview

Copilot reviewed 23 out of 25 changed files in this pull request and generated 5 comments.

socket-security · 2026-05-18T06:41:48Z

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff	Package	Supply Chain Security	Vulnerability	Quality	Maintenance	License
	golang/github.com/0xPolygon/heimdall-v2@v0.6.0 ⏵ v0.7.1	⁺¹
	golang/github.com/0xPolygon/polyproto@v0.0.7 ⏵ v0.0.8	⁺¹

View full report

sonarqubecloud · 2026-05-19T15:34:55Z

Quality Gate passed

Issues
4 New issues
4 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
1.6% Duplication on New Code

See analysis details on SonarQube Cloud

Copilot

Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 1 comment.

* metrics: fix ResettingSample Prometheus _count monotonicity (#2174) * metrics: fix ResettingSample Prometheus _count monotonicity * use atomic count and remove cumulative sum in resettingSample * metrics: add Clear method for resetting sample and add more tests * metrics: avoid cloning slice --------- Co-authored-by: Manav Darji <manavdarji.india@gmail.com> * ethapi: include pre-Madhugiri state-sync logs in bor_getLogs (#2155) * ethapi: include pre-Madhugiri state-sync logs in bor_getLogs * test: directly validate getBlockAndReceipts state-sync behavior * security: harden gRPC server — disable by default, bind to localhost (#2078) * security: harden gRPC server - disable by default, bind localhost - Add 'Enabled' field to GRPCConfig (default: false), requiring explicit --grpc.enabled flag to start the gRPC server - Change default bind address from 0.0.0.0:3131 to 127.0.0.1:3131 - Guard gRPC server startup with Enabled check in NewServer() - Add --grpc.enabled CLI flag in flags.go - Fix GetGrpcAddr() to use net.SplitHostPort instead of string slice - Update docs and test helper for compatibility The gRPC server currently starts unconditionally on all interfaces (0.0.0.0:3131) with no authentication, no TLS, and no way to disable it. This is inconsistent with HTTP-RPC and WS-RPC which are disabled by default. An attacker with network access can invoke sensitive RPCs including ChainSetHead (reorg), PeersAdd/Remove (eclipse attacks), and StatusBorStatus (reconnaissance) without credentials. * remove grpc.enabled flag, simplify changes * internal/cli/server: update default toml for tests --------- Co-authored-by: harshinsecurity <hi@harshinsecurity.in> Co-authored-by: Manav Darji <manavdarji.india@gmail.com> * eth: stop dialing before waiting on handler shutdown (#2158) * core/state: fix GetWriteMapDump write-set output (#2156) * metrics, build: fix time.Tick resource leaks and oss-fuzz shebang (#2145) * metrics, build: fix time.Tick resource leaks and oss-fuzz shebang Two fixes inherited from upstream go-ethereum: 1. metrics: replace time.Tick() with time.NewTicker() + defer Stop() in 6 monitoring goroutines. time.Tick() leaks ticker resources (SA1015). Files: debug.go, json.go, log.go, opentsdb.go, syslog.go, writer.go. 2. oss-fuzz.sh: fix missing '!' in shebang (#/bin/sh → #!/bin/sh) of generated fuzz test runner scripts. Both fixes have been accepted upstream: - ethereum/go-ethereum#34044 (shebang, merged) - ethereum/go-ethereum#34046 (time.Tick, open) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * revert: remove time.Tick changes per maintainer request Go 1.23+ garbage-collects unreferenced tickers created by time.Tick, making the NewTicker replacement unnecessary. Revert to stay close to upstream go-ethereum and avoid merge conflicts. The oss-fuzz.sh shebang fix is retained as it is independent. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * build: bump Go from 1.26.1 to 1.26.2 to fix stdlib CVEs (#2189) Addresses 6 vulnerabilities in the Go standard library fixed in go1.26.2: - GO-2026-4947: Unexpected work during chain building in crypto/x509 - GO-2026-4946: Inefficient policy validation in crypto/x509 - GO-2026-4866: Case-sensitive excludedSubtrees auth bypass in crypto/x509 - GO-2026-4870: Unauthenticated TLS 1.3 KeyUpdate DoS in crypto/tls - GO-2026-4869: Unbounded allocation for old GNU sparse in archive/tar - GO-2026-4865: JsBraceDepth XSS in html/template * triedb/pathdb: fix exponential memory growth in preload BFS (#2187) * triedb/pathdb: replace exponential BFS with trie-aware child decoding in preload The preload BFS previously called gatherChildPaths which always enqueued all 16 nibble children for every node regardless of node type. This caused the BFS queue and visited map to grow as O(16^depth), leading to visited maps of ~14 GB per goroutine for large storage tries (8 GB caches). On machines with limited RAM this caused severe swapping that stalled block import for the duration of the preload (~2 hours for the largest caches). Replace with decodeChildPaths which decodes the RLP-encoded trie node and returns only the actual children: - Branch nodes (17 elements): enqueue slots with non-nil children only - Extension nodes (2 elements, no terminator): enqueue the single child path using the decoded compact key nibbles - Leaf nodes (2 elements, with terminator): no children Because MPT child paths are strictly longer than the parent path, the traversal is cycle-free without a visited set. Remove the visited map entirely. Queue size is now O(trie width) rather than O(16^depth). Add compactKeyToNibbles helper and replace the gatherChildPaths test with comprehensive tests covering all node types, the round-trip compact encoding, and a cycle-free proof via a 5-node trie where all nodes must be loaded exactly once. * triedb/pathdb: address PR review feedback on biased cache preload - Change nil-child check from exact 0x80 match to len <= 1 for robustness against non-canonical RLP encodings (both branch and extension nodes) - Make decodeChildPaths a package-level function (receiver was unused) - Pre-allocate nibbles slice in compactKeyToNibbles to avoid reallocations - Replace time.Sleep with cache.wg.Wait() in two tests for deterministic sync * triedb/pathdb: fix legacy rate-limit tests to use valid RLP trie nodes TestAddressBiasedCache_RateLimitInterruption and ShutdownDuringRateLimitWait were silently passing without exercising their stated scenario: both used raw non-RLP bytes as the root node, causing decodeChildPaths to return nil immediately and the rate limiter to never block. Rewrite both tests to use valid branch/leaf RLP nodes so the BFS actually traverses children and exhausts the 64KB burst, making WaitN genuinely block before Close() is called. * build(deps): bump go.opentelemetry.io/otel/sdk from 1.40.0 to 1.43.0 (#2182) Bumps [go.opentelemetry.io/otel/sdk](https://github.com/open-telemetry/opentelemetry-go) from 1.40.0 to 1.43.0. - [Release notes](https://github.com/open-telemetry/opentelemetry-go/releases) - [Changelog](https://github.com/open-telemetry/opentelemetry-go/blob/main/CHANGELOG.md) - [Commits](https://github.com/open-telemetry/opentelemetry-go/compare/v1.40.0...v1.43.0) --- updated-dependencies: - dependency-name: go.opentelemetry.io/otel/sdk dependency-version: 1.43.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * eth/fetcher: refactor witness_manager to reduce complexity (#2188) * eth/fetcher: refactor witness_manager to reduce complexity and close test gaps Structural refactor of witness_manager.go plus targeted tests, guided by diffguard (https://github.com/0xPolygon/diffguard). Max cognitive complexity 44 → 10; Tier 1 / Tier 2 mutation score 100%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * ci: add diffguard workflow for PR quality metrics Runs diffguard (https://github.com/0xPolygon/diffguard) on every PR to report cognitive complexity, function/file sizes, dependency structure, churn-weighted complexity, and mutation testing at 10% sample rate against the base branch. Text report goes to the step summary (visible in the PR check UI) and a JSON artifact is retained for 14 days for offline triage. Uses the 16-core bor runner to keep mutation turnaround bounded. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * eth/fetcher: fix inverted condition in armTimerChan The refactor introduced an inverted condition where armTimerChan returned a nil timer channel when pending requests existed, preventing tick() from ever firing through the timer path. The retry loop only appeared to work because rescheduleWitness also pokes pokeCh. Reported in PR #2188 review. Adds TestWitnessLoopDrivesFetchesForPending to guard against regressions — it exercises the full loop→tick→ fetchWitness pipeline via real channels, which the existing TestLoop did not. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * eth/fetcher: fix distance check drift and test nil derefs Three issues from PR #2188 code review: - handleNeed's distance check was accidentally tightened from `dist < -maxUncleDist` to `<=` during the refactor, introducing a boundary mismatch with block_fetcher.go (which uses `<`). A block at exactly dist == -maxUncleDist would be accepted by block_fetcher but dropped by witness_manager, potentially importing blocks without their witnesses. Reverted to match block_fetcher. - TestWitnessHandleFilterResultSkipsAlreadyPending dereferenced `m.pending[hash].op.origin` inside the nil-check branch, panicking on the exact mutation it was meant to catch (pending entry removal). Now fails cleanly with t.Fatal when entry is nil. - Same nil deref pattern in TestWitnessCheckCompletingSkipsAlreadyPending. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * all: fix nightly race test flakes (#2193) The nightly-race workflow had been failing on every run for a month. Most failures were either race-detector overhead pushing test timings past narrow assertion margins, tests mutating shared global state that collided with parallel tests, or a single real data race that cascaded into many victim tests. internal/ethapi: testBackendWithPreMadhuguriBorReceipt.ChainConfig() did a shallow copy of params.AllEthashProtocolChanges, leaving cfg.Bor aliased to the global BorConfig, then mutated MadhugiriBlock on it. Concurrent tests calling IsMadhugiri raced with the write. Deep-copy BorConfig like the sibling testBackendWithNilBorTx already does. This single race was the root cause of the ~10 cascading ethapi test failures (TestBorForks, TestBorGetLogs_*, TestCoinbase, TestEstimateGas, etc.). consensus/bor/heimdall: TestFailover_SwitchOnPrimaryDown and TestRegistry_MarkUnhealthyOnRealFailure set only getSpanFn on the primary mock, leaving FetchStatus returning success. The registry's background probe then raced with the test's MarkUnhealthy call and could flip the primary back to healthy (or the active gauge back to 0) before the assertion ran. Make the primary mock fail FetchStatus too so probe and API are consistent. core/state: TestConcurrentUsedParallelism measures wall-clock parallel speedup and asserts >=2x. Race instrumentation serializes atomic/mutex ops and skews the measurement to ~1.7x. Skip under -race via a new race_{on,off}_test.go build-tag pair; the test still guards against the global-lock regression in non-race runs. core/txpool/legacypool: TestLockOrdering_{PricedHeapNoDeadlock, ReplacePendingNoDeadlock,RemovedNoDeadlock} used a 10s deadlock-detect timeout that is genuinely too short once -race instrumentation is added (the bare test takes ~12s). Bump to 60s — still catches real deadlocks, no longer fires on legitimate completion. eth/downloader: TestBeaconSync68/69Full used a 3s sync timeout that CI routinely exceeded under -race. Bump to 30s. TestSkeletonSyncRetrievals had four 2s polling loops that could exit before background serving goroutines finished incrementing the served counter (the assertion then read a partial count). Bump the budget to 30s, cap per-iteration sleep at 500ms so exponential backoff stays responsive, and fold the served counter into the polling condition so we wait for both subchain state and served totals before asserting. eth/relay: TestCheckTxStatus, TestSubmitPreconfTx, TestSubmitPrivateTx parallel subtests had handlers sleeping for rpcTimeout-100ms, leaving only 100ms of slack before the client-side 2s timeout fired; under -race that slack was easily exhausted. Halve the handler sleep to rpcTimeout/2 — still proves parallelism (3 serial calls would exceed the unchanged 2s upper bound) with comfortable margin for -race. metrics: TestExpDecaySampleNanosecondRegression relied on the unseeded global RNG and wall-clock timing, making the reservoir's average a noisy statistic that occasionally drifted outside [14, 16] under -race. Drive the test with a seeded RNG and synthesised monotonic timestamps; override t0/t1 after NewExpDecaySample so dt in update() stays positive. Test still catches the priority-overflow regression (average would stick at 10 under the buggy formula). Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Merge pull request #2191 from 0xPolygon/psp-enable-wit-fs internal/cli: enabled witness filestore by default * eth: cancel milestone subscriber on shutdown (#2199) * Merge pull request #2175 from 0xPolygon/manav/disable-pending-loop (feat): disable pending block creation loop via flag * rawdb: recover witness size from filesystem metadata miss (#2200) * ci: fix kurtosis setup/cleanup action paths (#2223) kurtosis-pos moved the setup and cleanup composite actions under .github/actions/kurtosis/, so the bor workflows fail with "Can't find action.yml" at the Pre/Post kurtosis run steps after checkout. Update both kurtosis-e2e.yml and kurtosis-stateless-e2e.yml to use ./.github/actions/kurtosis/setup and ./.github/actions/kurtosis/cleanup. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Fixing amoy static peers (#2209) * core/types: guard (*Header).GetValidatorBytes against short Extra (#2222) * core/types: guard (*Header).GetValidatorBytes against short Extra (*Header).GetValidatorBytes panics with 'slice bounds out of range' when len(h.Extra) < ExtraVanityLength+ExtraSealLength on pre-Cancun chain configs. The post-Cancun branch and companion helpers already guard this path; hoist the check to the top so both branches are safe. Add TestGetValidatorBytesShortExtra as regression test. Discovered with Zorya (concolic execution). Fixes #2221 * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * chore: pos workspace setup (#2218) * miner, consensus: more block building metrics (#2228) * consensus/bor, miner: fix stale pending state for non-validator nodes (#2183) * consensus/bor, miner: fix stale pending state for non-validator nodes Nodes whose signer is configured but not in the active validator set (e.g. after a validator rotation) returned "insufficient funds for transfer" on eth_estimateGas and other RPC calls against the "pending" block. The root cause was that Prepare() returned UnauthorizedSignerError when the signer was not in the validator set, which caused prepareWork() to fail on every block. Since commit() was never called, the pending snapshot was never refreshed, and its underlying pathdb trie layers were eventually garbage-collected — making all state reads return zero. Fix Prepare() to not fail for unauthorized signers. Instead, default the succession number to 0 and let the block building pipeline continue normally. Seal() already independently rejects blocks from unauthorized signers, so no invalid blocks can be produced. This ensures the pending snapshot is refreshed on every chain head, keeping RPC queries functional regardless of whether the node is actively producing blocks. Also remove a redundant second call to GetSignerSuccessionNumber in the Giugliano waitOnPrepare block, reusing the succession value already computed earlier in Prepare(). * fix linter * Merge pull request #2229 from 0xPolygon/psp-pos-3598 docs, .claude: add hardfork rollout review guidance * build: bump Go to 1.26.3 (#2230) * consensus/bor, internal/cli: full grpc implementation (#2194) * implement full grpc comms with heimdall * address comments * address comments * address comments * address comments * address comments * revert tracers/data.csv * internal/cli, docs: skip gRPC on empty addr, warn unauth non-loopback * internal/cli: address comments * internal/cli: address comment * internal/cli: implement additional tests * internal/cli: address comments and improve tests/validation * internal/cli: address comments and add new tests * internal/cli/server: map reorg & non-contiguous range sentinels; tidy cmd/keeper go.sum mapBorAPIError now translates the two GetRootHash sentinels that previously fell through to codes.Internal: - "reorg occurred while computing checkpoint root" -> codes.Aborted (transient/retriable; clients can retry once the chain settles) - "non-contiguous headers in checkpoint range" -> codes.DataLoss (DB inconsistency / ancient pruning; conveys unrecoverable server state without the "this is a bug" implication of Internal) Also runs go mod tidy in cmd/keeper to add the missing golang.org/x/sys v0.42.0 h1: source hash that the previous bump left out. * internal/cli: add maxGRPCMessageSize to mirror HTTP behaviour * internal/cli: add nil check on grpc addr. Bump go to fix vulncheck * chore: fix govuln * internal/cli: lower chunkSize * chore: bump polyproto * templates: replace dead witnessprotocol with witness.enable hint (#2204) * templates: replace dead witnessprotocol with witness.enable hint * ci: stabilize codecov/diffguard and skip fork-only e2e constraints * templates: move witness hint and drop workflow edits * prefetcher: builder-phase prefetch + streaming worker pool (#2192) * prefetcher: streaming worker pool with unified idle/builder tx providers Replace per-call Prefetch() invocations with a long-running PrefetchStream that spans the block lifetime. A fixed worker pool pulls transactions from a channel, eliminating pool-startup overhead across the ~500 builder-mode and ~20 idle-mode calls previously issued per block. Idle and builder phases share the same stream, acting as swappable tx providers. Phase handoff uses a two-signal interrupt: hardKill for permanent exit and evmAbort for soft, per-phase aborts that discard in-flight idle work before the builder provider takes over. Prefetch(block, ...) keeps its original signature as a thin wrapper so blockchain.go behavior stays byte-identical. * miner: prefetch observability — slow-tx prefetch flag, split apply timers, builder contribution metric Correlate tail latency with prefetch miss rate so we can tell whether the slowest block txs are the ones we failed to prefetch or whether prefetch quality is not the lever. Three additions: * slow_tx_tracker 10-min log: each of the top-K slowest txs is now annotated inline with MGas/s and a prefetched/not-prefetched flag. MGas/s uses integer math on nanoseconds — per-tx durations are in the tens of microseconds and float seconds lose precision. * worker/txApplyDuration/{prefetched,notPrefetched}: side-by-side p50/ p95/p99 of the two populations. The aggregate worker/txApplyDuration stays for dashboard compatibility. * worker/prefetch/builder_added_percent: share of block txs that the builder phase prefetched, isolating the payoff of the new builder- phase prefetch beyond the existing overall miss-rate metric. * prefetcher: grow stream channel buffer from 512 to 4096 Sizes the buffer to the idle provider's theoretical per-loop burst (IdleGasBudget / 21k ≈ 4762 at default config) so the scan never blocks on a full channel. Workers drain at ~200k tx/s so the channel sits mostly empty in practice; the buffer exists to absorb the brief scan-phase burst where idle dumps a heap's worth of txs. Memory cost: 4096 * 8 B + channel header ≈ 33 KB, negligible. Handoff drain is O(actual-occupancy), not capacity, so a larger buffer doesn't slow phase transitions. * prefetcher: gate builder-phase attribution on post-handoff flag + goimports fixes Review feedback: the old onSuccess closure checked genParams.builderStarted.Load() to decide whether to attribute a successful prefetch to the builder phase. That check races with buildAndCommitBlock's builderStarted.Store(true), which fires before runPrefetcher reaches the handoff — an idle-phase tx whose EVM execution finishes in the gap between those two moments would be miscounted as builder. Impact was metric-only but the old comment overstated the guarantee. Route attribution through a dedicated inBuilderPhase *atomic.Bool that the coordinator flips to true only after the handoff completes (evmAbort drain + reset). Any onSuccess firing after that point is known to come from post-handoff work, so builder_added_percent now reflects genuine builder-phase contribution. Also fix goimports formatting in core/state_prefetcher.go and miner/worker.go flagged by CI lint. * prefetcher: stop idle batch on full channel instead of spinning the heap Review feedback: when the stream channel fills mid-batch, the default branch in streamIdleBatch was dropping the tx but still calling txs.Shift() and continuing to walk the heap. Since dropped sends don't subtract from the gas budget or populate localPrefetched, the outer loop's viability check (nextViableIdleTx) kept returning valid txs, so every remaining entry burned Peek + Shift cycles only to drop. Return on the default branch instead. The outer runIdleTxProvider loop re-snapshots the pool every ~100ms, by which time workers have drained the channel and there's capacity for new sends. Dropping one tx per batch and re-entering the outer loop is strictly cheaper than walking the rest of the heap dropping everything. * prefetcher: nil-guard prefetchedHashes + stop charging gas for failed resolves Review feedback on buildTxPlan / scanOverflow / collectPlanBatch: * buildTxPlan decremented remaining before calling ltx.Resolve(), so when a LazyTransaction failed to resolve (tx evicted between heap listing and resolution) the budget was consumed for a tx that never entered the plan. Reorder: charge gas only after Resolve succeeds, matching scanOverflow. The prefetched-skip path keeps consuming gas intentionally — those txs are still bound for the block. * scanOverflow and collectPlanBatch called prefetchedHashes.Load() without a nil guard, while the sibling buildTxPlan explicitly guarded. Production is safe because commitWork always initializes the map, but the API inconsistency is a real footgun for unit tests or future callers. Add the missing guards so all three functions accept a nil *sync.Map as "no hashes known, don't skip". No functional change in the current call paths; tests unchanged. * prefetcher: recover panics in worker and sendPlan goroutines Review feedback: worker goroutines in PrefetchStream ran without their own recover. Go's recover only catches panics in its own goroutine — the parent runPrefetcher's recover does not protect the children. Any panic in statedb.Copy / ApplyMessage / IntermediateRoot / precompile edge cases would therefore kill the node even though prefetching runs on a throwaway state copy and is best-effort by definition. Add defer/recover per worker (new chain/prefetch/worker/panic meter) and match it on the sendPlan goroutine in fillTransactions (reuses the existing worker/prefetch/panic meter). Also documents the residual attribution edge where a worker stuck in IntermediateRoot (not interruptible by evmAbort) can still reach onSuccess after inBuilderPhase flips. Metric-only impact, bounded by at most one tx per block; added a comment instead of a runtime fix since the window is tiny in practice and closing it would require a more invasive barrier. * prefetcher: fix gas budget, interrupt accounting, panic leak, dup execution Round 4 of review fixes — four independent issues: * sendPlan gasLimit stale for the second call. The closure captured env.header.GasLimit once, but the second invocation (normal-tier txs) runs after commitTransactions has already consumed gas from env.gasPool. Plan was over-sized, wasting prefetch capacity on txs that the builder would reject. Thread gasLimit as an explicit argument and pass env.gasPool.Gas() on the second call (with a nil guard for the first, where env.gasPool is still nil). * prefetchOneTx early-interrupt return skipped fails.Add(1). txIndex is incremented unconditionally in processTx, so every interrupt-aborted tx was counted as a successful prefetch in blockPrefetchTxsValidMeter. The function docstring already promises fails is bumped on every (0,false) return; the interrupt path was the one branch that didn't. * runPrefetcher shutdown (evmAbort + close(txsCh) + <-streamDone) was sequential, so a panic in runIdleTxProvider or runBuilderTxProvider unwound past it. The PrefetchStream goroutine and its N workers would then block forever on `range txsCh` (hardKill is only checked after a dequeue). Wrap the close in a sync.Once-guarded closure and defer it so the channel is released on every exit path, panics included. * Builder phase could double-execute an in-flight plan tx. scanOverflow checked prefetchedHashes, which is only written by onSuccess after EVM completion — a tx forwarded in one batch but still mid-EVM when the next batch's overflow scan ran could be re-emitted and picked up by a second worker. Track locally-sent hashes in runBuilderTxProvider (single- threaded, plain map is fine) and have scanOverflow consult it alongside prefetchedHashes. * prefetcher: address within-iter dup, heap preservation, panic fails - scanOverflow: break instead of Pop on gas-too-large so accumulated extendedBudget can cover price-leading accounts on later iterations; popping permanently evicted them from the long-lived overflow heap. - runBuilderTxProvider: populate sentThisPhase with the current plan batch before scanOverflow so bonus collection can't re-emit a tx already returned by collectPlanBatch in the same window. - PrefetchStream worker recover: increment ctx.fails so the {valid,invalid} meters stay consistent with txIndex when a panic aborts prefetchOneTx mid-flight. * miner: unit tests for prefetch stream primitives Covers the helpers that prior review rounds kept surfacing bugs in: - scanOverflow: zero budget, budget growth preserves popped accounts (regression test for the h.Pop() → break fix), in-flight skip without budget cost, prefetched skip without budget cost. - forwardTxs: sent-hash recording, dropped-on-full channel must not record, nil sentThisPhase is safe. - collectPlanBatch: closed planCh, timer-fires-on-empty, freed-gas accumulation, prefetched filter, closed gas channel nilled out. - runBuilderTxProvider: no-duplicate-forward invariant across interleaved planCh + freed-gas signals — regression test for the within-iteration dedup fix (sentThisPhase pre-population). * prefetcher: signal interrupt on prepareWork early-return Move `defer interruptPrefetch.Store(true)` to the top of buildAndCommitBlock so the prefetcher goroutine is signaled to stop on every return path — including the early return when prepareWork fails (UnauthorizedSignerError, Heimdall snapshot/span failures, stateless.NewWitness errors). Previously the defer sat after the err check, so on prepareWork failure runIdleTxProvider would loop until gas exhaustion, burning ~100M gas of throwaway EVM work across the worker pool before self-terminating. No correctness/consensus impact; fixes a real CPU-waste path on an uncommon-but-realistic error branch. Bounded by PrefetchStream's sync.Once shutdown (no goroutine leak), but the wasted work was material under prolonged Heimdall issues when consecutive commitWork calls could pile up zombie prefetchers. * prefetcher: close scanOverflow→plan cross-iter dedup edge Pass sentThisPhase into collectPlanBatch and skip txs already present, mirroring the existing scanOverflow dedup pattern. Closes the third edge of the builder-phase dedup matrix: 1. within-iter plan↔overflow — covered by pre-populate sentThisPhase before scanOverflow (commit 62a72c0) 2. plan→overflow cross-iter — covered by scanOverflow's sentThisPhase read (commit 88c4e44) 3. overflow→plan cross-iter — this commit Scenario: scanOverflow emits tx T in iteration N (h.Shift() past T, sentThisPhase[T]=true). Worker W1 begins multi-ms EVM on T. In iteration N+1, collectPlanBatch reads a buffered copy of T from planCh; prefetchedHashes is still empty because onSuccess hasn't fired, so T slips through and gets forwarded a second time. Impact is wasted worker capacity only (throwaway state, no consensus/correctness effect), but the race opens wider exactly on contract-heavy workloads where builder-phase prefetch matters most. Kurtosis run #2 scenario E didn't catch this because value-transfer EVM finishes in ~10µs, well inside the 2ms batch window — so prefetchedHashes raced fast enough to plug the gap. Added TestCollectPlanBatch_SkipsInflight as the regression test. * prefetcher: address review feedback (constants, naming, sendPlan) - hoist prefetch tunables (chan buf, idle loop interval, gas-pct default/cap) into the file's top const block - rename sentThisPhase to inFlightHashes — the variable tracks txs forwarded on txsCh whose onSuccess hasn't fired yet - promote sendPlan from a closure inside fillTransactions to a free function alongside buildTxPlan * prefetcher: disable IntermediateRoot per tx; add evaluation benchmarks The streaming prefetcher's per-tx IntermediateRoot call (introduced via the intermediateRootPrefetch flag, hardcoded true) was found to add 80–130% prefetch wall time for at most ~10% commit speedup (≈0.1 ms in absolute terms). With snapshots active, EVM reads bypass the trie entirely, so the only path to value is warming pebble's block cache for the subsequent Commit. Under realistic clean-cache sizes that working set is already resident, leaving the warming work redundant in nearly every regime. Upstream go-ethereum's prefetcher likewise does not compute intermediate roots — its Prefetch is (block, statedb, cfg, interrupt) only. Flips the call site to false (parameter retained for API stability) and checks in three Go benchmarks documenting the methodology and findings: - TestIntermediateRootPrefetch_AccuracyVsCost: in-memory baseline, three access patterns (hot 3-slot, unique-spread, per-sender counter) - TestIntermediateRootPrefetch_PebbleAccuracyVsCost: real pebble disk with prewarm + Commit, hashdb meter deltas - TestIntermediateRootPrefetch_HeavyColdContract: 100k pre-populated storage slots with a constrained clean cache, modelling fat contracts that haven't been touched recently Across all three, flag=true produced zero cache-hit benefit measurable in StorageHitFromPrefetch / clean cache deltas, with consistent prefetch CPU overhead. The residual ~10% commit speedup observed in the heavy-cold regime is attributable to pebble block-cache warming and amounts to ≈0.13 ms per affected block. * prefetcher: kill diffguard mutation survivors Extract resolveEvmInterrupt helper so the evmAbort/hardKill fallback wiring is unit-testable. Add focused unit tests for preloadReaderForTx (bad-signature, contract-To code preload, EOA/nil-To) and streamIdleBatch (per-loop gas budget). Brings T1 logic and T2 semantic mutation kill rate to 100% on the diff. * prefetcher: fix lint (unconvert + trailing newline) Drops a redundant uint64() conversion in the streamIdleBatch test and trims the trailing blank line goimports flagged. * prefetcher: nil-guard genParams in sendPlan The function nil-checked genParams before reading prefetchedTxHashes but then unconditionally dereferenced genParams.planWg.Add(1). Currently unreachable — callers in fillTransactions derive builderPlanCh from genParams.builderPlanCh so genParams is always non-nil — but the inconsistent guard is misleading and would panic on the block-production path if a future caller drops that invariant. * core/vm: add hardfork precompile continuity checks (#2234) * core/vm: add hardfork precompile continuity checks * docs: clarify Pro precompile continuity guidance * packaging/docs: replace deprecated rpc.batchlimit key (#2216) * packaging/docs: replace deprecated rpc.batchlimit key * packaging/docs: set rpc.batch-request-limit default to 1000 * p2p: add flag to disable snap/1 serving (#2152) * flag to disable snap serving * p2p: add clarifying comment on NoSnapServing behaviour Document that NoSnapServing=true disables snap/1 entirely (no serving and no snap-syncing from peers), while the in-memory snapshot tree remains active for local state reads. * miner, consensus/bor: fix leaked-wedge family in worker state machine (#2220) * miner, consensus/bor: fix leaked-wedge family in worker state machine Closes four silent stall paths in the producer state machine 1. miner.mainLoop: when PeerCount==0 on production chains, the dropped newWorkReq branch now clears pendingWorkBlock instead of leaking it. 2. miner.commitWork: the defer that clears pendingWorkBlock is now registered above the early syncing-check return. 3. miner.taskLoop: interrupt() now deletes the previous sealhash's pendingTasks entry. Bor.Seal's stop-branch returns silently without posting to resultCh, so resultLoop never cleans the entry. 4. consensus/bor.Seal: the result-delivery goroutine's second select no longer has a silent default branch — it blocks on send or exits via <-stop. Regression tests cover each path (four unit + one integration): - TestMainLoopClearsPendingWorkBlockOnPeerCountZero - TestCommitWorkLeaksPendingWorkBlockWhenSyncing - TestTaskLoopInterruptCleansStalePendingTasks - TestSeal_BlocksOnFullResultChannelInsteadOfSilentDrop - TestProducerRecoversAfterMiningRestart (tests/bor, integration tag) * miner: gate PeerCount check on heimdall presence, not chain-ID The previous chain-ID gate (BorMainnet/Mumbai/Amoy) blocked the PeerCount==0 drop from being exercised by kurtosis chaos tests on chain ID 4927. Replacing it with chain-ID list omissions broke dev/test setups (Clique single-node, intentional-disconnection tests like TestValidatorWentOffline) where peer count is legitimately zero by design. Gate on heimdall presence instead. The HeimdallClient is nil on all test/dev setups (--bor.withoutheimdall, Clique, Ethash) and non-nil for any real-network Bor node, so the semantic 'this is a real network node' is captured directly. Forward-compatible: new production chain IDs don't need to be added to a list. Also restore the !IsRio removal in the veblop fallback: the post-Rio kurtosis bootstrap needs the periodic retry trigger after the PeerCount drop. Gate the fallback on !isBor instead so Clique/Ethash don't accidentally fire it. Fixes CI failures introduced by the prior commit: - TestServer_DeveloperMode (internal/cli/server) - TestCommand_DebugBlock (internal/cli) - TestValidatorWentOffline (tests/bor, integration) * miner: fix data race on pendingTasks length in stall warning Caller in newWorkLoop's veblopTimer case captured len(w.pendingTasks) under pendingMu, but the helper warnIfStalled read len(w.pendingTasks) again unguarded in its log statement. Concurrent taskLoop / resultLoop mutations would race with the read. Capture the count once under the lock and pass it through to the helper instead. Also drop the redundant DevFakeAuthor=false write from TestMainLoopClearsPendingWorkBlockOnPeerCountZero — NewFakeBor already constructs with DevFakeAuthor=false, and writing after mainLoop has captured the value at startup is itself a data race that -race flagged. Reported in PR #2220 review. * miner, tests/bor: update regression-test docstrings to match heimdall-gate The PeerCount==0 fix was originally chain-ID-gated (BorMainnet/Mumbai/Amoy) and the test docstrings reflected that. After review the gate was changed to heimdall-presence (`bor.HeimdallClient != nil`), but the docstrings were not updated. This commit corrects the inaccuracies: - TestMainLoopClearsPendingWorkBlockOnPeerCountZero now correctly describes the heimdall-presence trigger and notes that the test uses the mock heimdall client (non-nil) from NewFakeBor — no ChainID override exists. - TestProducerRecoversAfterMiningRestart now correctly explains that Bug 1's drop branch isn't exercised because the test uses withoutHeimdall=true (HeimdallClient nil), not because of chain-ID. * miner, consensus/bor: scope pendingTasks cleanup to Seal stop-branch exits The previous taskLoop fix made interrupt() unconditionally call deletePendingTask(prev), which races with Bor.Seal's success path: when the goroutine has already delivered the result to resultCh but resultLoop is busy, interrupt() deletes the entry resultLoop is about to look up, and the !exist branch silently drops the validly-sealed block. Move cleanup into Bor.Seal itself, gated to stop-branch exits, via a new SealWithStopHook(..., onStopExit func()) method. The existing Seal becomes a thin wrapper passing nil to preserve consensus.Engine. taskLoop type-asserts to *bor.Bor and uses SealWithStopHook with a per-sealhash cleanup closure. interrupt() now only closes stopCh. Adds three bor-level tests covering both stop-branch exits and the success path, and rewrites the worker-level test to assert the new contract: interrupt() must NOT delete pendingTasks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * miner, tests/bor: update docstrings for SealWithStopHook cleanup direction Two docstrings still described the pre-revision design where interrupt() deleted pendingTasks directly: - tests/bor/bor_test.go bullet #3 named a unit test (TestTaskLoopInterruptCleansStalePendingTasks) that no longer exists; the actual test is TestTaskLoopInterruptPreservesPendingTasks asserting the opposite semantic. - miner/worker.go deletePendingTask docstring said "Used by taskLoop.interrupt" but the caller is the per-task onStopExit closure passed to Bor.SealWithStopHook. Documentation-only. * miner: drop PeerCount==0 gate in mainLoop (#2238) * miner: drop PeerCount==0 gate in mainLoop The peer-count drop branch in `mainLoop` was introduced in #977 to delay block production until at least one peer was connected, scoped to BorMainnet/Mumbai (Amoy was added later in e8157bbe6). #2220 widened the gate from a chain-ID list to "any heimdall-connected node", which drew kurtosis devnets into the gate for the first time: in a single-validator kurtosis devnet PeerCount() is 0 by design (no other node to peer with), the gate drops every newWorkReq, and the veblop fallback timer re-fires straight back into the same drop. Veblop's primary/backup producer election already handles outage safety at the protocol level — peer count adds no safety, only a foot-gun. Strip the gate; `mainLoop` now routes every newWorkReq directly to `commitWork` (except the pre-existing DisablePendingBlock-on-non-validator short-circuit). Verified end-to-end on a kurtosis 1-validator devnet (chain ID 4927, veblop active). * miner: drop stale PeerCount==0 gate references in test backends Follow-up to the gate removal. With no test exercising peer count anymore, the `testWorkerBackend.peerCount` atomic.Int32 plumbing in `worker_test.go` was just ceremony — drop the field, the `Store(1)` default, and simplify `PeerCount()` to return a constant. The `mockBackend` / `mockBackendBor` PeerCount comments in `miner_test.go` and `fake_miner.go` still named the removed gate; rewrite them to describe what the function does rather than what it no longer guards. No production change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * miner: drop dangling incident references in syncing-leak test docstring The doc comment and t.Fatalf message for TestCommitWorkLeaksPendingWorkBlockWhenSyncing framed it against the "PeerCount==0 race fixed for the 2026-05-07 incident" and the related val4 stall — context that disappeared from the tree with the gate removal and the sibling regression test in this branch's earlier commits. Rewrite the docstring to describe the leak on its own terms and trim the t.Fatalf tail so a future reader hitting the failure doesn't grep for a name that no longer exists in-tree. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * tests/bor: drop dangling gate references in producer-recovery docstring TestProducerRecoversAfterMiningRestart's docstring framed itself against four leak paths; item (1) named the deleted unit test TestMainLoopClearsPendingWorkBlockOnPeerCountZero and the deleted gate (`realNetworkNode := bor.HeimdallClient != nil`). The integration-level limitations clause also leaned on Bug 1 / that gate. Drop both. Renumber the remaining three leak paths and add a short historical note that the fourth was closed by removing the gate itself in earlier commits on this branch. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * tests/bor: drop dangling Bug 1 references in producer-recovery test After the previous commit renumbered the four-bug list down to three, three remaining "Bug 1" mentions in the docstring header and the test body's Phase 2 / Phase 3 comments became dangling (the enumeration no longer has a Bug 1 entry). Reword each to point at the generic "leak family" instead. The test's behavior is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * miner: format build-block log times as RFC3339Nano, gate on IsRunning (#2239) * consensus/bor: fix race in SpanStore.PurgeCache (#2235) * consensus/bor: fix race in SpanStore.PurgeCache PurgeCache cleared latestSpanCache via atomic Store(nil) but did not stop the background polling goroutine started by NewSpanStore. That goroutine ticks every 200ms and writes the latest span back into latestSpanCache, silently undoing the purge whenever a tick lands between the clear and the caller's next read. Fix: extract the loop into runPollLoop, track it with a WaitGroup, and have PurgeCache stop and join the goroutine before resetting state. Close uses the same path. PurgeCache no longer restarts the loop — on-demand reads via getLatestSpan fall back to updateLatestSpan inline, so callers still get fresh data without the race window. Adds TestSpanStore_PurgeCache_RaceWithPollLoop, a deterministic reproducer that sleeps past one tick before asserting. Fails reliably on develop, passes with the fix. * consensus/bor: clear heimdallStatus in PurgeCache With the poll loop stopped after PurgeCache, a stale heimdallStatus (typically CatchingUp:false) would persist and let waitUntilHeimdallIsSynced return immediately without refreshing against a freshly-swapped heimdall client. Clear it alongside the other atomics, and extend the reproducer to assert the invariant. * build: bump x/net for govulncheck (#2240) * core/forkid: include polygon-specific forks in wire forkid (#2237) * core/forkid: include polygon-specific forks in wire forkid NewID computes the eth p2p handshake forkid by calling gatherForks (lowercase), which uses reflection over the outer ChainConfig struct and is blind to the polygon-specific block fields nested under config.Bor (RioBlock, MadhugiriBlock, DandeliBlock, LisovoBlock, LisovoProBlock, GiuglianoBlock, ChicagoBlock). The polygon-aware GatherForks (uppercase) was added in #2063 to expose fork data via the new bor_forks RPC, but the wire path was never updated to use it. As a result, bor's wire forkid does not include any polygon-specific fork activation in its checksum. On chains where polygon forks activate at non-zero blocks (devnets, testnets, mainnet post-fork), bor's wire forkid is inconsistent with what the bor_forks RPC reports for the same node. This was latent until erigon v3.6.0 started including the new polygon forks (Lisovo, LisovoPro, Giugliano, Chicago) in its own GatherForks. erigon's wire forkid now correctly hashes those blocks while bor's still does not, causing the eth handshake to fail with 'fork ID rejected: local incompatible or needs update' on any deployment combining bor v2.8.0 with erigon v3.6.0 on a chain where those forks activate above block 0. Switch NewID to call GatherForks so the wire forkid includes the polygon-specific forks and matches what bor_forks RPC reports. * core/forkid: include polygon-specific forks in inbound filter newFilter (used by NewFilter / NewStaticFilter) builds the local table of valid fork checksums that the inbound eth handshake validator matches incoming peers against. It was calling lowercase gatherForks, which is the same polygon-blind reflection-only path the previous commit fixed in NewID. Without this, a patched bor (with the fixed NewID) would emit a forkid that includes the polygon-specific forks, but its own inbound filter would compute the local sums[] table without them. Incoming peer forkids would never match the table by exact rule #1 of EIP-2124 and would only be accepted via the rule #3 "remote is a superset" forgiveness path. The validator's accept/reject behaviour would be correct by accident rather than by design, and any nuance that depends on rule #1 (matching local fork state) would be wrong. Switch newFilter to GatherForks so the validator's sums[] table is symmetric with what NewID emits. After this, the eth handshake's forkid check is end-to-end polygon-aware on bor. * (feat): tracing support for state-sync transactions (#2236) * eth/tracers: add state-sync tracing support Add live tracing and refactor tracing methods for state-sync transactions. Co-authored-by: Milan Došlić <milan.doslic@tenderly.co> * core: call OnTxEnd hook only if state-sync is added * consensus/bor: remove duplicate import * internal/cli: use stream context * consensus/bor: use deterministic time for state-sync traces * eth/tracers: remove TODO * consensus/bor: fix build * eth/tracers: make cumulative gas used semantics same for tracing * core/types: removed unused system address * eth/tracers: sort statedb logs * eth/tracers: fix comment * eth/tracers: add transaction index to state-sync reciept * address claude comments * eth: move tx index check * core/types: use bor system address as sender address of state-sync transactions * eth/tracers: add state-sync custom tracer to handle invariants * core, eth, consensus: fix live state-sync tracing * eth/tracers: add state sync wrapped hooks test * eth, consensus: address claude comments * tests/bor: add e2e tests * eth/tracers: fix lint * core/types: undo sender change * consensus/bor: remove comments * eth, consensus: apply some simplifications * consensus/bor: apply diffguard changes * eth, core, consensus: increase code coverage * eth/tracers: fix prestate tracer for state-sync transactions * eth/tracers: add tests to run all tracers against state-sync transaction * eth, core: explain state-sync address representation in comments * core/types: add tests to validate empty values in state-sync tx getters * eth: disable live tracer completely if parallel evm is enabled * eth/tracers: record gas used for state-sync transactions in root frame * eth/tracers: add explainer log for sorting state db logs * consensus/bor: reuse reference * pass vmconfig while bor consensus creation * core: check for start and end hooks before tracing * consensus, eth: pass ctx in state-sync execution for timeout, add bloom to trace receipts * core/types: fix typo * eth/tracers: correct timeout ctx * eth/tracers: add more tests * eth, consensus: address bot comments * builder, packaging/templates: update all config.toml files --------- Co-authored-by: Milan Došlić <milan.doslic@tenderly.co> * Fix triejournaldirectory wiring (#2213) * wire trie journal directory from server config * server: guard trie journal path resolution * server: deduplicate trie journal default resolution --------- Co-authored-by: Lucca Martins <lucca_martins30@yahoo.com.br> * miner: fix TestPrefetchRaceWithSetExtra hanging on unbuffered channel send (#2233) Assisted-by: Claude:claude-sonnet-4-6 Co-authored-by: Lucca Martins <lucca_martins30@yahoo.com.br> * miner: disable tracer in vm.Config to prevent conflicts during block synchronization (#2206) * miner: disable tracer in vm.Config to prevent conflicts during block synchronization * Update miner/worker.go Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * miner: add test to verify vmConfig strips tracer during live tracing --------- Co-authored-by: StreamingFast Agent <agent@streamingfast.io> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * ethclient: bound tx indexing wait in tests (#2211) * chore: use Claude Opus 4.7 1M context in workflows (#2241) * ci: remove claude github actions (#2244) Replaced by the built-in Claude review bot now enabled on this repo. * params: version bump to v2.8.3-beta * v2.8.2 (#2245) * miner: track env.size in commitTransaction env.size feeds txFitsSize for the params.MaxBlockSize cap. It was seeded from header.Size() in makeEnv but never updated as transactions were committed, so the cap was effectively inert. * miner: set maxBlockSizeBufferZone to 1MB * Bump up version to v2.8.2 * params: version bump to v2.8.3-beta2 * params: version bump to v2.8.3-beta3 * Revert "core/forkid: include polygon-specific forks in wire forkid (#2237)" This reverts commit 5dc5adb9eebe7cfcd2707d54194a10e34d21481a. * params: version bump to v2.8.3-beta4 * Revert "core/forkid: include polygon-specific forks in wire forkid (#2237)" (#2248) This reverts commit 5dc5adb9eebe7cfcd2707d54194a10e34d21481a. * consensus: fix milestone-mismatch rewind deadlock (#2246) * params: bump version to v2.8.3-beta5 * params: bump version to v2.8.3 * build: bump Go to 1.26.4 to fix stdlib CVEs (GO-2026-5039, GO-2026-5037) * Revert "build: bump Go to 1.26.4 to fix stdlib CVEs (GO-2026-5039, GO-2026-5037)" This reverts commit b5a182b7906b066509af6db62f3e9527a73896a2. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Lake <lake@dunamu.com> Co-authored-by: Manav Darji <manavdarji.india@gmail.com> Co-authored-by: Larry Arnault <Larryarnaultla@gmail.com> Co-authored-by: Harsh kumar <51978908+harshinsecurity@users.noreply.github.com> Co-authored-by: harshinsecurity <hi@harshinsecurity.in> Co-authored-by: Aaron Ajose <aaronajosee@gmail.com> Co-authored-by: haoyu-haoyu <85037553+haoyu-haoyu@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Pratik Patil <pratikspatil024@gmail.com> Co-authored-by: Jerry <jerrycgh@gmail.com> Co-authored-by: Krishang <109511742+kamuikatsurgi@users.noreply.github.com> Co-authored-by: Sanket Saagar Karan <ssaagar@polygon.technology> Co-authored-by: Karolina G. <60556109+kajaaz@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: Marcello Ardizzone <marcelloardizzone@hotmail.it> Co-authored-by: Andrei Petrov <andreipetrovqs@gmail.com> Co-authored-by: Olivier Dupont <olivierdupontvier@gmail.com> Co-authored-by: Léo Vincent <28714795+leovct@users.noreply.github.com> Co-authored-by: Milan Došlić <milan.doslic@tenderly.co> Co-authored-by: Jonas Schneider <jonasschnierderrrrrr@gmail.com> Co-authored-by: William Morriss <wjmelements@gmail.com> Co-authored-by: Matthieu Vachon <matt@streamingfast.io> Co-authored-by: StreamingFast Agent <agent@streamingfast.io> Co-authored-by: Adam Dossa <adossa@polygon.technology>

marcello33 added 2 commits April 23, 2026 15:31

implement full grpc comms with heimdall

68f5e58

Merge branch 'develop' into mardizzone/grpc_improv

fd961fa

Copilot AI review requested due to automatic review settings April 23, 2026 14:07

claude Bot reviewed Apr 23, 2026

View reviewed changes

Copilot started reviewing on behalf of marcello33 April 23, 2026 14:07 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

claude Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread internal/cli/server/api_service.go

Comment thread internal/cli/server/api_service.go Outdated

address comments

fe62ec8

claude Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread internal/cli/server/api_service.go

Comment thread internal/cli/server/api_service.go Outdated

Comment thread internal/cli/server/api_service.go

address comments

edafc70

Copilot AI review requested due to automatic review settings April 23, 2026 17:15

Copilot started reviewing on behalf of marcello33 April 23, 2026 17:16 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread internal/cli/server/server.go

Comment thread internal/cli/server/server.go Outdated

Comment thread internal/cli/server/server.go

Comment thread internal/cli/server/api_service.go Outdated

address comments

43b6efc

address comments

84b7d5c

Copilot AI review requested due to automatic review settings April 23, 2026 18:40

Copilot started reviewing on behalf of marcello33 April 23, 2026 18:40 View session

Copilot AI reviewed Apr 23, 2026

View reviewed changes

Comment thread internal/cli/server/server.go

claude Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread internal/cli/server/api_service.go

Comment thread internal/cli/server/api_service.go

chore: fix govuln

97a066c

Copilot AI review requested due to automatic review settings May 8, 2026 12:11

Copilot started reviewing on behalf of marcello33 May 8, 2026 12:12 View session

marcello33 review requested due to automatic review settings May 8, 2026 12:12

claude Bot reviewed May 8, 2026

View reviewed changes

Comment thread internal/cli/server/server.go

Comment thread cmd/keeper/go.mod Outdated

internal/cli: lower chunkSize

953241d

marcello33 requested review from Copilot and removed request for Copilot May 8, 2026 13:27

Copilot started reviewing on behalf of marcello33 May 8, 2026 13:27 View session

pratikspatil024 previously approved these changes May 15, 2026

View reviewed changes

chore: bump polyproto

2774394

Copilot AI review requested due to automatic review settings May 18, 2026 06:21

marcello33 dismissed pratikspatil024’s stale review via 2774394 May 18, 2026 06:21

claude Bot reviewed May 18, 2026

View reviewed changes

Copilot started reviewing on behalf of marcello33 May 18, 2026 06:21 View session

Copilot AI reviewed May 18, 2026

View reviewed changes

Comment thread internal/cli/server/api_service.go

Comment thread internal/cli/server/api_service.go

Comment thread internal/cli/server/api_service.go

Comment thread internal/cli/server/api_service.go

Comment thread internal/cli/server/api_service.go

Merge branch 'develop' into mardizzone/grpc_improv

9be3368

pratikspatil024 previously approved these changes May 19, 2026

View reviewed changes

Merge branch 'develop' into mardizzone/grpc_improv

280ab3b

Copilot AI review requested due to automatic review settings May 19, 2026 15:33

marcello33 dismissed pratikspatil024’s stale review via 280ab3b May 19, 2026 15:33

Copilot started reviewing on behalf of marcello33 May 19, 2026 15:33 View session

pratikspatil024 approved these changes May 19, 2026

View reviewed changes

Copilot AI reviewed May 19, 2026

View reviewed changes

Comment thread internal/cli/server/api_service.go

cffls approved these changes May 19, 2026

View reviewed changes

marcello33 merged commit f2f1849 into develop May 19, 2026
22 of 23 checks passed

Conversation

marcello33 commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Results

Latency (ns per op)

Allocations (bytes and count per op)

Ambient system-level resource usage (per phase, 5 s interval across 1000 blocks)

CPU samples during bench window (pprof, 120 s each, aligned to bench run)

Details

What this enables on the heimdall side

Info about the gRPC authentication

Enabling it

Step 1: Bor side

Step 2: Heimdall side

Rollout recommendation

Downgrade path

Tests

Dependencies

Out of scope

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

marcello33 commented Apr 23, 2026

Uh oh!

claude Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: gRPC Implementation

Finding 1 — uint64 overflow in GetBlockInfoInBatch loop (HIGH)

Finding 2 — gRPC auth token visible in process listing (MEDIUM)

Finding 3 — Shared *uint64 pointers in headerToProtoBorHeader (LOW)

Finding 4 — Missing context cancellation check in batch loop (LOW)

Finding 5 — Redundant hash conversion in GetTdByHash (LOW)

Finding 6 — data.csv row reordering (INFO)

Positive observations

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

claude Bot commented Apr 23, 2026

Code Review

Bug: uint64 overflow causes infinite loop in GetBlockInfoInBatch (DoS)

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

marcello33 commented Apr 23, 2026

Uh oh!

claude Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Re-review: gRPC Implementation (post fe62ec8)

Verified fixes

New observations on fe62ec8

Summary

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

marcello33 commented Apr 23, 2026 •

edited

Loading

claude Bot commented Apr 23, 2026 •

edited

Loading

Finding 1 — uint64 overflow in `GetBlockInfoInBatch` loop (HIGH)

Finding 3 — Shared `*uint64` pointers in `headerToProtoBorHeader` (LOW)

Finding 5 — Redundant hash conversion in `GetTdByHash` (LOW)

Finding 6 — `data.csv` row reordering (INFO)

codecov Bot commented Apr 23, 2026 •

edited

Loading

claude Bot commented Apr 23, 2026 •

edited

Loading

Re-review: gRPC Implementation (post `fe62ec8`)

New observations on `fe62ec8`

claude Bot commented Apr 23, 2026 •

edited

Loading

Re-review: gRPC Implementation (3rd pass — post `edafc70`)

claude Bot commented Apr 23, 2026 •

edited

Loading

Re-review: gRPC Implementation (4th pass — post `43b6efc`)

socket-security Bot commented May 18, 2026 •

edited

Loading