fix(blockvalidation): surface metadata fetch error in ValidateBlock instead of assuming valid#778
Conversation
…nstead of assuming valid (refs #4581)
|
🤖 Claude Code Review Status: Complete Current Review: No issues found. The fix correctly addresses a critical logic error where metadata fetch failures were incorrectly treated as validation success. Key improvements:
|
|
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-04-29 01:49 UTC |
oskarszoon
left a comment
There was a problem hiding this comment.
Approve. Closes a real (if narrow) consensus-correctness gap — pre-PR return nil after a GetBlockHeader flake signalled "block valid" without ever consulting blockMeta.Invalid. New ServiceError is the right answer and restores symmetry with the revalidation branch at :1192-1196. Test exercises the fixed path exactly and pins ErrServiceError wrapping.
Two notes worth recording:
- The asymmetry with
GetBlockExistswarn-and-proceed at:1170-1175is intentional and correct: existence unknown → revalidate (consensus-safe viarunOncePerBlock+StoreBlockduplicate-rejection); validity unknown after exists=true → fail-closed. One-line code comment at:1172documenting the asymmetry would help future maintainers not unify them. - Sonar's reliability=C is a filename-convention false positive — the package consistently uses PascalCase for type-bound area tests (
BlockValidation_test.go,BlockValidation_difficulty_test.go) and snake_case for bare-topic tests (catchup_*,fork_manager_*). The new file matches the area-test pattern; suggest overriding the Sonar gate rather than renaming.
Optional follow-ups, none gating:
- Retry-with-backoff on
GetBlockHeaderat:1177and:1192(2-3 retries at 50/100/200ms). A transient gRPC blip shouldn't propagate as ServiceError. - Latent
meta==nil but err==nilgap at:1182— falls through toreturn nil, same shape as the bug just fixed but with a silent-nil-meta failure mode. Worth a defensive guard + test.
Anything in the #4581 threat model worth adding to the PR description? Would help future readers since the issue isn't accessible to most reviewers.




Closes #4581.
Summary
When validating an existing block, if
blockchainClient.GetBlockHeadererrors (transient DB outage, gRPC timeout), the code logged a warning and returnednil— treating the block as valid. A metadata-fetch failure tells us nothing about block validity. The node could then report invalid blocks as valid to peers and downstream services.Fix
Return
errors.NewServiceError(...)instead ofnil. Matches what the revalidation branch a few lines below already does for the same situation. Caller can retry; transient failures are no longer mistaken for validity signals.Test plan
GetBlockHeaderto return an error; asserts aServiceErroris returned.go test -race ./services/blockvalidation/...passes.