fix(chainlib): classify NEAR cause/data errors (UNKNOWN_BLOCK) correctly #2301

Open
AnnaR-prog wants to merge 1 commit into main from fix/near-polygon-error-classification

@AnnaR-prog

Contributor

Problem

ExtractNodeErrorDetails builds the message used for error classification from only .error.message (+ .error.code). NEAR carries its canonical error name in .error.cause.name. A request for a pruned block on a non-archive NEAR node returns:

{"error":{"code":-32000,"message":"Server error",
 "name":"HANDLER_ERROR","cause":{"name":"UNKNOWN_BLOCK"},
 "data":"DB Not Found Error: BLOCK HEIGHT: ..."}}

The discriminating token (UNKNOWN_BLOCK) lives in cause.name while .message is just "Server error", so the Tier-2 NEAR matcher never sees it and the error falls back to the generic NODE_SERVER_ERROR. On builds predating the error registry (#2261) the same error was tagged "unsupported method" (non-retryable + zero-CU), which suppressed consumer failover to an archive provider — the QoS/relay-failure symptom reported on NEART.

All four NEAR Tier-2 tokens are cause.name values: UNKNOWN_BLOCK, UNKNOWN_CHUNK, INVALID_SHARD_ID, NOT_SYNCED_YET.
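
To make the failure mode concrete, here is a minimal sketch of the pre-fix behaviour described above: the classification string is built from `.error.message` and `.error.code` only, so none of the four tokens can be found in it. The names `messageOnly` and `findNEARToken` are hypothetical illustrations, not the chainlib API, and the exact string layout is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// rpcError models only the two fields the pre-fix extraction looked at.
type rpcError struct {
	Code    int    `json:"code"`
	Message string `json:"message"`
}

type rpcResponse struct {
	Error *rpcError `json:"error"`
}

// nearTier2Tokens lists the four cause.name values named in the description.
var nearTier2Tokens = []string{"UNKNOWN_BLOCK", "UNKNOWN_CHUNK", "INVALID_SHARD_ID", "NOT_SYNCED_YET"}

// messageOnly (hypothetical) builds a classification string from
// .error.message and .error.code, ignoring .name, .cause, and .data.
func messageOnly(body []byte) string {
	var resp rpcResponse
	if err := json.Unmarshal(body, &resp); err != nil || resp.Error == nil {
		return ""
	}
	return fmt.Sprintf("%s (code %d)", resp.Error.Message, resp.Error.Code)
}

// findNEARToken returns the first Tier-2 token contained in msg, or "".
func findNEARToken(msg string) string {
	for _, tok := range nearTier2Tokens {
		if strings.Contains(msg, tok) {
			return tok
		}
	}
	return ""
}

// sampleBody is the response quoted in the description; the "..." in the
// data field is elided in the source and is kept as-is.
const sampleBody = `{"error":{"code":-32000,"message":"Server error","name":"HANDLER_ERROR","cause":{"name":"UNKNOWN_BLOCK"},"data":"DB Not Found Error: BLOCK HEIGHT: ..."}}`

func main() {
	msg := messageOnly([]byte(sampleBody))
	// The token is empty here because cause.name never reaches the message.
	fmt.Printf("message=%q token=%q\n", msg, findNEARToken(msg))
}
```

With this input the token lookup comes back empty, which is the condition under which a classifier would have nothing better than a generic rule to apply.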

Fix

Fold .name, .cause, and .data into the message used for classification + telemetry only. The node error still passes through to the user unchanged (transparent hop). Tier-2 (chain-scoped) matchers run before Tier-1, so the broadened message cannot pull a chain with its own matcher into a generic rule.
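
The folding step might look roughly like the sketch below. It is not the code in `node_error_handler.go`: the function names, the `" | "` separator, and the choice to keep non-string `cause`/`data` values as raw JSON are all assumptions made for illustration. The point it shows is that the folded string is a separate value used only for matching, while the original body is left untouched.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// rpcError captures the fields folded into the classification message.
// Cause and Data are kept raw because their shape varies between nodes.
type rpcError struct {
	Code    int             `json:"code"`
	Message string          `json:"message"`
	Name    string          `json:"name"`
	Cause   json.RawMessage `json:"cause"`
	Data    json.RawMessage `json:"data"`
}

// rawToText renders a JSON string unquoted and any other JSON value as its
// raw text; absent or null values become "".
func rawToText(raw json.RawMessage) string {
	if len(raw) == 0 || string(raw) == "null" {
		return ""
	}
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s
	}
	return string(raw)
}

// foldForClassification (hypothetical) joins .message, .name, .cause, and
// .data into one string used for classification and telemetry only.
func foldForClassification(body []byte) string {
	var resp struct {
		Error *rpcError `json:"error"`
	}
	if err := json.Unmarshal(body, &resp); err != nil || resp.Error == nil {
		return ""
	}
	parts := []string{resp.Error.Message}
	for _, extra := range []string{resp.Error.Name, rawToText(resp.Error.Cause), rawToText(resp.Error.Data)} {
		if extra != "" {
			parts = append(parts, extra)
		}
	}
	return strings.Join(parts, " | ")
}

// sampleBody is the response quoted in the description, "..." kept as-is.
const sampleBody = `{"error":{"code":-32000,"message":"Server error","name":"HANDLER_ERROR","cause":{"name":"UNKNOWN_BLOCK"},"data":"DB Not Found Error: BLOCK HEIGHT: ..."}}`

func main() {
	folded := foldForClassification([]byte(sampleBody))
	fmt.Println("classification message:", folded)
	// The body handed back to the user is the original, unmodified bytes.
	fmt.Println("passed through:", sampleBody)
}
```

Because `cause` is kept as raw JSON, `UNKNOWN_BLOCK` appears in the folded string without the sketch needing to know the structure of `cause` in advance.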

Verification

  • Regression test (TestExtractNodeErrorDetails_NEARUnknownBlock_FoldsCauseIntoClassification) uses the verbatim body captured from a live non-archive NEAR testnet node. Proven to fail without the change (NODE_SERVER_ERROR) and pass with it (CHAIN_NEAR_UNKNOWN_BLOCK, Retryable=true).
  • The full chainlib, common, and rpcsmartrouter suites pass; go vet and gofumpt are clean.
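
The before/after check in the first bullet can be sketched as follows. This is a simplified stand-in, not the actual regression test: `classify`, `chainMatcher`, and the `"NEAR"` family key are hypothetical, only the `UNKNOWN_BLOCK` rule is modelled, and the `Retryable` flag is left out. It also illustrates the ordering argument from the Fix section: chain-scoped rules are consulted first and the generic name is only a fallback.

```go
package main

import (
	"fmt"
	"strings"
)

// chainMatcher is a hypothetical Tier-2 (chain-scoped) rule.
type chainMatcher struct {
	chainFamily string
	token       string
	errorName   string
}

// Only the one rule whose name appears in the PR text is modelled here.
var tier2Matchers = []chainMatcher{
	{chainFamily: "NEAR", token: "UNKNOWN_BLOCK", errorName: "CHAIN_NEAR_UNKNOWN_BLOCK"},
}

const genericServerError = "NODE_SERVER_ERROR"

// classify checks chain-scoped matchers first and falls back to the
// generic server-error name when none of them matches.
func classify(chainFamily, msg string) string {
	for _, m := range tier2Matchers {
		if m.chainFamily == chainFamily && strings.Contains(msg, m.token) {
			return m.errorName
		}
	}
	return genericServerError
}

func main() {
	cases := []struct{ label, msg string }{
		{"message only", "Server error"},
		{"with cause folded in", `Server error HANDLER_ERROR {"name":"UNKNOWN_BLOCK"}`},
	}
	for _, c := range cases {
		fmt.Printf("%-22s -> %s\n", c.label, classify("NEAR", c.msg))
	}
}
```

The same token on a chain without a matching chain-scoped rule still falls through to the generic name in this sketch.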

Scope (NEAR only — Polygon tracked separately)

The original report also named Polygon. Investigation showed Polygon's failure is an unrelated JSON-RPC id-validation bug (a client sending a non-scalar id), not a classification issue — Polygon error identity lives in .message and already classifies correctly. That is addressed in a separate PR. This change is NEAR-only and verified against a live NEAR testnet node.
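
For context on the Polygon issue: JSON-RPC 2.0 requires a request `id`, when present, to be a string, a number, or null, so an object or array `id` is invalid. The sketch below shows one way such a check could be written. It says nothing about how the separate PR actually addresses the bug, and `requestHasValidID` and `idIsValid` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// idIsValid reports whether a raw JSON value is an acceptable JSON-RPC 2.0
// id: a string, a number, or null. An empty value means the id was omitted.
// The input is assumed to be syntactically valid JSON already.
func idIsValid(raw []byte) bool {
	trimmed := bytes.TrimSpace(raw)
	if len(trimmed) == 0 {
		return true // no id: the request is a notification
	}
	c := trimmed[0]
	// '"' starts a string, 'n' starts null, '-' or a digit starts a number.
	return c == '"' || c == 'n' || c == '-' || (c >= '0' && c <= '9')
}

// requestHasValidID decodes only the id field of a request body and
// validates it; malformed JSON is treated as invalid.
func requestHasValidID(body []byte) bool {
	var req struct {
		ID json.RawMessage `json:"id"`
	}
	if err := json.Unmarshal(body, &req); err != nil {
		return false
	}
	return idIsValid(req.ID)
}

func main() {
	bodies := []string{
		`{"jsonrpc":"2.0","method":"eth_blockNumber","id":1}`,
		`{"jsonrpc":"2.0","method":"eth_blockNumber","id":"abc"}`,
		`{"jsonrpc":"2.0","method":"eth_blockNumber","id":{"n":1}}`,
		`{"jsonrpc":"2.0","method":"eth_blockNumber","id":[1]}`,
	}
	for _, b := range bodies {
		fmt.Printf("valid=%v body=%s\n", requestHasValidID([]byte(b)), b)
	}
}
```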

🤖 Generated with Claude Code

@qodo-code-review


Qodo reviews are paused for this user.


avitenzer previously approved these changes May 26, 2026

ExtractNodeErrorDetails surfaced only .error.message for classification, but
NEAR carries its canonical error name in .error.cause.name (UNKNOWN_BLOCK,
UNKNOWN_CHUNK, INVALID_SHARD_ID, NOT_SYNCED_YET) while .message is just
"Server error". A pruned-block request to a non-archive NEART node therefore
missed the Tier-2 NEAR matcher and fell back to the generic NODE_SERVER_ERROR
rule (and, on builds predating the error registry, to a non-retryable
"unsupported method", which suppressed failover to an archive provider — the
reported QoS/relay failures).

Fold .name, .cause and .data into the classification message. This affects
classification and telemetry only; the node error still passes through to the
user unchanged (transparent hop). Tier-2 (chain-scoped) matchers run before
Tier-1, so the broadened message cannot pull a chain with its own matcher into
a generic rule.

Verified against the verbatim live NEART body for block_id 217272549.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@codecov

codecov Bot commented Jun 2, 2026


Codecov Report

❌ Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| protocol/chainlib/node_error_handler.go | 95.00% | 1 Missing ⚠️ |

| Flag | Coverage Δ |
| --- | --- |
| consensus | 8.96% <ø> (ø) |
| protocol | 35.62% <95.00%> (+0.02%) ⬆️ |


| Files with missing lines | Coverage Δ |
| --- | --- |
| protocol/chainlib/node_error_handler.go | 73.63% <95.00%> (+2.04%) ⬆️ |

@github-actions

github-actions Bot commented Jun 2, 2026


Test Results

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
7 files   ±0   0 ❌ ±0 

Results for commit 8e90ae2. ± Comparison against base commit 99f113c.

2 participants