
feat(smart-router): implement direct RPC mode for RPCSmartRouter#2231

Merged
nimrod-teich merged 1 commit into main from feat/direct-rpc-clean
Mar 11, 2026

Conversation

@NadavLevi

Contributor

Add a standalone smart router that connects directly to RPC nodes without requiring blockchain state tracking, enabling lower-latency relay routing with built-in provider health checking and session management.

Core features:

  • Direct RPC connection handling with HTTP, WebSocket, gRPC, and REST support across EVM and Tendermint chains
  • Session management using composition pattern with DirectRPCConnection interfaces for endpoint tracking and block synchronization
  • WebSocket subscription management with multi-client handling, unique router IDs, Tendermint/EVM subscription support, and upstream connection pooling with backoff
  • gRPC streaming subscription management with dynamic message handling, connection pooling, and reflection-based proxy support
  • Per-endpoint ChainTracker manager for continuous block height polling across multiple direct RPC endpoints
  • Consistency validation with configurable block lag thresholds for pre-request endpoint health checking
  • Cache integration with read/write support and stateful relay bypass
  • Batch request handling using original JSON bytes instead of chainMessage serialization
  • IP forwarding from client requests to upstream nodes
  • Error mapping from node errors to protocol-compatible responses
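The consistency validation listed above can be pictured as a filter that drops endpoints lagging too far behind the best known block height. The following is a minimal sketch under stated assumptions: `endpointStatus` and `filterByBlockLag` are hypothetical names invented for illustration, and the PR's actual implementation may differ.

```go
package main

import "fmt"

// endpointStatus is a hypothetical record of the latest block an endpoint reported.
type endpointStatus struct {
	URL         string
	LatestBlock int64
}

// filterByBlockLag keeps endpoints whose latest block is within maxLag blocks
// of the highest block seen across all endpoints.
func filterByBlockLag(endpoints []endpointStatus, maxLag int64) []endpointStatus {
	var highest int64
	for _, e := range endpoints {
		if e.LatestBlock > highest {
			highest = e.LatestBlock
		}
	}
	healthy := make([]endpointStatus, 0, len(endpoints))
	for _, e := range endpoints {
		if highest-e.LatestBlock <= maxLag {
			healthy = append(healthy, e)
		}
	}
	return healthy
}

func main() {
	endpoints := []endpointStatus{
		{URL: "https://node-a.example", LatestBlock: 1000},
		{URL: "https://node-b.example", LatestBlock: 998},
		{URL: "https://node-c.example", LatestBlock: 950},
	}
	// With a threshold of 5 blocks, node-c (50 blocks behind) is filtered out.
	for _, e := range filterByBlockLag(endpoints, 5) {
		fmt.Println(e.URL, e.LatestBlock)
	}
}
```

Measuring lag against the highest observed block keeps the check self-contained; a real deployment might instead compare against a block requested by the client.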

Operational improvements:

  • Spec verification: static provider validation before chain tracker setup with multi-URL group validation (HTTP + WebSocket pairs)
  • NoOp WebSocket subscription manager for non-subscription endpoints
  • Fix for a panic triggered when a nodeError leads to availabilityDegrader
  • REST URL path joining fix: append absolute paths instead of replacing base path (preserves gateway prefixes like /gateway/lava/rest/KEY)
  • Quorum logging improvements and typo fixes
  • Refactored configuration keys (direct-rpc/backup-direct-rpc) with backward compatibility for static-providers/backup-providers
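The REST path joining fix in the list above is easiest to see with a small example. This is a sketch only: `joinRESTPath` is a hypothetical helper, not the function changed in the PR, and the host name is a placeholder. It appends the request path to the base path so that a gateway prefix survives.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// joinRESTPath appends requestPath to the path already present in baseURL
// rather than replacing it, so a gateway prefix is preserved.
func joinRESTPath(baseURL, requestPath string) (string, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return "", err
	}
	// Normalize the slashes at the seam so exactly one separates the two parts.
	u.Path = strings.TrimRight(u.Path, "/") + "/" + strings.TrimLeft(requestPath, "/")
	return u.String(), nil
}

func main() {
	full, err := joinRESTPath(
		"https://host.example/gateway/lava/rest/KEY",
		"/cosmos/base/tendermint/v1beta1/blocks/latest",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(full)
}
```

By contrast, resolving a reference that begins with `/` under RFC 3986 rules (for example with `URL.ResolveReference`) replaces the base path entirely, which is the kind of behavior that would drop a prefix such as /gateway/lava/rest/KEY. The sketch ignores query strings in the request path.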

Testing and tooling:

  • Comprehensive unit tests for gRPC subscription manager, WebSocket config, error mapper, batch requests, and session management
  • REST and direct RPC integration tests
  • Mock RPC and REST servers for local development
  • Smart router initialization scripts for Ethereum, Lava, gRPC, and Tendermint RPC endpoints
  • Example configuration files for ETH and Lava smart router setups

Description

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • read the contribution guide
  • included the correct type prefix in the PR title, you can find examples of the prefixes below:
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the main branch
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

@qodo-code-review


Review Summary by Qodo

feat(smart-router): implement direct RPC mode for RPCSmartRouter with chain tracking and subscription management

✨ Enhancement 🧪 Tests 🐞 Bug fix


Walkthroughs

Description
• Implement direct RPC mode for RPCSmartRouter, enabling standalone routing to RPC nodes without
  blockchain state tracking for lower-latency relay operations
• Add direct RPC connection handling with support for HTTP, WebSocket, gRPC, and REST protocols
  across EVM and Tendermint chains
• Implement session management using composition pattern with DirectRPCConnection interfaces for
  endpoint tracking and block synchronization
• Add WebSocket subscription management with multi-client deduplication, unique router IDs, and
  upstream connection pooling with backoff
• Implement gRPC streaming subscription management with dynamic message handling, connection
  pooling, and reflection-based proxy support
• Add per-endpoint ChainTracker manager for continuous block height polling across multiple direct
  RPC endpoints
• Implement consistency validation with configurable block lag thresholds for pre-request endpoint
  health checking
• Add cache integration with read/write support and stateful relay bypass
• Implement batch request handling using original JSON bytes instead of chainMessage serialization
• Add IP forwarding from client requests to upstream nodes with error mapping from node errors to
  protocol-compatible responses
• Refactor configuration keys from static-providers/backup-providers to
  direct-rpc/backup-direct-rpc with backward compatibility
• Add static provider validation before chain tracker setup with multi-URL group validation
• Fix REST URL path joining to append absolute paths instead of replacing base path (preserves
  gateway prefixes)
• Fix a panic triggered when a nodeError leads to availabilityDegrader
• Reduce debug logging noise from chain tracker polls by filtering internal block polling operations
• Add comprehensive unit tests for gRPC subscription manager, WebSocket config, error mapper, batch
  requests, and session management
• Add REST and direct RPC integration tests with mock RPC and REST servers for local development
• Add smart router initialization scripts for Ethereum, Lava, gRPC, and Tendermint RPC endpoints
Diagram
flowchart LR
  Client["Client Request"]
  Router["RPCSmartRouter<br/>Direct RPC Mode"]
  ChainTracker["EndpointChainTrackerManager<br/>Block Height Polling"]
  Consistency["Consistency Validation<br/>Pre-Request Check"]
  WSMgr["DirectWSSubscriptionManager<br/>Multi-Client Dedup"]
  GRPCMgr["DirectGRPCSubscriptionManager<br/>Stream Pooling"]
  Endpoints["Direct RPC Endpoints<br/>HTTP/WS/gRPC/REST"]
  Cache["Cache Layer<br/>Read/Write"]
  
  Client --> Router
  Router --> ChainTracker
  Router --> Consistency
  Router --> WSMgr
  Router --> GRPCMgr
  Router --> Cache
  Consistency --> Endpoints
  WSMgr --> Endpoints
  GRPCMgr --> Endpoints
  Cache --> Endpoints


File Changes

1. protocol/rpcsmartrouter/rpcsmartrouter_server.go ✨ Enhancement +1018/-887

Refactor smart router from provider-relay to direct RPC mode

• Refactored from provider-relay mode to direct RPC mode: removed Lava protocol
 signing/verification, replaced with direct endpoint connections
• Removed subscription context management (cleanup goroutines, CancelableContextHolder) - delegated
 to subscription managers
• Added per-endpoint ChainTracker manager for continuous block polling and consistency
 pre-validation
• Implemented sendRelayToDirectEndpoints() for parallel direct RPC relay with consistency
 filtering and health tracking
• Added cache write support (tryCacheWrite()) for successful direct RPC responses with proper
 finalization logic
• Replaced relayInner() with relayInnerDirect() for direct RPC protocol handling
 (HTTP/WebSocket/gRPC)
• Added gRPC reflection support via GetGRPCReflectionConnection() for tools like grpcurl

protocol/rpcsmartrouter/rpcsmartrouter_server.go


2. protocol/lavasession/consumer_session_manager.go ✨ Enhancement +131/-6

Add direct RPC endpoint discovery and health checking

• Added EndpointWithDirectConnection struct to hold endpoint and direct RPC connection pairs
• Added GetAllDirectRPCEndpoints() method to retrieve all direct RPC endpoints for ChainTracker
 initialization
• Added probeDirectRPCEndpoints() method for health checking direct RPC endpoints without gRPC
 clients
• Updated GetSessions() to pass NetworkAddress parameter for endpoint tracking
• Fixed OnSessionFailure() to only block EndpointConnection for provider-relay sessions (direct
 RPC sessions don't have it)
• Removed emoji prefixes from log messages (✅, 🔍, 🚨, 🔄) for cleaner output

protocol/lavasession/consumer_session_manager.go


3. protocol/metrics/consumer_metrics_manager_test.go Formatting +5/-5

Standardize test output formatting

• Replaced emoji prefixes (✓) with [PASS] text prefix in test output messages for better
 compatibility

protocol/metrics/consumer_metrics_manager_test.go


4. protocol/lavasession/errors.go ✨ Enhancement +1/-0

Add consistency pre-validation error code

• Added new error code ConsistencyPreValidationError (699) for endpoints that fail pre-request
 consistency validation

protocol/lavasession/errors.go


5. protocol/rpcsmartrouter/direct_ws_subscription_manager_test.go 🧪 Tests +1583/-0

WebSocket subscription manager multi-client deduplication tests

• Comprehensive test suite for WebSocket subscription management with 1500+ lines covering
 multi-client deduplication, subscription ID mapping, and protocol-specific handling
• Tests for unique router ID generation per client, preventing one client's unsubscribe from
 affecting others sharing the same subscription
• Tendermint and EVM protocol-specific tests validating subscription response formats and ID
 rewriting behavior
• Integration tests for multi-client join, independent unsubscribe, message routing, and
 reconnection scenarios

protocol/rpcsmartrouter/direct_ws_subscription_manager_test.go


6. protocol/rpcsmartrouter/rpcsmartrouter_server_test.go 🧪 Tests +433/-554

Refactor subscription tests and add consistency validation tests

• Removed 460+ lines of subscription-related tests (getFirstSubscriptionReply,
 mockRelaySubscribeClient) that are now covered by dedicated WebSocket subscription manager tests
• Added EndpointChainTrackerManager lifecycle tests validating per-tracker context cancellation
 and cleanup on removal
• Added filterEndpointsByConsistency tests for consistency pre-validation with failed session
 tracking and retry logic
• Enhanced MockProtocolMessage with configurable requestedBlock and userData fields for more
 flexible testing

protocol/rpcsmartrouter/rpcsmartrouter_server_test.go


7. protocol/rpcsmartrouter/endpoint_chain_tracker_test.go 🧪 Tests +342/-0

Endpoint chain tracker manager lifecycle and concurrency tests

• New test file with 340+ lines covering EndpointChainTrackerManager lifecycle and thread-safety
• Tests for tracker creation, removal, context cancellation, and concurrent operations
• Validates that RemoveTracker properly invokes cancel functions and Stop() cancels all tracker
 contexts
• Mock implementation of DirectRPCConnection for testing tracker behavior

protocol/rpcsmartrouter/endpoint_chain_tracker_test.go
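The lifecycle these tests describe (a cancel function per tracker, invoked on removal and on shutdown) might look roughly like the sketch below. The type `trackerManager` and its methods are hypothetical stand-ins written for illustration; they are not the PR's `EndpointChainTrackerManager`.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// trackerManager is a hypothetical manager that owns one cancelable context
// per endpoint tracker.
type trackerManager struct {
	mu      sync.Mutex
	cancels map[string]context.CancelFunc
}

func newTrackerManager() *trackerManager {
	return &trackerManager{cancels: make(map[string]context.CancelFunc)}
}

// AddTracker creates a context for the endpoint's polling loop and stores
// its cancel function. An existing tracker for the same endpoint is canceled.
func (m *trackerManager) AddTracker(endpoint string) context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	m.mu.Lock()
	defer m.mu.Unlock()
	if old, ok := m.cancels[endpoint]; ok {
		old()
	}
	m.cancels[endpoint] = cancel
	return ctx
}

// RemoveTracker cancels and forgets a single tracker; it reports whether
// the endpoint was known.
func (m *trackerManager) RemoveTracker(endpoint string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	cancel, ok := m.cancels[endpoint]
	if !ok {
		return false
	}
	cancel()
	delete(m.cancels, endpoint)
	return true
}

// Stop cancels every remaining tracker context.
func (m *trackerManager) Stop() {
	m.mu.Lock()
	defer m.mu.Unlock()
	for endpoint, cancel := range m.cancels {
		cancel()
		delete(m.cancels, endpoint)
	}
}

func main() {
	m := newTrackerManager()
	ctxA := m.AddTracker("endpoint-a")
	ctxB := m.AddTracker("endpoint-b")
	m.RemoveTracker("endpoint-a")
	fmt.Println("a canceled:", ctxA.Err() != nil, "b canceled:", ctxB.Err() != nil)
	m.Stop()
	fmt.Println("b canceled after Stop:", ctxB.Err() != nil)
}
```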


8. protocol/rpcsmartrouter/direct_ws_subscription_manager.go ✨ Enhancement +1620/-0

Direct WebSocket subscription manager with deduplication and reconnection

• Implements DirectWSSubscriptionManager for managing WebSocket subscriptions directly to upstream
 RPC endpoints without provider routing
• Supports multi-client subscription deduplication with unique router IDs per client and shared
 upstream subscriptions
• Includes session management with sticky sessions for client affinity, rate limiting per client,
 and global subscription limits
• Handles upstream reconnection with backoff, subscription restoration, and per-endpoint
 ChainTracker integration

protocol/rpcsmartrouter/direct_ws_subscription_manager.go
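Multi-client deduplication with unique router IDs can be sketched as a registry that shares one upstream subscription per parameter set while handing each client its own ID. Everything below is an assumption made for illustration: the names `dedupRegistry`, `Subscribe`, and `Unsubscribe`, and the `router-N` ID format, are hypothetical and do not come from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// dedupRegistry is a hypothetical index of shared upstream subscriptions.
type dedupRegistry struct {
	mu      sync.Mutex
	nextID  uint64
	clients map[string]map[string]struct{} // params key -> set of router IDs
	keyOf   map[string]string              // router ID -> params key
}

func newDedupRegistry() *dedupRegistry {
	return &dedupRegistry{
		clients: make(map[string]map[string]struct{}),
		keyOf:   make(map[string]string),
	}
}

// Subscribe registers a client under paramsKey. It returns a router ID unique
// to this client and reports whether a new upstream subscription is needed.
func (r *dedupRegistry) Subscribe(paramsKey string) (routerID string, openUpstream bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nextID++
	routerID = fmt.Sprintf("router-%d", r.nextID)
	set, ok := r.clients[paramsKey]
	if !ok {
		set = make(map[string]struct{})
		r.clients[paramsKey] = set
		openUpstream = true
	}
	set[routerID] = struct{}{}
	r.keyOf[routerID] = paramsKey
	return routerID, openUpstream
}

// Unsubscribe removes one client. It reports true only when that client was
// the last one, meaning the shared upstream subscription can be closed.
func (r *dedupRegistry) Unsubscribe(routerID string) (closeUpstream bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	key, ok := r.keyOf[routerID]
	if !ok {
		return false
	}
	delete(r.keyOf, routerID)
	set := r.clients[key]
	delete(set, routerID)
	if len(set) == 0 {
		delete(r.clients, key)
		return true
	}
	return false
}

func main() {
	r := newDedupRegistry()
	id1, open1 := r.Subscribe("newHeads")
	id2, open2 := r.Subscribe("newHeads")
	fmt.Println(id1, open1, id2, open2)
	fmt.Println(r.Unsubscribe(id1), r.Unsubscribe(id2))
}
```

Because each client holds a distinct router ID, one client's unsubscribe only removes its own entry; the upstream subscription stays open until the last client leaves.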


9. protocol/rpcsmartrouter/rpcsmartrouter.go ✨ Enhancement +408/-87

Smart router refactored for direct RPC mode with chain tracking

• Refactors smart router to use direct RPC connections instead of provider-based routing
• Adds DirectWSSubscriptionManager and DirectGRPCSubscriptionManager for direct endpoint
 subscriptions
• Implements static provider validation before chain tracker setup with multi-URL group validation
• Adds per-endpoint ChainTracker manager for continuous block height polling and epoch-based cleanup
 of stale trackers
• Updates configuration keys from static-providers/backup-providers to
 direct-rpc/backup-direct-rpc with backward compatibility

protocol/rpcsmartrouter/rpcsmartrouter.go
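Backward compatibility for the renamed configuration keys could be handled with a lookup that prefers the new key and falls back to the legacy one. This is a sketch under assumptions: `resolveEndpoints` and the map-based config are hypothetical, and the precedence shown (new key wins) is a guess rather than something the PR states.

```go
package main

import "fmt"

// resolveEndpoints returns the values stored under key, falling back to
// legacyKey when key is absent or empty. The second result reports whether
// the legacy key was used, so the caller can log a deprecation notice.
func resolveEndpoints(cfg map[string][]string, key, legacyKey string) ([]string, bool) {
	if v, ok := cfg[key]; ok && len(v) > 0 {
		return v, false
	}
	if v, ok := cfg[legacyKey]; ok && len(v) > 0 {
		return v, true
	}
	return nil, false
}

func main() {
	// An older config file that still uses the legacy key name.
	cfg := map[string][]string{
		"static-providers": {"https://node-a.example"},
	}
	endpoints, legacy := resolveEndpoints(cfg, "direct-rpc", "static-providers")
	fmt.Println(endpoints, "legacy key used:", legacy)
}
```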


10. protocol/rpcsmartrouter/upstream_grpc_pool.go ✨ Enhancement +533/-0

gRPC streaming connection pool with auto-scaling and reflection

• Implements UpstreamGRPCStreamConnection and UpstreamGRPCPool for managing gRPC streaming
 connections
• Supports dynamic connection pooling with auto-scaling between min/max connections based on stream
 load
• Includes method descriptor caching via gRPC reflection and reconnection with exponential backoff
• Provides stream count tracking and health monitoring for gRPC endpoints

protocol/rpcsmartrouter/upstream_grpc_pool.go
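Reconnection with exponential backoff generally doubles the wait after each failed attempt up to a cap. The sketch below assumes hypothetical names (`backoffDelay`, `defaultBase`, `defaultMax`) and illustrative values; the PR's actual delays, cap, and any jitter are not shown on this page.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative defaults only; not taken from the PR.
const (
	defaultBase = 100 * time.Millisecond
	defaultMax  = 5 * time.Second
)

// backoffDelay returns base doubled once per attempt, capped at max.
// Attempt 0 yields base itself.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	delay := base
	for i := 0; i < attempt; i++ {
		delay *= 2
		if delay >= max {
			return max
		}
	}
	if delay > max {
		return max
	}
	return delay
}

func main() {
	for attempt := 0; attempt <= 7; attempt++ {
		fmt.Printf("attempt %d: wait %v\n", attempt, backoffDelay(attempt, defaultBase, defaultMax))
	}
}
```

Checking the cap inside the loop keeps the delay from growing past the limit for large attempt counts. Production code often adds random jitter so that many clients do not reconnect at the same instant; that is omitted here for brevity.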


11. protocol/chainlib/base_chain_parser.go 🐞 Bug fix +14/-7

Reduce debug logging noise from chain tracker polls

• Adds conditional debug logging in ExtensionParsing to reduce noise from internal chain tracker
 polls
• Only emits archive debug traces for user relays (LatestBlock > 0), filtering out chain tracker's
 internal block polling

protocol/chainlib/base_chain_parser.go


12. AGENTS.md Additional files +24/-0

...

AGENTS.md


13. BRANCH_CHANGES_SLIDE.md Additional files +163/-0

...

BRANCH_CHANGES_SLIDE.md


14. PROVIDER_SELECTION_HEADER.md Additional files +142/-0

...

PROVIDER_SELECTION_HEADER.md


15. RPCSMARTROUTER_TESTING_CHECKLIST.md Additional files +98/-0

...

RPCSMARTROUTER_TESTING_CHECKLIST.md


16. WEIGHTED_SELECTOR_TESTING_PLAN.md Additional files +761/-0

...

WEIGHTED_SELECTOR_TESTING_PLAN.md


17. WEIGHTED_SELECTOR_TEST_RESULTS.md Additional files +330/-0

...

WEIGHTED_SELECTOR_TEST_RESULTS.md


18. config/consumer_examples/lava_consumer_static_peers.yml Additional files +1/-1

...

config/consumer_examples/lava_consumer_static_peers.yml


19. config/consumer_examples/lava_consumer_static_with_backup.yml Additional files +2/-2

...

config/consumer_examples/lava_consumer_static_with_backup.yml


20. config/consumer_examples/lava_consumer_static_with_backup_base.yml Additional files +53/-0

...

config/consumer_examples/lava_consumer_static_with_backup_base.yml


21. config/consumer_examples/lava_consumer_static_with_backup_bch.yml Additional files +49/-0

...

config/consumer_examples/lava_consumer_static_with_backup_bch.yml


22. config/consumer_examples/lava_consumer_static_with_backup_eth.yml Additional files +2/-2

...

config/consumer_examples/lava_consumer_static_with_backup_eth.yml


23. config/provider1_avaxc.yml Additional files +8/-0

...

config/provider1_avaxc.yml


24. metrics.json Additional files +13425/-0

...

metrics.json


25. protocol/chainlib/chainlib.go Additional files +15/-3

...

protocol/chainlib/chainlib.go


26. protocol/chainlib/chainproxy/rpcInterfaceMessages/restMessage.go Additional files +46/-3

...

protocol/chainlib/chainproxy/rpcInterfaceMessages/restMessage.go


27. protocol/chainlib/chainproxy/rpcInterfaceMessages/restMessage_test.go Additional files +90/-0

...

protocol/chainlib/chainproxy/rpcInterfaceMessages/restMessage_test.go


28. protocol/chainlib/chainproxy/rpcclient/json.go Additional files +6/-1

...

protocol/chainlib/chainproxy/rpcclient/json.go


29. protocol/chainlib/chainproxy/rpcclient/subscription_test.go Additional files +4/-4

...

protocol/chainlib/chainproxy/rpcclient/subscription_test.go


30. protocol/chainlib/consumer_websocket_manager.go Additional files +51/-36

...

protocol/chainlib/consumer_websocket_manager.go


31. protocol/chainlib/consumer_ws_subscription_manager.go Additional files +4/-3

...

protocol/chainlib/consumer_ws_subscription_manager.go


32. protocol/chainlib/consumer_ws_subscription_manager_test.go Additional files +2/-2

...

protocol/chainlib/consumer_ws_subscription_manager_test.go


33. protocol/chainlib/grpc.go Additional files +10/-1

...

protocol/chainlib/grpc.go


34. protocol/chainlib/grpcproxy/dyncodec/file_registry_test.go Additional files +223/-0

...

protocol/chainlib/grpcproxy/dyncodec/file_registry_test.go


35. protocol/chainlib/grpcproxy/dyncodec/hybrid_registry.go Additional files +304/-0

...

protocol/chainlib/grpcproxy/dyncodec/hybrid_registry.go


36. protocol/chainlib/grpcproxy/dyncodec/remote_file.go Additional files +268/-0

...

protocol/chainlib/grpcproxy/dyncodec/remote_file.go


37. protocol/chainlib/grpcproxy/grpc_reflection_proxy.go Additional files +124/-0

...

protocol/chainlib/grpcproxy/grpc_reflection_proxy.go


38. protocol/chainlib/grpcproxy/grpcproxy.go Additional files +40/-0

...

protocol/chainlib/grpcproxy/grpcproxy.go


39. protocol/chainlib/jsonRPC.go Additional files +30/-25

...

protocol/chainlib/jsonRPC.go


40. protocol/chainlib/referer_stub.go Additional files +8/-0

...

protocol/chainlib/referer_stub.go


41. protocol/chainlib/tendermintRPC.go Additional files +31/-26

...

protocol/chainlib/tendermintRPC.go


42. protocol/chainlib/ws_subscription_manager.go Additional files +60/-0

...

protocol/chainlib/ws_subscription_manager.go


43. protocol/common/conf.go Additional files +4/-2

...

protocol/common/conf.go


44. protocol/common/endpoints.go Additional files +101/-0

...

protocol/common/endpoints.go


45. protocol/lavasession/consumer_types.go Additional files +178/-11

...

protocol/lavasession/consumer_types.go


46. protocol/lavasession/direct_rpc_connection.go Additional files +866/-0

...

protocol/lavasession/direct_rpc_connection.go


47. protocol/lavasession/direct_rpc_connection_test.go Additional files +326/-0

...

protocol/lavasession/direct_rpc_connection_test.go


48. protocol/lavasession/direct_rpc_session_selection_test.go Additional files +184/-0

...

protocol/lavasession/direct_rpc_session_selection_test.go


49. protocol/lavasession/session_connection_test.go Additional files +445/-0

...

protocol/lavasession/session_connection_test.go


50. protocol/lavasession/single_consumer_session.go Additional files +59/-2

...

protocol/lavasession/single_consumer_session.go


51. protocol/relaycore/consistency_config.go Additional files +92/-0

...

protocol/relaycore/consistency_config.go


52. protocol/relaycore/consistency_validation.go Additional files +120/-0

...

protocol/relaycore/consistency_validation.go


53. protocol/relaycore/consistency_validation_test.go Additional files +292/-0

...

protocol/relaycore/consistency_validation_test.go


54. protocol/relaycore/relay_processor.go Additional files +122/-36

...

protocol/relaycore/relay_processor.go


55. protocol/relaycore/relay_processor_memory_test.go Additional files +4/-4

...

protocol/relaycore/relay_processor_memory_test.go


56. protocol/relaycore/relay_processor_test.go Additional files +1/-0

...

protocol/relaycore/relay_processor_test.go


57. protocol/rpcconsumer/rpcconsumer_server.go Additional files +1/-1

...

protocol/rpcconsumer/rpcconsumer_server.go


58. protocol/rpcprovider/rpcprovider.go Additional files +61/-245

...

protocol/rpcprovider/rpcprovider.go


59. protocol/rpcprovider/rpcprovider_server.go Additional files +15/-40

...

protocol/rpcprovider/rpcprovider_server.go


60. protocol/rpcprovider/rpcprovider_server_test.go Additional files +7/-124

...

protocol/rpcprovider/rpcprovider_server_test.go


61. protocol/rpcprovider/static_provider_parsing_test.go Additional files +0/-345

...

protocol/rpcprovider/static_provider_parsing_test.go


62. protocol/rpcprovider/static_provider_validation_test.go Additional files +0/-189

...

protocol/rpcprovider/static_provider_validation_test.go


63. protocol/rpcprovider/testing.go Additional files +6/-6

...

protocol/rpcprovider/testing.go


64. protocol/rpcsmartrouter/direct_grpc_subscription_manager.go Additional files +956/-0

...

protocol/rpcsmartrouter/direct_grpc_subscription_manager.go


65. protocol/rpcsmartrouter/direct_grpc_subscription_manager_test.go Additional files +405/-0

...

protocol/rpcsmartrouter/direct_grpc_subscription_manager_test.go


66. protocol/rpcsmartrouter/direct_rpc_integration_test.go Additional files +516/-0

...

protocol/rpcsmartrouter/direct_rpc_integration_test.go


67. protocol/rpcsmartrouter/direct_rpc_relay.go Additional files +679/-0

...

protocol/rpcsmartrouter/direct_rpc_relay.go


68. protocol/rpcsmartrouter/endpoint_chain_fetcher.go Additional files +275/-0

...

protocol/rpcsmartrouter/endpoint_chain_fetcher.go


69. protocol/rpcsmartrouter/endpoint_chain_tracker_manager.go Additional files +351/-0

...

protocol/rpcsmartrouter/endpoint_chain_tracker_manager.go


70. protocol/rpcsmartrouter/error_mapper.go Additional files +85/-0

...

protocol/rpcsmartrouter/error_mapper.go


71. protocol/rpcsmartrouter/error_mapper_test.go Additional files +199/-0

...

protocol/rpcsmartrouter/error_mapper_test.go


72. protocol/rpcsmartrouter/grpc_streaming_config.go Additional files +144/-0

...

protocol/rpcsmartrouter/grpc_streaming_config.go


73. protocol/rpcsmartrouter/grpc_streaming_config_test.go Additional files +220/-0

...

protocol/rpcsmartrouter/grpc_streaming_config_test.go


74. protocol/rpcsmartrouter/minimal_state_tracker_mock.go Additional files +0/-49

...

protocol/rpcsmartrouter/minimal_state_tracker_mock.go


75. protocol/rpcsmartrouter/noop_ws_subscription_manager.go Additional files +82/-0

...

protocol/rpcsmartrouter/noop_ws_subscription_manager.go


76. protocol/rpcsmartrouter/noop_ws_subscription_manager_test.go Additional files +132/-0

...

protocol/rpcsmartrouter/noop_ws_subscription_manager_test.go


77. protocol/rpcsmartrouter/rest_integration_test.go Additional files +355/-0

...

protocol/rpcsmartrouter/rest_integration_test.go


78. protocol/rpcsmartrouter/rpcsmartrouter_compression_test.go Additional files +0/-200

...

protocol/rpcsmartrouter/rpcsmartrouter_compression_test.go


79. protocol/rpcsmartrouter/smartrouter_relay_state_machine.go Additional files +11/-23

...

protocol/rpcsmartrouter/smartrouter_relay_state_machine.go


80. protocol/rpcsmartrouter/smartrouter_relay_state_machine_test.go Additional files +10/-10

...

protocol/rpcsmartrouter/smartrouter_relay_state_machine_test.go


81. protocol/rpcsmartrouter/subscription_id_mapper.go Additional files +189/-0

...

protocol/rpcsmartrouter/subscription_id_mapper.go


82. protocol/rpcsmartrouter/subscription_id_mapper_test.go Additional files +270/-0

...

protocol/rpcsmartrouter/subscription_id_mapper_test.go


83. protocol/rpcsmartrouter/upstream_grpc_pool_test.go Additional files +408/-0

...

protocol/rpcsmartrouter/upstream_grpc_pool_test.go


84. protocol/rpcsmartrouter/upstream_ws_pool.go Additional files +543/-0

...

protocol/rpcsmartrouter/upstream_ws_pool.go


85. protocol/rpcsmartrouter/websocket_backoff.go Additional files +126/-0

...

protocol/rpcsmartrouter/websocket_backoff.go


86. protocol/rpcsmartrouter/websocket_backoff_test.go Additional files +228/-0

...

protocol/rpcsmartrouter/websocket_backoff_test.go


87. protocol/rpcsmartrouter/websocket_config.go Additional files +148/-0

...

protocol/rpcsmartrouter/websocket_config.go


88. protocol/rpcsmartrouter/websocket_config_test.go Additional files +204/-0

...

protocol/rpcsmartrouter/websocket_config_test.go


89. scripts/EXAMPLE_USAGE.md Additional files +166/-0

...

scripts/EXAMPLE_USAGE.md


90. scripts/README_extract_qos.md Additional files +142/-0

...

scripts/README_extract_qos.md


91. scripts/README_parse_consumer_log.md Additional files +174/-0

...

scripts/README_parse_consumer_log.md


92. scripts/analyze_qos_output.py Additional files +211/-0

...

scripts/analyze_qos_output.py


93. scripts/extract_qos_from_consumer_log.py Additional files +282/-0

...

scripts/extract_qos_from_consumer_log.py


94. scripts/mock_rest_server/README.md Additional files +132/-0

...

scripts/mock_rest_server/README.md


95. scripts/mock_rest_server/main.go Additional files +142/-0

...

scripts/mock_rest_server/main.go


96. scripts/mock_rpc_server/README.md Additional files +130/-0

...

scripts/mock_rpc_server/README.md


97. scripts/mock_rpc_server/init_mock_server.go Additional files +186/-0

...

scripts/mock_rpc_server/init_mock_server.go


98. scripts/monitoring/README.md Additional files +149/-0

...

scripts/monitoring/README.md


99. scripts/parse_consumer_log.py Additional files +414/-0

...

scripts/parse_consumer_log.py


100. scripts/pre_setups/PPROF_GUIDE.md Additional files +204/-0

...

scripts/pre_setups/PPROF_GUIDE.md


101. scripts/pre_setups/Untitled Additional files +1/-0

...

scripts/pre_setups/Untitled


102. scripts/pre_setups/init_avax_c_only_with_node.sh Additional files +85/-0

...

scripts/pre_setups/init_avax_c_only_with_node.sh


103. scripts/pre_setups/init_bch_only_with_node.sh Additional files +59/-0

...

scripts/pre_setups/init_bch_only_with_node.sh


104. scripts/pre_setups/init_lava_only_with_node_three_providers_slow_test.sh Additional files +132/-0

...

scripts/pre_setups/init_lava_only_with_node_three_providers_slow_test.sh


105. scripts/pre_setups/init_lava_only_with_node_three_providers_with_cache.sh Additional files +102/-0

...

scripts/pre_setups/init_lava_only_with_node_three_providers_with_cache.sh


106. scripts/pre_setups/init_lava_smartrouter_eth.sh Additional files +407/-0

...

scripts/pre_setups/init_lava_smartrouter_eth.sh


107. scripts/pre_setups/init_lava_smartrouter_lava.sh Additional files +307/-0

...

scripts/pre_setups/init_lava_smartrouter_lava.sh


108. scripts/pre_setups/init_lava_static_and_backup_provider.sh Additional files +2/-2

...

scripts/pre_setups/init_lava_static_and_backup_provider.sh


109. scripts/pre_setups/init_lava_static_and_backup_provider_bch.sh Additional files +232/-0

...

scripts/pre_setups/init_lava_static_and_backup_provider_bch.sh


110. scripts/pre_setups/init_lava_static_and_backup_provider_eth.sh Additional files +0/-3

...

scripts/pre_setups/init_lava_static_and_backup_provider_eth.sh


111. scripts/pre_setups/init_lava_static_provider_with_kafka.sh Additional files +2/-2

...

scripts/pre_setups/init_lava_static_provider_with_kafka.sh


112. Additional files not shown Additional files +0/-0

...

Additional files not shown



@qodo-code-review

qodo-code-review Bot commented Feb 23, 2026


Code Review by Qodo

🐞 Bugs (3) 📘 Rule violations (3) 📎 Requirement gaps (0)



Action required

1. ETH_RPC_URL_2 hardcoded API key 📘 Rule violation ⛨ Security
Description
The init script commits a real-looking API key inside a default RPC URL, which risks credential
leakage and unauthorized use. Secrets must not be stored in the repo; they should be supplied via
environment variables or placeholders.
Code

scripts/pre_setups/init_lava_smartrouter_eth.sh[84]

+export ETH_RPC_URL_2="${ETH_RPC_URL_2:-https://json-rpc.8zfcse2amst1lajmh299uq4jn.blockchainnodeengine.com/?key=AIzaSyDyUtm6b-e-xKDQgVWzlroHdVTytiXEDik}"
Evidence
Compliance requires that secrets are not committed and instead provided via environment variables;
the script hardcodes an API key directly in the repository-tracked file.

AGENTS.md
scripts/pre_setups/init_lava_smartrouter_eth.sh[83-90]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A real-looking API key (and gateway token-like values) are committed as default RPC endpoint values in a tracked script.

## Issue Context
Compliance requires that secrets are not committed and are provided via environment variables or templates.

## Fix Focus Areas
- scripts/pre_setups/init_lava_smartrouter_eth.sh[82-96]



2. err.Error() logged unredacted 📘 Rule violation ⛨ Security
Description
The direct RPC relay logs the raw error string, which may include full upstream URLs and embedded
API keys/tokens. This can leak secrets into logs and violates secure logging requirements.
Code

protocol/rpcsmartrouter/direct_rpc_relay.go[R299-305]

+		return nil, MapDirectRPCError(err, d.directConnection.GetProtocol())
+	}
+
+	statusCode := response.StatusCode
+	responseData := response.Body
+
+	// Handle HTTP error status codes
Evidence
Secure logging requires that no sensitive data (including API keys/tokens) appears in logs; logging
err.Error() can include request URLs, and this PR introduces configs/scripts with embedded
key-like URL segments.

Rule 5: Generic: Secure Logging Practices
protocol/rpcsmartrouter/direct_rpc_relay.go[299-305]
scripts/pre_setups/init_lava_smartrouter_eth.sh[83-90]
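One way to address this kind of finding is to redact the URL before an error string reaches the log. The following is a sketch, not the fix applied in the PR: `redactURL`, `safeErrorString`, and `sampleError` are hypothetical helpers. It relies on the standard library's `*url.Error`, which is the error type Go's HTTP client returns for failed requests.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// redactURL keeps only the scheme and host of a URL, dropping the path,
// query string, and user info, any of which may carry tokens.
func redactURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		return "<redacted-url>"
	}
	return u.Scheme + "://" + u.Host + "/<redacted>"
}

// safeErrorString formats err for logging. When the chain contains a
// *url.Error, the embedded URL is replaced with its redacted form.
func safeErrorString(err error) string {
	var urlErr *url.Error
	if errors.As(err, &urlErr) {
		return fmt.Sprintf("%s %s: %v", urlErr.Op, redactURL(urlErr.URL), urlErr.Err)
	}
	return err.Error()
}

// sampleError builds an error shaped like one from a failed HTTP request.
func sampleError() error {
	return &url.Error{
		Op:  "Post",
		URL: "https://rpc.example/v1/SECRET?key=abc",
		Err: errors.New("context deadline exceeded"),
	}
}

func main() {
	fmt.Println(safeErrorString(sampleError()))
}
```

The sketch keeps the host name for debuggability. If a provider encodes credentials in the host itself, the host would need masking as well, and errors that are not `*url.Error` pass through unchanged.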

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Raw `err.Error()` is logged and may contain full upstream URLs and embedded secrets.

## Issue Context
Even debug logs must not contain secrets; direct RPC endpoints commonly embed tokens in URL paths or query params.

## Fix Focus Areas
- protocol/rpcsmartrouter/direct_rpc_relay.go[292-306]



3. MapDirectRPCError exposes internal errors 📘 Rule violation ⛨ Security
Description
The direct-RPC error mapping wraps and returns the underlying error (%w), which can propagate
internal network/endpoint details to clients. User-facing errors should be generic, with detailed
causes kept only in internal logs.
Code

protocol/rpcsmartrouter/error_mapper.go[R20-32]

+	if isConnectionRefused(err) {
+		return fmt.Errorf("RPC endpoint unavailable (connection refused): %w", err)
+	}
+
+	if isTimeout(err) {
+		return fmt.Errorf("RPC request timeout: %w", err)
+	}
+
+	// Protocol-specific error handling
+	switch protocol {
+	case lavasession.DirectRPCProtocolHTTP, lavasession.DirectRPCProtocolHTTPS:
+		return mapHTTPError(err)
+	case lavasession.DirectRPCProtocolGRPC:
Evidence
Secure error handling requires not exposing internal implementation details to end users; wrapping
the underlying error directly in returned messages risks leaking endpoint addresses and other
internals.

Rule 4: Generic: Secure Error Handling
protocol/rpcsmartrouter/error_mapper.go[19-32]
protocol/rpcsmartrouter/direct_rpc_relay.go[299-305]
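A common alternative to wrapping with `%w` is to classify the underlying error and return a fixed, generic sentinel to the client, leaving the detailed cause for internal logs. The sketch below illustrates that pattern; the sentinel names and the helpers `mapClientError`, `sampleRefused`, and `sampleTimeout` are hypothetical and are not the PR's `MapDirectRPCError`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"syscall"
)

// Generic, client-safe errors with no endpoint details.
var (
	errEndpointUnavailable = errors.New("RPC endpoint unavailable")
	errRequestTimeout      = errors.New("RPC request timeout")
	errUpstreamFailure     = errors.New("RPC request failed")
)

// mapClientError classifies err and returns a sentinel that does not wrap
// it, so addresses and low-level causes are not exposed to the caller.
func mapClientError(err error) error {
	if err == nil {
		return nil
	}
	var netErr net.Error
	switch {
	case errors.Is(err, syscall.ECONNREFUSED):
		return errEndpointUnavailable
	case errors.As(err, &netErr) && netErr.Timeout():
		return errRequestTimeout
	default:
		return errUpstreamFailure
	}
}

// sampleRefused and sampleTimeout build errors that embed internal details.
func sampleRefused() error {
	return fmt.Errorf("dial tcp 10.0.0.5:8545: %w", syscall.ECONNREFUSED)
}

func sampleTimeout() error {
	return fmt.Errorf("post to upstream: %w", context.DeadlineExceeded)
}

func main() {
	fmt.Println(mapClientError(sampleRefused()))
	fmt.Println(mapClientError(sampleTimeout()))
	fmt.Println(mapClientError(errors.New("unexpected EOF from 10.0.0.5")))
}
```

Callers that need the original cause for diagnostics would log it, after redaction, before calling the mapper. The trade-off is that clients can no longer unwrap the returned error to inspect the cause.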

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`MapDirectRPCError` returns errors that include underlying internal error details.

## Issue Context
User-facing errors should not expose internal details (endpoint addresses, low-level network errors). Details should be logged internally.

## Fix Focus Areas
- protocol/rpcsmartrouter/error_mapper.go[19-45]
- protocol/rpcsmartrouter/direct_rpc_relay.go[299-305]



4. Direct RPC probe always fails 🐞 Bug ✓ Correctness
Description
Direct-RPC endpoints can be reported as connected while producing EndpointAndChosenConnection
entries with nil chosenEndpointConnection, so probeProvider won’t take the direct-RPC probe path and
instead errors with “returned nil client in endpoint”. This can break periodic health
probing/optimizer inputs and can lead to endpoints being treated as unhealthy even when direct
connections are healthy.
Code

protocol/lavasession/consumer_session_manager.go[R457-466]

+	// Check if this is a direct RPC endpoint (smart router mode)
+	// Direct RPC endpoints return empty endpoints list but connected=true
+	// We need to probe them differently - using the DirectRPCConnection health check
+	if len(endpoints) == 0 && connected {
+		return csm.probeDirectRPCEndpoints(ctx, consumerSessionsWithProvider, providerAddress)
+	}
+
	var endpointInfos []EndpointInfo
	lastError := fmt.Errorf("endpoints list is empty") // this error will happen if we had 0 endpoints
	for _, endpointAndConnection := range endpoints {
Evidence
probeProvider assumes direct-RPC mode yields an empty endpoints slice, but the endpoint selection
code appends entries even when connectEndpoint returns (nil, true) for direct RPC. As a result,
len(endpoints) != 0, so probeProvider skips probeDirectRPCEndpoints and hits the nil-client error
guard, causing probe failures.

protocol/lavasession/consumer_session_manager.go[447-482]
protocol/lavasession/consumer_types.go[676-694]
protocol/lavasession/consumer_types.go[806-816]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`probeProvider` assumes direct-RPC mode produces an empty `endpoints` slice, but direct-RPC selection returns `connected=true` with `chosenEndpointConnection=nil`. This causes probes to fail with `returned nil client in endpoint` instead of using `probeDirectRPCEndpoints`.

### Issue Context
Direct-RPC `connectEndpoint` returns `(nil, true)` when the pre-established direct connection is healthy. The calling code still appends this as an endpoint entry, so `len(endpoints) != 0` and the special-case `len(endpoints)==0` direct-RPC probe logic never runs.

### Fix Focus Areas
- protocol/lavasession/consumer_session_manager.go[447-482]
- protocol/lavasession/consumer_types.go[676-694]
- protocol/lavasession/consumer_types.go[806-816]

### Suggested changes
- In `probeProvider`, detect direct-RPC by checking whether **all** returned entries have `chosenEndpointConnection == nil` (or whether `endpoint.IsDirectRPC()`), and then call `probeDirectRPCEndpoints`.
- Alternatively, in `fetchEndpointConnectionFromConsumerSessionWithProvider`, for direct-RPC endpoints do not append `EndpointAndChosenConnection` entries with nil `chosenEndpointConnection` (but ensure session selection still has enough information to pick endpoints—e.g., store endpoints separately or add an explicit flag).
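
The first suggestion could look roughly like the sketch below. The types are simplified, hypothetical stand-ins for the `lavasession` structures named above, and `shouldUseDirectProbe` is an illustrative helper, not a function from the PR. The sketch is slightly stricter than the suggestion: it requires both a nil connection and `IsDirectRPC()` on every entry.

```go
package main

// Hypothetical, simplified stand-ins for the lavasession types.
type endpointConnection struct{}

type endpoint struct {
	directRPC bool
}

// IsDirectRPC is nil-safe so callers can use it on missing endpoints.
func (e *endpoint) IsDirectRPC() bool { return e != nil && e.directRPC }

type endpointAndChosenConnection struct {
	endpoint                 *endpoint
	chosenEndpointConnection *endpointConnection
}

// allDirectRPC reports whether every entry is a direct-RPC endpoint that
// carries no chosen connection. An empty slice returns false.
func allDirectRPC(entries []*endpointAndChosenConnection) bool {
	if len(entries) == 0 {
		return false
	}
	for _, entry := range entries {
		if entry == nil || entry.chosenEndpointConnection != nil || !entry.endpoint.IsDirectRPC() {
			return false
		}
	}
	return true
}

// shouldUseDirectProbe keeps the original empty-slice condition and adds the
// case where entries were appended with nil connections.
func shouldUseDirectProbe(entries []*endpointAndChosenConnection, connected bool) bool {
	if !connected {
		return false
	}
	return len(entries) == 0 || allDirectRPC(entries)
}
```

Keeping the empty-slice branch means the check stays correct even if endpoint selection is later changed to stop appending nil-connection entries.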


5. Endpoint health data races 🐞 Bug ⛯ Reliability
Description
Endpoint health tracking mutates ConnectionRefusals/Enabled and writes LastBlockUpdate without
synchronization, despite documenting that Endpoint.mu protects these fields. These are
called/updated from concurrent request goroutines and tracker callbacks, risking undefined behavior
and incorrect routing decisions under load.
Code

protocol/lavasession/consumer_types.go[R234-255]

+// MarkUnhealthy increments connection refusals and disables endpoint if threshold exceeded
+func (e *Endpoint) MarkUnhealthy() {
+	e.ConnectionRefusals++
+	if e.ConnectionRefusals >= MaxConsecutiveConnectionAttempts {
+		e.Enabled = false
+		utils.LavaFormatWarning("disabled unhealthy endpoint", nil,
+			utils.LogAttr("endpoint", e.NetworkAddress),
+			utils.LogAttr("refusals", e.ConnectionRefusals),
+			utils.LogAttr("is_direct_rpc", e.IsDirectRPC()),
+		)
+	}
+}
+
+// ResetHealth resets connection refusals and re-enables endpoint
+func (e *Endpoint) ResetHealth() {
+	e.ConnectionRefusals = 0
+	e.Enabled = true
+	utils.LavaFormatInfo("re-enabled healthy endpoint",
+		utils.LogAttr("endpoint", e.NetworkAddress),
+		utils.LogAttr("is_direct_rpc", e.IsDirectRPC()),
+	)
+}
Evidence
Endpoint.mu is explicitly documented as protecting ConnectionRefusals and Enabled, but
MarkUnhealthy/ResetHealth update them without locking/atomic operations. rpcsmartrouter_server
invokes these methods on request paths concurrently. Separately, LastBlockUpdate (time.Time) is
written from callbacks without synchronization while also being written elsewhere, which is a Go
data race.

protocol/lavasession/consumer_types.go[188-255]
protocol/rpcsmartrouter/rpcsmartrouter_server.go[1766-1801]
protocol/rpcsmartrouter/endpoint_chain_tracker_manager.go[160-172]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`Endpoint.MarkUnhealthy` / `ResetHealth` and block tracking (`LastBlockUpdate`) perform unsynchronized reads/writes to shared fields that are used concurrently by smart-router request handlers and tracker callbacks, creating Go data races.

### Issue Context
- `Endpoint.mu` is documented as protecting `ConnectionRefusals` and `Enabled`, but these methods mutate without locking.
- `LastBlockUpdate` is a `time.Time` written from multiple goroutines without any lock/atomic.

### Fix Focus Areas
- protocol/lavasession/consumer_types.go[188-255]
- protocol/rpcsmartrouter/rpcsmartrouter_server.go[1766-1801]
- protocol/rpcsmartrouter/endpoint_chain_tracker_manager.go[160-172]

### Suggested changes
- Wrap `MarkUnhealthy` and `ResetHealth` bodies with `e.mu.Lock()/Unlock()` (or switch `ConnectionRefusals` to `atomic.Uint64` and `Enabled` to `atomic.Bool`).
- For `LastBlockUpdate`, either:
 - Guard all reads/writes with `e.mu`, or
 - Replace with `atomic.Int64` holding `time.Now().UnixNano()`.
- Update call sites that read `ConnectionRefusals`/`Enabled`/`LastBlockUpdate` to use the same locking/atomic approach (e.g., `if targetEndpoint != nil && targetEndpoint.ConnectionRefusals > 0` should not read without synchronization if you keep mutex-based protection).



Remediation recommended

6. Quorum logs too verbose 🐞 Bug ➹ Performance
Description
responsesCrossValidation now emits multiple Info-level logs per quorum check with high-cardinality
fields (GUID, providers, hashes). In cross-validation mode this can significantly increase log
volume and CPU/IO overhead.
Code

protocol/relaycore/relay_processor.go[R521-528]

+	// Log quorum validation start
+	utils.LavaFormatInfo("🔍 [Quorum Validation] Starting consensus check",
+		utils.LogAttr("GUID", rp.guid),
+		utils.LogAttr("totalResults", len(results)),
+		utils.LogAttr("requiredQuorumSize", crossValidationSize),
+		utils.LogAttr("agreementThreshold", rp.getAgreementThreshold()),
+		utils.LogAttr("maxParticipants", rp.getMaxParticipants()),
+	)
Evidence
responsesCrossValidation is part of relay processing in cross-validation/quorum selection. The new
Info logs will execute for each quorum check, and some additional Info logs also appear deeper in
the function when quorum fails/succeeds.

protocol/relaycore/relay_processor.go[516-528]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
Info-level quorum logs in `responsesCrossValidation` are on a hot path and include high-cardinality attributes, which can flood logs and add overhead in production.

### Issue Context
This runs whenever cross-validation/quorum is enabled and a quorum evaluation occurs.

### Fix Focus Areas
- protocol/relaycore/relay_processor.go[516-528]

### Suggested changes
- Change `LavaFormatInfo` to `LavaFormatDebug` (or `Trace`) for per-request quorum logs.
- Optionally gate behind `rp.debugRelay` or a dedicated configuration flag.
- Consider sampling if Info-level visibility is required.
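
If sampling is chosen, a small counter-based sampler is one option. The sketch below is an assumption-laden illustration: `logSampler` and `shouldLogQuorum` are hypothetical helpers, and `debugRelay` stands in for the `rp.debugRelay` flag mentioned above.

```go
package main

import "sync/atomic"

// logSampler lets through the first call and then one call out of every
// `every` calls. It is safe for concurrent use.
type logSampler struct {
	every uint64
	count atomic.Uint64
}

// Allow reports whether the current call should be logged.
func (s *logSampler) Allow() bool {
	if s.every <= 1 {
		return true
	}
	return (s.count.Add(1)-1)%s.every == 0
}

// shouldLogQuorum always logs when relay debugging is on and samples otherwise.
func shouldLogQuorum(debugRelay bool, s *logSampler) bool {
	return debugRelay || s.Allow()
}
```

The quorum log call would then be wrapped in `if shouldLogQuorum(...) { ... }`, which also avoids building the attribute list for skipped entries.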


@github-actions

github-actions Bot commented Feb 23, 2026

Test Results

0 tests  ±0   0 ✅ ±0   0s ⏱️ ±0s
0 suites ±0   0 💤 ±0 
7 files   ±0   0 ❌ ±0 

Results for commit 85b48d8. ± Comparison against base commit ffece61.

♻️ This comment has been updated with latest results.

Comment thread scripts/pre_setups/init_lava_smartrouter_eth.sh
Comment on lines +299 to +305
return nil, MapDirectRPCError(err, d.directConnection.GetProtocol())
}

statusCode := response.StatusCode
responseData := response.Body

// Handle HTTP error status codes

Action required

2. err.Error() logged unredacted 📘 Rule violation ⛨ Security

The direct RPC relay logs the raw error string, which may include full upstream URLs and embedded
API keys/tokens. This can leak secrets into logs and violates secure logging requirements.
Agent Prompt
## Issue description
Raw `err.Error()` is logged and may contain full upstream URLs and embedded secrets.

## Issue Context
Even debug logs must not contain secrets; direct RPC endpoints commonly embed tokens in URL paths or query params.

## Fix Focus Areas
- protocol/rpcsmartrouter/direct_rpc_relay.go[292-306]

Comment on lines +20 to +32
if isConnectionRefused(err) {
return fmt.Errorf("RPC endpoint unavailable (connection refused): %w", err)
}

if isTimeout(err) {
return fmt.Errorf("RPC request timeout: %w", err)
}

// Protocol-specific error handling
switch protocol {
case lavasession.DirectRPCProtocolHTTP, lavasession.DirectRPCProtocolHTTPS:
return mapHTTPError(err)
case lavasession.DirectRPCProtocolGRPC:

Action required

3. MapDirectRPCError exposes internal errors 📘 Rule violation ⛨ Security

The direct-RPC error mapping wraps and returns the underlying error (%w), which can propagate
internal network/endpoint details to clients. User-facing errors should be generic, with detailed
causes kept only in internal logs.
Agent Prompt
## Issue description
`MapDirectRPCError` returns errors that include underlying internal error details.

## Issue Context
User-facing errors should not expose internal details (endpoint addresses, low-level network errors). Details should be logged internally.

## Fix Focus Areas
- protocol/rpcsmartrouter/error_mapper.go[19-45]
- protocol/rpcsmartrouter/direct_rpc_relay.go[299-305]

Comment thread protocol/lavasession/consumer_session_manager.go Outdated
Comment on lines +234 to +255
// MarkUnhealthy increments connection refusals and disables endpoint if threshold exceeded
func (e *Endpoint) MarkUnhealthy() {
e.ConnectionRefusals++
if e.ConnectionRefusals >= MaxConsecutiveConnectionAttempts {
e.Enabled = false
utils.LavaFormatWarning("disabled unhealthy endpoint", nil,
utils.LogAttr("endpoint", e.NetworkAddress),
utils.LogAttr("refusals", e.ConnectionRefusals),
utils.LogAttr("is_direct_rpc", e.IsDirectRPC()),
)
}
}

// ResetHealth resets connection refusals and re-enables endpoint
func (e *Endpoint) ResetHealth() {
e.ConnectionRefusals = 0
e.Enabled = true
utils.LavaFormatInfo("re-enabled healthy endpoint",
utils.LogAttr("endpoint", e.NetworkAddress),
utils.LogAttr("is_direct_rpc", e.IsDirectRPC()),
)
}

Action required

5. Endpoint health data races 🐞 Bug ⛯ Reliability

Endpoint health tracking mutates ConnectionRefusals/Enabled and writes LastBlockUpdate without
synchronization, despite documenting that Endpoint.mu protects these fields. These are
called/updated from concurrent request goroutines and tracker callbacks, risking undefined behavior
and incorrect routing decisions under load.
Agent Prompt
### Issue description
`Endpoint.MarkUnhealthy` / `ResetHealth` and block tracking (`LastBlockUpdate`) perform unsynchronized reads/writes to shared fields that are used concurrently by smart-router request handlers and tracker callbacks, creating Go data races.

### Issue Context
- `Endpoint.mu` is documented as protecting `ConnectionRefusals` and `Enabled`, but these methods mutate without locking.
- `LastBlockUpdate` is a `time.Time` written from multiple goroutines without any lock/atomic.

### Fix Focus Areas
- protocol/lavasession/consumer_types.go[188-255]
- protocol/rpcsmartrouter/rpcsmartrouter_server.go[1766-1801]
- protocol/rpcsmartrouter/endpoint_chain_tracker_manager.go[160-172]

### Suggested changes
- Wrap `MarkUnhealthy` and `ResetHealth` bodies with `e.mu.Lock()/Unlock()` (or switch `ConnectionRefusals` to `atomic.Uint64` and `Enabled` to `atomic.Bool`).
- For `LastBlockUpdate`, either:
  - Guard all reads/writes with `e.mu`, or
  - Replace with `atomic.Int64` holding `time.Now().UnixNano()`.
- Update call sites that read `ConnectionRefusals`/`Enabled`/`LastBlockUpdate` to use the same locking/atomic approach (e.g., `if targetEndpoint != nil && targetEndpoint.ConnectionRefusals > 0` should not read without synchronization if you keep mutex-based protection).

@NadavLevi NadavLevi force-pushed the feat/direct-rpc-clean branch 2 times, most recently from b88ca1d to 1fa4b02 Compare February 26, 2026 11:43
@nimrod-teich nimrod-teich force-pushed the feat/direct-rpc-clean branch from 422807d to adb116a Compare March 1, 2026 12:16
@NadavLevi NadavLevi force-pushed the feat/direct-rpc-clean branch from adb116a to 9b54ecb Compare March 2, 2026 09:51
@codecov

codecov Bot commented Mar 2, 2026

Codecov Report

❌ Patch coverage is 28.90421% with 2991 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...l/rpcsmartrouter/direct_ws_subscription_manager.go | 32.71% | 508 Missing and 35 partials ⚠️ |
| protocol/metrics/smartrouter_metrics_manager.go | 0.00% | 414 Missing ⚠️ |
| ...rpcsmartrouter/direct_grpc_subscription_manager.go | 10.48% | 408 Missing and 2 partials ⚠️ |
| protocol/rpcsmartrouter/rpcsmartrouter.go | 0.00% | 381 Missing and 1 partial ⚠️ |
| protocol/lavasession/direct_rpc_connection.go | 22.09% | 265 Missing and 3 partials ⚠️ |
| protocol/rpcsmartrouter/direct_rpc_relay.go | 53.60% | 159 Missing and 21 partials ⚠️ |
| protocol/rpcsmartrouter/endpoint_chain_fetcher.go | 0.00% | 153 Missing ⚠️ |
| ...col/chainlib/grpcproxy/dyncodec/hybrid_registry.go | 28.45% | 79 Missing and 9 partials ⚠️ |
| ...l/rpcsmartrouter/endpoint_chain_tracker_manager.go | 50.30% | 80 Missing and 1 partial ⚠️ |
| protocol/lavasession/consumer_session_manager.go | 33.33% | 73 Missing and 3 partials ⚠️ |
| ... and 24 more | | |
| Flag | Coverage Δ |
|---|---|
| consensus | 8.71% <0.00%> (-0.01%) ⬇️ |
| protocol | 33.63% <28.90%> (-1.08%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|---|---|
| protocol/chainlib/node_error_handler.go | 39.22% <100.00%> (ø) |
| protocol/chaintracker/config.go | 40.00% <ø> (ø) |
| protocol/common/conf.go | 0.00% <ø> (ø) |
| protocol/metrics/rpcconsumer_logs.go | 18.47% <100.00%> (+0.44%) ⬆️ |
| protocol/relaycore/consistency_config.go | 100.00% <100.00%> (ø) |
| protocol/relaycore/consistency_validation.go | 100.00% <100.00%> (ø) |
| protocol/rpcconsumer/rpcconsumer_server.go | 32.87% <100.00%> (ø) |
| protocol/rpcsmartrouter/grpc_streaming_config.go | 100.00% <100.00%> (ø) |
| ...col/rpcsmartrouter/noop_ws_subscription_manager.go | 100.00% <100.00%> (ø) |
| protocol/rpcsmartrouter/rpcsmartrouter_server.go | 13.08% <ø> (-20.65%) ⬇️ |
| ... and 40 more | |

... and 2 files with indirect coverage changes

@NadavLevi NadavLevi force-pushed the feat/direct-rpc-clean branch 2 times, most recently from f01f20f to ddf9b85 Compare March 8, 2026 10:26
@NadavLevi NadavLevi force-pushed the feat/direct-rpc-clean branch from 7647fc1 to e973ab7 Compare March 9, 2026 12:56
@NadavLevi NadavLevi closed this Mar 9, 2026
@NadavLevi NadavLevi reopened this Mar 9, 2026
@qodo-code-review

Review Summary by Qodo

Implement direct RPC mode for RPCSmartRouter with endpoint health checking and session management

✨ Enhancement 🧪 Tests

Walkthroughs

Description
• Implement direct RPC mode for RPCSmartRouter enabling standalone operation without blockchain
  state tracking, supporting HTTP, WebSocket, gRPC, and REST protocols across EVM and Tendermint
  chains
• Add per-endpoint ChainTracker management with continuous block height polling and consistency
  validation via EndpointChainTrackerManager for pre-request health checking
• Implement DirectWSSubscriptionManager for WebSocket subscription management with multi-client
  deduplication, unique router IDs per client, upstream connection pooling, and backoff logic
• Implement DirectGRPCSubscriptionManager for gRPC streaming with dynamic message handling,
  connection pooling, and reflection-based proxy support
• Refactor configuration keys from static-providers/backup-providers to
  direct-rpc/backup-direct-rpc with backward compatibility and comprehensive static provider
  validation
• Add sendRelayToDirectEndpoints() and relayInnerDirect() methods for parallel direct RPC relay
  handling with HTTP status code classification and per-endpoint metrics
• Integrate cache write support for successful direct RPC responses with proper finalization and
  status code validation
• Implement IP forwarding from client requests to upstream nodes via gRPC metadata and REST headers
• Improve REST message error handling with explicit 5xx and 429 status code classification and error
  message extraction
• Add new ConsistencyPreValidationError (699) error code for endpoints failing pre-request
  consistency validation
• Refactor ConsumerWebSocketManager to use generic WSSubscriptionManager interface supporting
  both provider-relay and direct RPC modes
• Add comprehensive unit tests for WebSocket subscription manager, gRPC subscription manager, error
  mapper, batch requests, and session management
• Add integration tests for REST and direct RPC endpoints with mock RPC and REST servers for local
  development
• Remove subscription-related test code and provider-relay specific logic from
  RPCSmartRouterServer
Diagram
flowchart LR
  Client["Client Request"]
  Router["RPCSmartRouter"]
  Validation["Consistency Validation<br/>filterEndpointsByConsistency"]
  ChainTracker["EndpointChainTrackerManager<br/>Per-endpoint Block Polling"]
  DirectRelay["Direct RPC Relay<br/>relayInnerDirect"]
  SubMgr["Subscription Managers<br/>WS/gRPC"]
  Cache["Cache Integration<br/>Read/Write"]
  Response["Response to Client"]
  
  Client --> Router
  Router --> Validation
  Validation --> ChainTracker
  ChainTracker --> DirectRelay
  DirectRelay --> SubMgr
  DirectRelay --> Cache
  SubMgr --> Response
  Cache --> Response

File Changes

1. protocol/rpcsmartrouter/rpcsmartrouter_server.go ✨ Enhancement +1104/-906

Direct RPC mode implementation with endpoint health and consistency validation

• Refactored RPCSmartRouterServer to support direct RPC mode with per-endpoint ChainTracker
 management for continuous block polling and consistency validation
• Removed provider-relay specific fields (privKey, requiredResponses, lavaChainID, reporter,
 subscription context management) and replaced with direct RPC infrastructure (chainTracker,
 endpointChainTrackerManager, grpcSubscriptionManager, smartRouterEndpointMetrics)
• Implemented sendRelayToDirectEndpoints() for parallel direct RPC relay handling with pre-request
 consistency filtering, health tracking, and per-endpoint metrics
• Added relayInnerDirect() to handle direct RPC connections with HTTP status code classification,
 endpoint health management, and backoff logic
• Integrated cache write support (tryCacheWrite()) for successful direct RPC responses with proper
 finalization and status code validation
• Added filterEndpointsByConsistency() for pre-request validation using per-endpoint ChainTrackers
 to avoid stale endpoints
• Implemented initializeChainTrackers() and ensureEndpointChainTracker() for lazy and eager
 ChainTracker initialization
• Removed subscription context cleanup logic (now handled by direct subscription managers)
• Updated error messages and logging to use "endpoint" terminology instead of "provider"
• Added IP forwarding support via gRPC metadata injection and REST header extraction

protocol/rpcsmartrouter/rpcsmartrouter_server.go


2. protocol/chainlib/consumer_websocket_manager.go ✨ Enhancement +51/-36

WebSocket subscription manager abstraction for multi-mode support

• Changed consumerWsSubscriptionManager field to generic wsSubscriptionManager implementing
 WSSubscriptionManager interface for both provider-relay and direct RPC modes
• Updated Unsubscribe() call to return responseData for forwarding node responses directly to
 clients
• Added response forwarding for unsubscribe operations and error handling with formatted error
 messages
• Updated struct field names for consistency (ConsumerWsSubscriptionManager → WsSubscriptionManager)
• Added refererMatchString and refererData fields to options struct

protocol/chainlib/consumer_websocket_manager.go


3. protocol/lavasession/errors.go ✨ Enhancement +1/-0

New consistency pre-validation error code

• Added new error code ConsistencyPreValidationError (699) for endpoints that fail pre-request
 consistency validation

protocol/lavasession/errors.go


4. protocol/rpcsmartrouter/direct_ws_subscription_manager_test.go 🧪 Tests +1589/-0

WebSocket subscription manager comprehensive test coverage

• Comprehensive test suite for WebSocket subscription management with 1589 lines covering
 multi-client deduplication, subscription ID mapping, and protocol-specific handling
• Tests for unique router ID generation per client to prevent subscription interference when
 multiple clients share the same upstream subscription
• Tendermint and EVM protocol-specific tests validating correct subscription response formats and
 notification handling
• Integration tests for multi-client join/unsubscribe flows, rate limiting, and connection
 bookkeeping updates

protocol/rpcsmartrouter/direct_ws_subscription_manager_test.go


5. protocol/rpcsmartrouter/rpcsmartrouter_server_test.go 🧪 Tests +431/-724

Refactored tests: removed subscription tests, added consistency validation tests

• Removed 466 lines of subscription-related test code (getFirstSubscriptionReply tests, error path
 cleanup tests, memory leak scenario tests)
• Removed 184 lines of RPC smart router creation and relay handling test utilities
• Added new tests for EndpointChainTrackerManager lifecycle and cancel function invocation
• Added comprehensive tests for filterEndpointsByConsistency method validating consistency
 pre-validation with failed session tracking
• Enhanced MockProtocolMessage with configurable requestedBlock and userData fields for better
 test flexibility

protocol/rpcsmartrouter/rpcsmartrouter_server_test.go


6. protocol/chainlib/chainproxy/rpcInterfaceMessages/restMessage.go ✨ Enhancement +45/-3

REST message error handling improvements for retry logic

• Added explicit handling for 5xx and 429 HTTP status codes as node errors to trigger retry logic
• Implemented extractErrorMessage function to parse error messages from response bodies with
 fallback strategies
• Changed 4xx error handling (except 429) to pass through as client errors instead of treating as
 node errors
• Improved error message extraction with support for common JSON error field names and truncation
 for large responses

protocol/chainlib/chainproxy/rpcInterfaceMessages/restMessage.go
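
The classification described for this file can be sketched as follows. This is an illustration only: `classifyStatus`, the `statusClass` values, the JSON field names tried, and the 256-character limit are assumptions, and the PR's `extractErrorMessage` may use different fallbacks.

```go
package main

import "encoding/json"

type statusClass int

const (
	statusOK          statusClass = iota // pass the response through
	statusClientError                    // 4xx except 429: return to the client, no retry
	statusNodeError                      // 5xx and 429: treat as node error, retry elsewhere
)

// classifyStatus maps an HTTP status code to the retry behavior above.
func classifyStatus(code int) statusClass {
	switch {
	case code >= 500 && code <= 599, code == 429:
		return statusNodeError
	case code >= 400 && code <= 499:
		return statusClientError
	default:
		return statusOK
	}
}

// Assumed cap on extracted messages, counted in runes.
const maxErrorMessageLen = 256

// truncate shortens s to at most max runes and marks the cut.
func truncate(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max]) + "..."
}

// extractErrorMessage tries a few common JSON error fields (assumed names)
// and falls back to the raw body.
func extractErrorMessage(body []byte) string {
	var parsed map[string]any
	if err := json.Unmarshal(body, &parsed); err == nil {
		for _, key := range []string{"message", "error", "detail"} {
			if s, ok := parsed[key].(string); ok && s != "" {
				return truncate(s, maxErrorMessageLen)
			}
		}
	}
	return truncate(string(body), maxErrorMessageLen)
}
```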


7. protocol/chaintracker/config.go ✨ Enhancement +1/-0

Chain tracker configuration callback for fetch errors

• Added new FetchErrorCallback field to ChainTrackerConfig struct for handling latest-block
 fetch failures
• Enables callback-based error handling when block height polling encounters errors

protocol/chaintracker/config.go


8. protocol/rpcsmartrouter/direct_ws_subscription_manager.go ✨ Enhancement +1619/-0

Direct WebSocket subscription manager with deduplication and pooling

• Implements DirectWSSubscriptionManager for managing WebSocket subscriptions directly to upstream
 RPC endpoints without provider routing
• Supports multi-client subscription deduplication with unique router IDs per client, enabling
 individual unsubscribe operations
• Includes upstream connection pooling, sticky session management for client affinity, and automatic
 reconnection with backoff
• Provides rate limiting (per-client and global), cleanup of stale subscriptions, and proper
 handling of both EVM and Tendermint subscription protocols

protocol/rpcsmartrouter/direct_ws_subscription_manager.go
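
The deduplication idea can be illustrated with a small registry. This is a hedged sketch, not the `DirectWSSubscriptionManager` code: `subscriptionRegistry`, its methods, and the hex ID format are hypothetical. Several clients share one upstream subscription per parameter set, while each client gets its own router ID so it can unsubscribe individually.

```go
package main

import (
	"strconv"
	"sync"
)

// subscriptionRegistry tracks which router IDs share each upstream subscription.
type subscriptionRegistry struct {
	mu       sync.Mutex
	nextID   uint64
	clients  map[string]map[string]struct{} // params key -> router IDs
	paramsOf map[string]string              // router ID -> params key
}

func newSubscriptionRegistry() *subscriptionRegistry {
	return &subscriptionRegistry{
		clients:  make(map[string]map[string]struct{}),
		paramsOf: make(map[string]string),
	}
}

// Join registers a client and returns its unique router ID. first is true
// when the caller must open the upstream subscription.
func (r *subscriptionRegistry) Join(paramsKey string) (routerID string, first bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nextID++
	routerID = "0x" + strconv.FormatUint(r.nextID, 16)
	members, exists := r.clients[paramsKey]
	if !exists {
		members = make(map[string]struct{})
		r.clients[paramsKey] = members
	}
	members[routerID] = struct{}{}
	r.paramsOf[routerID] = paramsKey
	return routerID, !exists
}

// Leave removes one client. last is true when the caller should also close
// the upstream subscription; ok is false for unknown router IDs.
func (r *subscriptionRegistry) Leave(routerID string) (last bool, ok bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	paramsKey, found := r.paramsOf[routerID]
	if !found {
		return false, false
	}
	delete(r.paramsOf, routerID)
	members := r.clients[paramsKey]
	delete(members, routerID)
	if len(members) == 0 {
		delete(r.clients, paramsKey)
		return true, true
	}
	return false, true
}
```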


9. protocol/rpcsmartrouter/rpcsmartrouter.go ✨ Enhancement +550/-97

Smart router refactoring with direct RPC and chain tracker integration

• Refactors configuration keys from static-providers/backup-providers to
 direct-rpc/backup-direct-rpc with backward compatibility
• Adds comprehensive static provider validation (PHASE 1) before chain tracker setup, with non-fatal
 validation for backup providers
• Implements per-endpoint ChainTracker setup (PHASE 2) for continuous block height polling and
 sync verification
• Replaces provider-based ConsumerWSSubscriptionManager with DirectWSSubscriptionManager for
 direct RPC WebSocket subscriptions and adds DirectGRPCSubscriptionManager for gRPC streaming
• Removes blockchain transaction signing logic and private key handling (no longer needed for static
 provider mode)
• Adds epoch update cleanup for stale ChainTracker instances when provider sessions change

protocol/rpcsmartrouter/rpcsmartrouter.go


10. protocol/rpcsmartrouter/endpoint_chain_tracker_manager.go ✨ Enhancement +360/-0

Per-endpoint chain tracker manager for sync validation

• Introduces EndpointChainTrackerManager for managing per-endpoint ChainTracker instances with
 lazy initialization
• Provides thread-safe tracker creation, retrieval, and removal with individual context cancellation
 per tracker
• Includes sync validation methods (ValidateEndpointSync, GetSyncGap) for pre-request
 consistency checking
• Supports optional callbacks for fork detection, new block events, consistency changes, and fetch
 errors

protocol/rpcsmartrouter/endpoint_chain_tracker_manager.go
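
The sync-gap check behind pre-request consistency filtering can be sketched as below. It assumes the reference height is the highest block seen across endpoints and that the lag threshold is a plain block count. `syncGap` and `filterByConsistency` are hypothetical names, not the signatures of `GetSyncGap` or `filterEndpointsByConsistency`.

```go
package main

import "sort"

// syncGap returns how many blocks an endpoint trails the reference height.
func syncGap(referenceHeight, endpointHeight int64) int64 {
	if endpointHeight >= referenceHeight {
		return 0
	}
	return referenceHeight - endpointHeight
}

// filterByConsistency splits endpoints into those within maxLag blocks of the
// highest observed height and those that are stale. Results are sorted so the
// output is deterministic.
func filterByConsistency(heights map[string]int64, maxLag int64) (healthy, stale []string) {
	var reference int64
	for _, h := range heights {
		if h > reference {
			reference = h
		}
	}
	for name, h := range heights {
		if syncGap(reference, h) <= maxLag {
			healthy = append(healthy, name)
		} else {
			stale = append(stale, name)
		}
	}
	sort.Strings(healthy)
	sort.Strings(stale)
	return healthy, stale
}
```

With heights of 100, 98, and 90 and a threshold of 5 blocks, the first two endpoints stay eligible and the third is skipped before any request is sent.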


11. AGENTS.md Additional files +24/-0

...

AGENTS.md


12. BRANCH_CHANGES_SLIDE.md Additional files +163/-0

...

BRANCH_CHANGES_SLIDE.md


13. PROVIDER_SELECTION_HEADER.md Additional files +142/-0

...

PROVIDER_SELECTION_HEADER.md


14. RPCSMARTROUTER_TESTING_CHECKLIST.md Additional files +98/-0

...

RPCSMARTROUTER_TESTING_CHECKLIST.md


15. WEIGHTED_SELECTOR_TESTING_PLAN.md Additional files +761/-0

...

WEIGHTED_SELECTOR_TESTING_PLAN.md


16. WEIGHTED_SELECTOR_TEST_RESULTS.md Additional files +330/-0

...

WEIGHTED_SELECTOR_TEST_RESULTS.md


17. config/consumer_examples/lava_consumer_static_peers.yml Additional files +1/-1

...

config/consumer_examples/lava_consumer_static_peers.yml


18. config/consumer_examples/lava_consumer_static_with_backup.yml Additional files +2/-2

...

config/consumer_examples/lava_consumer_static_with_backup.yml


19. config/consumer_examples/lava_consumer_static_with_backup_base.yml Additional files +53/-0

...

config/consumer_examples/lava_consumer_static_with_backup_base.yml


20. config/consumer_examples/lava_consumer_static_with_backup_bch.yml Additional files +49/-0

...

config/consumer_examples/lava_consumer_static_with_backup_bch.yml


21. config/consumer_examples/lava_consumer_static_with_backup_eth.yml Additional files +2/-2

...

config/consumer_examples/lava_consumer_static_with_backup_eth.yml


22. config/provider1_avaxc.yml Additional files +8/-0

...

config/provider1_avaxc.yml


23. metrics.json Additional files +13425/-0

...

metrics.json


24. protocol/chainlib/base_chain_parser.go Additional files +14/-7

...

protocol/chainlib/base_chain_parser.go


25. protocol/chainlib/chainlib.go Additional files +15/-3

...

protocol/chainlib/chainlib.go


26. protocol/chainlib/chainproxy/rpcInterfaceMessages/restMessage_test.go Additional files +93/-6

...

protocol/chainlib/chainproxy/rpcInterfaceMessages/restMessage_test.go


27. protocol/chainlib/chainproxy/rpcclient/json.go Additional files +6/-1

...

protocol/chainlib/chainproxy/rpcclient/json.go


28. protocol/chainlib/chainproxy/rpcclient/subscription_test.go Additional files +4/-4

...

protocol/chainlib/chainproxy/rpcclient/subscription_test.go


29. protocol/chainlib/consumer_ws_subscription_manager.go Additional files +4/-3

...

protocol/chainlib/consumer_ws_subscription_manager.go


30. protocol/chainlib/consumer_ws_subscription_manager_test.go Additional files +2/-2

...

protocol/chainlib/consumer_ws_subscription_manager_test.go


31. protocol/chainlib/grpc.go Additional files +16/-4

...

protocol/chainlib/grpc.go


32. protocol/chainlib/grpcproxy/dyncodec/file_registry_test.go Additional files +223/-0

...

protocol/chainlib/grpcproxy/dyncodec/file_registry_test.go


33. protocol/chainlib/grpcproxy/dyncodec/hybrid_registry.go Additional files +304/-0

...

protocol/chainlib/grpcproxy/dyncodec/hybrid_registry.go


34. protocol/chainlib/grpcproxy/dyncodec/remote_file.go Additional files +268/-0

...

protocol/chainlib/grpcproxy/dyncodec/remote_file.go


35. protocol/chainlib/grpcproxy/grpc_reflection_proxy.go Additional files +124/-0

...

protocol/chainlib/grpcproxy/grpc_reflection_proxy.go


36. protocol/chainlib/grpcproxy/grpcproxy.go Additional files +40/-0

...

protocol/chainlib/grpcproxy/grpcproxy.go


37. protocol/chainlib/jsonRPC.go Additional files +30/-25

...

protocol/chainlib/jsonRPC.go


38. protocol/chainlib/node_error_handler.go Additional files +1/-1

...

protocol/chainlib/node_error_handler.go


39. protocol/chainlib/node_error_handler_test.go Additional files +2/-2
40. protocol/chainlib/referer_stub.go Additional files +8/-0
41. protocol/chainlib/tendermintRPC.go Additional files +31/-26
42. protocol/chainlib/ws_subscription_manager.go Additional files +60/-0
43. protocol/chaintracker/chain_tracker.go Additional files +5/-0
44. protocol/common/conf.go Additional files +4/-2
45. protocol/common/endpoints.go Additional files +101/-0
46. protocol/integration/protocol_test.go Additional files +8/-9
47. protocol/lavasession/consumer_session_manager.go Additional files +197/-38
48. protocol/lavasession/consumer_session_manager_test.go Additional files +55/-0
49. protocol/lavasession/consumer_types.go Additional files +177/-10
50. protocol/lavasession/consumer_types_stake_weights_test.go Additional files +17/-0
51. protocol/lavasession/direct_rpc_connection.go Additional files +865/-0
52. protocol/lavasession/direct_rpc_connection_test.go Additional files +326/-0
53. protocol/lavasession/direct_rpc_session_selection_test.go Additional files +184/-0
54. protocol/lavasession/session_connection_test.go Additional files +445/-0
55. protocol/lavasession/single_consumer_session.go Additional files +69/-12
56. protocol/metrics/consumer_metrics_manager_inf.go Additional files +115/-0
57. protocol/metrics/consumer_metrics_manager_test.go Additional files +5/-5
58. protocol/metrics/mapped_labels_counter_vec.go Additional files +15/-2
59. protocol/metrics/mapped_labels_gauge_vec.go Additional files +15/-2
60. protocol/metrics/rpcconsumer_logs.go Additional files +3/-2
61. protocol/metrics/smartrouter_metrics_manager.go Additional files +819/-0
62. protocol/relaycore/consistency_config.go Additional files +99/-0
63. protocol/relaycore/consistency_validation.go Additional files +120/-0
64. protocol/relaycore/consistency_validation_test.go Additional files +292/-0
65. protocol/relaycore/relay_processor.go Additional files +121/-36
66. protocol/relaycore/relay_processor_memory_test.go Additional files +4/-4
67. protocol/rpcconsumer/rpcconsumer_server.go Additional files +1/-1
68. protocol/rpcprovider/rpcprovider.go Additional files +61/-245
69. protocol/rpcprovider/rpcprovider_server.go Additional files +16/-41
70. protocol/rpcprovider/rpcprovider_server_test.go Additional files +7/-124
71. protocol/rpcprovider/static_provider_parsing_test.go Additional files +0/-345
72. protocol/rpcprovider/static_provider_validation_test.go Additional files +0/-189
73. protocol/rpcprovider/testing.go Additional files +6/-6
74. protocol/rpcsmartrouter/direct_grpc_subscription_manager.go Additional files +956/-0
75. protocol/rpcsmartrouter/direct_grpc_subscription_manager_test.go Additional files +405/-0
76. protocol/rpcsmartrouter/direct_rpc_integration_test.go Additional files +517/-0
77. protocol/rpcsmartrouter/direct_rpc_relay.go Additional files +724/-0
78. protocol/rpcsmartrouter/endpoint_chain_fetcher.go Additional files +275/-0
79. protocol/rpcsmartrouter/endpoint_chain_tracker_test.go Additional files +341/-0
80. protocol/rpcsmartrouter/error_mapper.go Additional files +85/-0
81. protocol/rpcsmartrouter/error_mapper_test.go Additional files +199/-0
82. protocol/rpcsmartrouter/grpc_streaming_config.go Additional files +144/-0
83. protocol/rpcsmartrouter/grpc_streaming_config_test.go Additional files +220/-0
84. protocol/rpcsmartrouter/minimal_state_tracker_mock.go Additional files +0/-49
85. protocol/rpcsmartrouter/noop_ws_subscription_manager.go Additional files +82/-0
86. protocol/rpcsmartrouter/noop_ws_subscription_manager_test.go Additional files +132/-0
87. protocol/rpcsmartrouter/rest_integration_test.go Additional files +354/-0
88. protocol/rpcsmartrouter/rpcsmartrouter_compression_test.go Additional files +0/-200
89. protocol/rpcsmartrouter/smartrouter_relay_state_machine.go Additional files +11/-24
90. protocol/rpcsmartrouter/smartrouter_relay_state_machine_test.go Additional files +10/-10
91. protocol/rpcsmartrouter/subscription_id_mapper.go Additional files +189/-0
92. protocol/rpcsmartrouter/subscription_id_mapper_test.go Additional files +270/-0
93. protocol/rpcsmartrouter/upstream_grpc_pool.go Additional files +532/-0
94. protocol/rpcsmartrouter/upstream_grpc_pool_test.go Additional files +408/-0
95. protocol/rpcsmartrouter/upstream_ws_pool.go Additional files +543/-0
96. protocol/rpcsmartrouter/websocket_backoff.go Additional files +126/-0
97. protocol/rpcsmartrouter/websocket_backoff_test.go Additional files +228/-0
98. protocol/rpcsmartrouter/websocket_config.go Additional files +148/-0
99. protocol/rpcsmartrouter/websocket_config_test.go Additional files +204/-0
100. scripts/EXAMPLE_USAGE.md Additional files +166/-0
101. scripts/README_extract_qos.md Additional files +142/-0
102. scripts/README_parse_consumer_log.md Additional files +174/-0
103. scripts/analyze_qos_output.py Additional files +211/-0
104. scripts/extract_qos_from_consumer_log.py Additional files +282/-0
105. scripts/mock_rest_server/README.md Additional files +132/-0
106. scripts/mock_rest_server/main.go Additional files +142/-0
107. scripts/mock_rpc_server/README.md Additional files +130/-0
108. scripts/mock_rpc_server/init_mock_server.go Additional files +186/-0
109. scripts/monitoring/README.md Additional files +149/-0
110. scripts/parse_consumer_log.py Additional files +414/-0
111. Additional files not shown Additional files +0/-0

@qodo-code-review

qodo-code-review Bot commented Mar 9, 2026


Code Review by Qodo

🐞 Bugs (5) 📘 Rule violations (6) 📎 Requirement gaps (0)


Action required

1. TestDefaultWebsocketConfig naming 📘 Rule violation ✓ Correctness ⭐ New
Description
The new test function TestDefaultWebsocketConfig does not follow the required
TestComponent_Scenario naming convention (missing underscore-separated component/scenario). This
reduces test discoverability and consistency across the suite.
Code

protocol/rpcsmartrouter/websocket_config_test.go[11]

+func TestDefaultWebsocketConfig(t *testing.T) {
Evidence
PR Compliance ID 5 requires test names to follow TestComponent_Scenario, but this PR introduces a
new test whose name does not match that pattern.

AGENTS.md
protocol/rpcsmartrouter/websocket_config_test.go[11-11]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A newly added test function name does not follow the required `TestComponent_Scenario` convention.

## Issue Context
Compliance requires test names to follow `TestComponent_Scenario` for consistency and discoverability.

## Fix Focus Areas
- protocol/rpcsmartrouter/websocket_config_test.go[11-11]



2. t.Error used in test 📘 Rule violation ✓ Correctness ⭐ New
Description
The new test uses t.Error(...) for assertion failures instead of require/assert helpers. This
violates the testing compliance requirement and can reduce consistency of failure diagnostics across
the suite.
Code

protocol/rpcsmartrouter/direct_ws_subscription_manager_test.go[R1012-1026]

+		t.Error("Client 1 should have received a message")
+	}
+
+	// Verify client 2 received message with their router ID
+	select {
+	case reply2 := <-client2Chan:
+		var msg2 map[string]interface{}
+		err := json.Unmarshal(reply2.Data, &msg2)
+		require.NoError(t, err)
+		params2, ok := msg2["params"].(map[string]interface{})
+		require.True(t, ok, "params should be a map")
+		assert.Equal(t, client2RouterID, params2["subscription"],
+			"Client 2 should receive message with their router ID")
+	default:
+		t.Error("Client 2 should have received a message")
Evidence
PR Compliance ID 6 requires using require/assert helpers, but the test uses raw t.Error(...)
in the select default case to indicate a missing message.

AGENTS.md
protocol/rpcsmartrouter/direct_ws_subscription_manager_test.go[1012-1026]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A new test uses `t.Error(...)` instead of `require`/`assert` helpers.

## Issue Context
The repository testing guidelines require `require`/`assert` to improve readability and diagnostics.

## Fix Focus Areas
- protocol/rpcsmartrouter/direct_ws_subscription_manager_test.go[1012-1026]



3. Concurrent map write panic 🐞 Bug ⛯ Reliability ⭐ New
Description
RPCSmartRouter.Start sets up endpoints concurrently, while CreateSmartRouterEndpoint writes to
rpsr.rpcServers (a plain map) without synchronization, which can crash at runtime with "concurrent
map writes" during startup. This is triggered when multiple endpoints (e.g., different
chains/api-interfaces) are configured and initialized in parallel.
Code

protocol/rpcsmartrouter/rpcsmartrouter.go[R1042-1044]

+
+	// Store server reference for per-endpoint ChainTracker cleanup on epoch updates
+	rpsr.rpcServers[sessionManagerKey] = rpcSmartRouterServer
Evidence
Start launches one goroutine per endpoint, so CreateSmartRouterEndpoint executes concurrently across
endpoints; the new rpcServers map is then written from those goroutines with no shared lock, which
is unsafe for Go maps.

protocol/rpcsmartrouter/rpcsmartrouter.go[258-277]
pr_files_diffs/protocol_rpcsmartrouter_rpcsmartrouter_go.patch[595-599]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`RPCSmartRouter.Start()` initializes endpoints in parallel goroutines, but `CreateSmartRouterEndpoint()` writes to `rpsr.rpcServers` (a plain Go map) without synchronization. This can crash the process with a runtime panic: `fatal error: concurrent map writes`.

### Issue Context
The router starts `CreateSmartRouterEndpoint` concurrently for each configured endpoint. The PR introduces `rpsr.rpcServers[...] = rpcSmartRouterServer` to support ChainTracker cleanup, adding a new concurrent map write.

### Fix Focus Areas
- Add router-level synchronization (mutex) or replace `rpcServers` with a concurrency-safe map type.
- Ensure all reads/writes to `rpcServers` (and any other shared maps touched in endpoint setup) are protected consistently.

- file/path references:
 - protocol/rpcsmartrouter/rpcsmartrouter.go[258-277]
 - protocol/rpcsmartrouter/rpcsmartrouter.go[493-499]
 - protocol/rpcsmartrouter/rpcsmartrouter.go[1042-1044]
 - protocol/rpcsmartrouter/rpcsmartrouter.go[1560-1564]
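
A minimal sketch of the mutex option, assuming a hypothetical `serverRegistry` type that stands in for the router's `rpcServers` map. The type, its `Store`/`Load`/`Len` methods, and the simplified `string` value type are illustrative and are not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
)

// serverRegistry is a hypothetical wrapper around a map like rpcServers.
// The string value type is a simplification for illustration.
type serverRegistry struct {
	mu      sync.RWMutex
	servers map[string]string
}

func newServerRegistry() *serverRegistry {
	return &serverRegistry{servers: make(map[string]string)}
}

// Store records a server under key while holding the write lock.
func (r *serverRegistry) Store(key, server string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.servers[key] = server
}

// Load returns the server stored under key, if any, under the read lock.
func (r *serverRegistry) Load(key string) (string, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	server, ok := r.servers[key]
	return server, ok
}

// Len reports how many servers are registered.
func (r *serverRegistry) Len() int {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return len(r.servers)
}

func main() {
	reg := newServerRegistry()
	var wg sync.WaitGroup
	// Simulate one setup goroutine per endpoint, as Start is described to do.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			reg.Store(fmt.Sprintf("endpoint-%d", i), "server")
		}(i)
	}
	wg.Wait()
	fmt.Println("registered servers:", reg.Len())
}
```

Keeping the map behind methods means every read and write goes through the same lock, which is the consistency the fix asks for. `sync.Map` would be an alternative if the access pattern is mostly reads.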



4. REST node-error mismatch 🐞 Bug ✓ Correctness ⭐ New
Description
DirectRPCRelaySender.sendRESTRelay sets RelayResult.IsNodeError from HTTP status only, but the
system’s authoritative node-error classification is ProtocolMessage.CheckResponseError; this creates
split-brain behavior where relaycore treats responses (e.g., HTTP 429 / Cosmos tx errors on HTTP
200) as node errors while the returned RelayResult may still claim IsNodeError=false. As a result,
RPCSmartRouterServer may omit the node-error header and apply success-path behaviors while relaycore
is tracking node errors / retry logic.
Code

protocol/rpcsmartrouter/direct_rpc_relay.go[R476-519]

+	// Proper error classification (don't treat all 4xx as node errors)
+	var isNodeError bool
+	switch {
+	case response.StatusCode >= 500:
+		isNodeError = true // Server error
+	case response.StatusCode == 429:
+		isNodeError = false // Rate limit (not node issue)
+	case response.StatusCode >= 400:
+		isNodeError = false // Client error
+	default:
+		isNodeError = false // Success
+	}
+
+	// Let the chain message parse domain-specific REST errors (e.g. Cosmos tx errors on HTTP 200).
+	// NOTE: This should NOT be treated as "node error" by default; it is typically a request/application error.
+	hasError, errorMessage := chainMessage.CheckResponseError(response.Body, response.StatusCode)
+	if hasError && errorMessage != "" {
+		utils.LavaFormatDebug("REST response contains error",
+			utils.LogAttr("endpoint", d.endpointName),
+			utils.LogAttr("error", errorMessage),
+		)
+	}
+
+	// Convert response headers to metadata
+	responseMetadata := convertHTTPHeadersToMetadata(response.Headers)
+
+	// Build result (include body even for 4xx/5xx!)
+	providerAddress := d.endpointName
+	if providerAddress == "" {
+		providerAddress = sanitizeEndpointURL(d.directConnection.GetURL())
+	}
+
+	result := &common.RelayResult{
+		Reply: &pairingtypes.RelayReply{
+			Data:     response.Body,    // Include body even for errors!
+			Metadata: responseMetadata, // Include headers
+		},
+		Finalized:  true,
+		StatusCode: response.StatusCode,
+		ProviderInfo: common.ProviderInfo{
+			ProviderAddress: providerAddress,
+		},
+		IsNodeError: isNodeError, // Correct transport-level classification
+	}
Evidence
REST direct-relay explicitly sets 429 as non-node-error, while RestMessage.CheckResponseError (used
by relaycore ResultsManager) treats 429 (and Cosmos tx_response.code!=0) as node errors.
RPCSmartRouterServer propagates IsNodeError from the direct-relay result and uses it to emit the
node-error header, so the mismatch becomes externally visible and can desynchronize
health/metrics/header semantics from relaycore’s node-error tracking.

protocol/rpcsmartrouter/direct_rpc_relay.go[476-519]
protocol/chainlib/chainproxy/rpcInterfaceMessages/restMessage.go[52-71]
protocol/relaycore/results_manager.go[109-145]
protocol/relaycore/relay_processor.go[432-440]
protocol/rpcsmartrouter/rpcsmartrouter_server.go[1870-1875]
protocol/rpcsmartrouter/rpcsmartrouter_server.go[2184-2191]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

### Issue description
`DirectRPCRelaySender.sendRESTRelay()` computes `isNodeError` purely from HTTP status codes (including treating HTTP 429 as non-node-error), but relaycore classifies node errors using `ProtocolMessage.CheckResponseError`. For REST, `RestMessage.CheckResponseError()` treats 429 (and Cosmos tx_response.code != 0 on HTTP 200) as node errors, so the smart router can end up with relaycore tracking a node error while `RelayResult.IsNodeError` remains false.

This causes inconsistent behavior: node-error headers and any IsNodeError-based logic in `RPCSmartRouterServer` diverge from relaycore’s node-error handling/retry decisions.

### Issue Context
- REST direct relay currently logs `hasError` but does not use it to set `IsNodeError`.
- relaycore’s ResultsManager treats `hasError==true` as a node error.

### Fix Focus Areas
- Align REST direct-relay `IsNodeError` with `chainMessage.CheckResponseError` (or change one side so both agree).
- Ensure smart-router headers/metrics reflect relaycore’s node-error decision.
- Update/adjust REST integration tests if their 429 expectation changes.

- file/path references:
 - protocol/rpcsmartrouter/direct_rpc_relay.go[476-519]
 - protocol/chainlib/chainproxy/rpcInterfaceMessages/restMessage.go[52-71]
 - protocol/relaycore/results_manager.go[109-145]
 - protocol/rpcsmartrouter/rpcsmartrouter_server.go[1870-1875]
 - protocol/rpcsmartrouter/rpcsmartrouter_server.go[2184-2191]
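
One way to align the two sides, sketched under assumptions: the `responseChecker` interface below only mirrors the call shape shown in the excerpt, and `stubChecker` is a hypothetical stand-in whose rules (flag HTTP 429 and 5xx) are illustrative, not the real `RestMessage` logic.

```go
package main

import "fmt"

// responseChecker mirrors the call shape shown in the review:
// CheckResponseError(body, statusCode) returns (hasError, message).
type responseChecker interface {
	CheckResponseError(data []byte, httpStatusCode int) (bool, string)
}

// stubChecker is a hypothetical checker used only for this sketch.
// It flags HTTP 429 and 5xx; the real REST checker may apply more rules.
type stubChecker struct{}

func (stubChecker) CheckResponseError(data []byte, status int) (bool, string) {
	if status == 429 || status >= 500 {
		return true, fmt.Sprintf("upstream returned HTTP %d", status)
	}
	return false, ""
}

// classifyNodeError derives the flag from the checker alone, so the relay
// result and any other consumer of the checker see the same decision.
func classifyNodeError(c responseChecker, body []byte, status int) (bool, string) {
	return c.CheckResponseError(body, status)
}

func main() {
	for _, status := range []int{200, 404, 429, 503} {
		isNodeError, msg := classifyNodeError(stubChecker{}, nil, status)
		fmt.Println(status, isNodeError, msg)
	}
}
```

The point of the sketch is the single source of truth: the status-code `switch` disappears and `IsNodeError` would be assigned from the checker's result.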



5. ETH_RPC_URL_2 hardcoded API key 📘 Rule violation ⛨ Security
Description
The init script commits a real-looking API key inside a default RPC URL, which risks credential
leakage and unauthorized use. Secrets must not be stored in the repo; they should be supplied via
environment variables or placeholders.
Code

scripts/pre_setups/init_lava_smartrouter_eth.sh[84]

+export ETH_RPC_URL_2="${ETH_RPC_URL_2:-https://json-rpc.8zfcse2amst1lajmh299uq4jn.blockchainnodeengine.com/?key=AIzaSyDyUtm6b-e-xKDQgVWzlroHdVTytiXEDik}"
Evidence
Compliance requires that secrets are not committed and instead provided via environment variables;
the script hardcodes an API key directly in the repository-tracked file.

AGENTS.md
scripts/pre_setups/init_lava_smartrouter_eth.sh[83-90]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A real-looking API key (along with gateway token-like values) is committed as a default RPC endpoint value in a tracked script.

## Issue Context
Compliance requires that secrets are not committed and are instead provided via environment variables or templates.

## Fix Focus Areas
- scripts/pre_setups/init_lava_smartrouter_eth.sh[82-96]



6. err.Error() logged unredacted 📘 Rule violation ⛨ Security
Description
The direct RPC relay logs the raw error string, which may include full upstream URLs and embedded
API keys/tokens. This can leak secrets into logs and violates secure logging requirements.
Code

protocol/rpcsmartrouter/direct_rpc_relay.go[R299-305]

+		return nil, MapDirectRPCError(err, d.directConnection.GetProtocol())
+	}
+
+	statusCode := response.StatusCode
+	responseData := response.Body
+
+	// Handle HTTP error status codes
Evidence
Secure logging requires that no sensitive data (including API keys/tokens) appears in logs; logging
err.Error() can include request URLs, and this PR introduces configs/scripts with embedded
key-like URL segments.

Rule 5: Generic: Secure Logging Practices
protocol/rpcsmartrouter/direct_rpc_relay.go[299-305]
scripts/pre_setups/init_lava_smartrouter_eth.sh[83-90]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Raw `err.Error()` is logged and may contain full upstream URLs and embedded secrets.
## Issue Context
Even debug logs must not contain secrets; direct RPC endpoints commonly embed tokens in URL paths or query params.
## Fix Focus Areas
- protocol/rpcsmartrouter/direct_rpc_relay.go[292-306]
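
A minimal sketch of redacting URLs before logging, using only the standard library. The helper names `redactURL`, `redactError`, and `sampleURLError` are hypothetical, and the example URL is invented. The sketch only handles errors that wrap a `*url.Error`; other error types pass through unchanged and would need their own review.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// redactURL keeps only scheme and host, dropping userinfo, path and query,
// any of which may carry an API key or token.
func redactURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		return "<redacted>"
	}
	return u.Scheme + "://" + u.Host
}

// redactError renders err for logging, rewriting the URL when err wraps
// a *url.Error (the type returned by net/http clients on request failure).
func redactError(err error) string {
	if err == nil {
		return ""
	}
	var urlErr *url.Error
	if errors.As(err, &urlErr) {
		return fmt.Sprintf("%s %s: %v", urlErr.Op, redactURL(urlErr.URL), urlErr.Err)
	}
	return err.Error()
}

// sampleURLError builds an error shaped like one from an HTTP client.
func sampleURLError() error {
	return &url.Error{
		Op:  "Post",
		URL: "https://node.example.com/v1/SECRET_TOKEN?key=SECRET_KEY",
		Err: errors.New("connection reset"),
	}
}

func main() {
	fmt.Println(redactError(sampleURLError()))
}
```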



7. MapDirectRPCError exposes internal errors 📘 Rule violation ⛨ Security
Description
The direct-RPC error mapping wraps and returns the underlying error (%w), which can propagate
internal network/endpoint details to clients. User-facing errors should be generic, with detailed
causes kept only in internal logs.
Code

protocol/rpcsmartrouter/error_mapper.go[R20-32]

+	if isConnectionRefused(err) {
+		return fmt.Errorf("RPC endpoint unavailable (connection refused): %w", err)
+	}
+
+	if isTimeout(err) {
+		return fmt.Errorf("RPC request timeout: %w", err)
+	}
+
+	// Protocol-specific error handling
+	switch protocol {
+	case lavasession.DirectRPCProtocolHTTP, lavasession.DirectRPCProtocolHTTPS:
+		return mapHTTPError(err)
+	case lavasession.DirectRPCProtocolGRPC:
Evidence
Secure error handling requires not exposing internal implementation details to end users; wrapping
the underlying error directly in returned messages risks leaking endpoint addresses and other
internals.

Rule 4: Generic: Secure Error Handling
protocol/rpcsmartrouter/error_mapper.go[19-32]
protocol/rpcsmartrouter/direct_rpc_relay.go[299-305]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`MapDirectRPCError` returns errors that include underlying internal error details.
## Issue Context
User-facing errors should not expose internal details (endpoint addresses, low-level network errors). Details should be logged internally.
## Fix Focus Areas
- protocol/rpcsmartrouter/error_mapper.go[19-45]
- protocol/rpcsmartrouter/direct_rpc_relay.go[299-305]
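
A sketch of one possible shape for this, assuming hypothetical sentinel errors and a hypothetical `mapClientError` helper (neither is from the PR). The detailed cause goes to an internal logger, and the returned error does not wrap it.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"syscall"
)

// Generic, client-safe errors. The names are illustrative.
var (
	errEndpointUnavailable = errors.New("RPC endpoint unavailable")
	errRequestTimeout      = errors.New("RPC request timeout")
	errRequestFailed       = errors.New("RPC request failed")
)

// mapClientError logs the detailed cause through logf and returns a generic
// error that does not wrap the underlying cause, so it cannot leak it.
func mapClientError(err error, logf func(format string, args ...interface{})) error {
	if err == nil {
		return nil
	}
	logf("direct RPC failure: %v", err) // internal log; redact URLs separately
	var netErr net.Error
	switch {
	case errors.Is(err, syscall.ECONNREFUSED):
		return errEndpointUnavailable
	case errors.As(err, &netErr) && netErr.Timeout():
		return errRequestTimeout
	default:
		return errRequestFailed
	}
}

// sampleTimeout builds a wrapped timeout that carries an internal address.
func sampleTimeout() error {
	return fmt.Errorf("dial tcp 10.0.0.1:443: %w", context.DeadlineExceeded)
}

func main() {
	logf := func(format string, args ...interface{}) {
		fmt.Println("[internal]", fmt.Sprintf(format, args...))
	}
	fmt.Println("client sees:", mapClientError(sampleTimeout(), logf))
}
```

The trade-off is that callers can no longer use `errors.Is` against the original cause. If internal code needs that, the sentinel could be matched instead.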



8. Direct RPC probe always fails 🐞 Bug ✓ Correctness
Description
Direct-RPC endpoints can be reported as connected while producing EndpointAndChosenConnection
entries with nil chosenEndpointConnection, so probeProvider won’t take the direct-RPC probe path and
instead errors with “returned nil client in endpoint”. This can break periodic health
probing/optimizer inputs and can lead to endpoints being treated as unhealthy even when direct
connections are healthy.
Code

protocol/lavasession/consumer_session_manager.go[R457-466]

+	// Check if this is a direct RPC endpoint (smart router mode)
+	// Direct RPC endpoints return empty endpoints list but connected=true
+	// We need to probe them differently - using the DirectRPCConnection health check
+	if len(endpoints) == 0 && connected {
+		return csm.probeDirectRPCEndpoints(ctx, consumerSessionsWithProvider, providerAddress)
+	}
+
  var endpointInfos []EndpointInfo
  lastError := fmt.Errorf("endpoints list is empty") // this error will happen if we had 0 endpoints
  for _, endpointAndConnection := range endpoints {
Evidence
probeProvider assumes direct-RPC mode yields an empty endpoints slice, but the endpoint selection
code appends entries even when connectEndpoint returns (nil, true) for direct RPC. As a result,
len(endpoints) != 0, so probeProvider skips probeDirectRPCEndpoints and hits the nil-client error
guard, causing probe failures.

protocol/lavasession/consumer_session_manager.go[447-482]
protocol/lavasession/consumer_types.go[676-694]
protocol/lavasession/consumer_types.go[806-816]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`probeProvider` assumes direct-RPC mode produces an empty `endpoints` slice, but direct-RPC selection returns `connected=true` with `chosenEndpointConnection=nil`. This causes probes to fail with `returned nil client in endpoint` instead of using `probeDirectRPCEndpoints`.
### Issue Context
Direct-RPC `connectEndpoint` returns `(nil, true)` when the pre-established direct connection is healthy. The calling code still appends this as an endpoint entry, so `len(endpoints) != 0` and the special-case `len(endpoints)==0` direct-RPC probe logic never runs.
### Fix Focus Areas
- protocol/lavasession/consumer_session_manager.go[447-482]
- protocol/lavasession/consumer_types.go[676-694]
- protocol/lavasession/consumer_types.go[806-816]
### Suggested changes
- In `probeProvider`, detect direct-RPC by checking whether **all** returned entries have `chosenEndpointConnection == nil` (or whether `endpoint.IsDirectRPC()`), and then call `probeDirectRPCEndpoints`.
- Alternatively, in `fetchEndpointConnectionFromConsumerSessionWithProvider`, for direct-RPC endpoints do not append `EndpointAndChosenConnection` entries with nil `chosenEndpointConnection` (but ensure session selection still has enough information to pick endpoints—e.g., store endpoints separately or add an explicit flag).
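
The first suggestion could look roughly like the following sketch. `conn`, `endpointEntry`, and `useDirectRPCProbe` are hypothetical stand-ins for the session types and are not the PR's definitions; a nil `chosenConnection` models `chosenEndpointConnection == nil`.

```go
package main

import "fmt"

// conn and endpointEntry are hypothetical stand-ins for the session types.
type conn struct{}

type endpointEntry struct {
	chosenConnection *conn
}

// useDirectRPCProbe reports whether the direct-RPC probe path applies:
// the provider is connected and no entry carries a connection.
// An empty slice also qualifies, which preserves the original check.
func useDirectRPCProbe(entries []endpointEntry, connected bool) bool {
	if !connected {
		return false
	}
	for _, e := range entries {
		if e.chosenConnection != nil {
			return false
		}
	}
	return true
}

func main() {
	direct := []endpointEntry{{}, {}}
	regular := []endpointEntry{{chosenConnection: &conn{}}}
	fmt.Println(useDirectRPCProbe(direct, true), useDirectRPCProbe(regular, true))
}
```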



9. Endpoint health data races 🐞 Bug ⛯ Reliability
Description
Endpoint health tracking mutates ConnectionRefusals/Enabled and writes LastBlockUpdate without
synchronization, despite documenting that Endpoint.mu protects these fields. These are
called/updated from concurrent request goroutines and tracker callbacks, risking undefined behavior
and incorrect routing decisions under load.
Code

protocol/lavasession/consumer_types.go[R234-255]

+// MarkUnhealthy increments connection refusals and disables endpoint if threshold exceeded
+func (e *Endpoint) MarkUnhealthy() {
+	e.ConnectionRefusals++
+	if e.ConnectionRefusals >= MaxConsecutiveConnectionAttempts {
+		e.Enabled = false
+		utils.LavaFormatWarning("disabled unhealthy endpoint", nil,
+			utils.LogAttr("endpoint", e.NetworkAddress),
+			utils.LogAttr("refusals", e.ConnectionRefusals),
+			utils.LogAttr("is_direct_rpc", e.IsDirectRPC()),
+		)
+	}
+}
+
+// ResetHealth resets connection refusals and re-enables endpoint
+func (e *Endpoint) ResetHealth() {
+	e.ConnectionRefusals = 0
+	e.Enabled = true
+	utils.LavaFormatInfo("re-enabled healthy endpoint",
+		utils.LogAttr("endpoint", e.NetworkAddress),
+		utils.LogAttr("is_direct_rpc", e.IsDirectRPC()),
+	)
+}
Evidence
Endpoint.mu is explicitly documented as protecting ConnectionRefusals and Enabled, but
MarkUnhealthy/ResetHealth update them without locking/atomic operations. rpcsmartrouter_server
invokes these methods on request paths concurrently. Separately, LastBlockUpdate (time.Time) is
written from callbacks without synchronization while also being written elsewhere, which is a Go
data race.

protocol/lavasession/consumer_types.go[188-255]
protocol/rpcsmartrouter/rpcsmartrouter_server.go[1766-1801]
protocol/rpcsmartrouter/endpoint_chain_tracker_manager.go[160-172]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`Endpoint.MarkUnhealthy` / `ResetHealth` and block tracking (`LastBlockUpdate`) perform unsynchronized reads/writes to shared fields that are used concurrently by smart-router request handlers and tracker callbacks, creating Go data races.
### Issue Context
- `Endpoint.mu` is documented as protecting `ConnectionRefusals` and `Enabled`, but these methods mutate without locking.
- `LastBlockUpdate` is a `time.Time` written from multiple goroutines without any lock/atomic.
### Fix Focus Areas
- protocol/lavasession/consumer_types.go[188-255]
- protocol/rpcsmartrouter/rpcsmartrouter_server.go[1766-1801]
- protocol/rpcsmartrouter/endpoint_chain_tracker_manager.go[160-172]
### Suggested changes
- Wrap `MarkUnhealthy` and `ResetHealth` bodies with `e.mu.Lock()/Unlock()` (or switch `ConnectionRefusals` to `atomic.Uint64` and `Enabled` to `atomic.Bool`).
- For `LastBlockUpdate`, either:
- Guard all reads/writes with `e.mu`, or
- Replace with `atomic.Int64` holding `time.Now().UnixNano()`.
- Update call sites that read `ConnectionRefusals`/`Enabled`/`LastBlockUpdate` to use the same locking/atomic approach (e.g., `if targetEndpoint != nil && targetEndpoint.ConnectionRefusals > 0` should not read without synchronization if you keep mutex-based protection).
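
A minimal sketch of the mutex-based variant. The `endpoint` type, its lowercase fields, the `Health` accessor, and the `maxRefusals` value of 3 are assumptions for illustration; the real struct and threshold live in `consumer_types.go`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// maxRefusals is an assumed threshold for this sketch.
const maxRefusals = 3

// endpoint is a hypothetical reduction of the real Endpoint type.
type endpoint struct {
	mu                 sync.RWMutex
	connectionRefusals uint64
	enabled            bool
	lastBlockUpdate    time.Time
}

// MarkUnhealthy increments refusals and disables the endpoint at the threshold.
func (e *endpoint) MarkUnhealthy() {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.connectionRefusals++
	if e.connectionRefusals >= maxRefusals {
		e.enabled = false
	}
}

// ResetHealth clears refusals and re-enables the endpoint.
func (e *endpoint) ResetHealth() {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.connectionRefusals = 0
	e.enabled = true
}

// SetLastBlockUpdate records the time of the latest block update.
func (e *endpoint) SetLastBlockUpdate(t time.Time) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.lastBlockUpdate = t
}

// Health returns a consistent snapshot for call sites that only read.
func (e *endpoint) Health() (refusals uint64, enabled bool) {
	e.mu.RLock()
	defer e.mu.RUnlock()
	return e.connectionRefusals, e.enabled
}

func main() {
	e := &endpoint{enabled: true}
	var wg sync.WaitGroup
	for i := 0; i < maxRefusals; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			e.MarkUnhealthy()
			e.SetLastBlockUpdate(time.Now())
		}()
	}
	wg.Wait()
	fmt.Println(e.Health())
}
```

Running the real tests with `go test -race` is the usual way to confirm that the call sites no longer touch these fields outside the lock.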




Remediation recommended

10. mainnet-1 directory name 📘 Rule violation ✓ Correctness ⭐ New
Description
A new specs path uses the directory name mainnet-1, which is not snake_case due to the hyphen.
This introduces inconsistent directory naming under specs/.
Code

specs/mainnet-1/specs/avalanche_c.json[R1-3]

+{
+  "proposal": {
+    "title": "Add Specs: Avalanche C Chain",
Evidence
PR Compliance ID 3 requires directories to be snake_case, but the newly added file is placed under
specs/mainnet-1/, which contains a hyphen.

AGENTS.md
specs/mainnet-1/specs/avalanche_c.json[1-3]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
A new specs file is introduced under a directory (`mainnet-1`) that is not `snake_case`.

## Issue Context
Compliance requires directories to use `snake_case`. Hyphenated directory names break that convention; however, this may have wider implications if other tooling expects `mainnet-1`.

## Fix Focus Areas
- specs/mainnet-1/specs/avalanche_c.json[1-3]



11. Quorum logs too verbose 🐞 Bug ➹ Performance
Description
responsesCrossValidation now emits multiple Info-level logs per quorum check with high-cardinality
fields (GUID, providers, hashes). In cross-validation mode this can significantly increase log
volume and CPU/IO overhead.
Code

protocol/relaycore/relay_processor.go[R521-528]

+	// Log quorum validation start
+	utils.LavaFormatInfo("🔍 [Quorum Validation] Starting consensus check",
+		utils.LogAttr("GUID", rp.guid),
+		utils.LogAttr("totalResults", len(results)),
+		utils.LogAttr("requiredQuorumSize", crossValidationSize),
+		utils.LogAttr("agreementThreshold", rp.getAgreementThreshold()),
+		utils.LogAttr("maxParticipants", rp.getMaxParticipants()),
+	)
Evidence
responsesCrossValidation is part of relay processing in cross-validation/quorum selection. The new
Info logs will execute for each quorum check, and some additional Info logs also appear deeper in
the function when quorum fails/succeeds.

protocol/relaycore/relay_processor.go[516-528]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Info-level quorum logs in `responsesCrossValidation` are on a hot path and include high-cardinality attributes, which can flood logs and add overhead in production.
### Issue Context
This runs whenever cross-validation/quorum is enabled and a quorum evaluation occurs.
### Fix Focus Areas
- protocol/relaycore/relay_processor.go[516-528]
### Suggested changes
- Change `LavaFormatInfo` to `LavaFormatDebug` (or `Trace`) for per-request quorum logs.
- Optionally gate behind `rp.debugRelay` or a dedicated configuration flag.
- Consider sampling if Info-level visibility is required.
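
If sampling is chosen, a minimal sketch could look like this. `logSampler` and its `Allow` method are hypothetical and not part of the PR; the sample rate of 100 is arbitrary.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// logSampler lets through the first of every `every` events.
// count is the first field so it stays 64-bit aligned for atomic access.
type logSampler struct {
	count uint64
	every uint64
}

// Allow reports whether the current event should be logged.
// It is safe to call from concurrent goroutines.
func (s *logSampler) Allow() bool {
	if s.every <= 1 {
		return true
	}
	n := atomic.AddUint64(&s.count, 1)
	return (n-1)%s.every == 0
}

func main() {
	sampler := &logSampler{every: 100}
	logged := 0
	for i := 0; i < 1000; i++ {
		if sampler.Allow() {
			logged++ // an Info-level quorum log would be emitted here
		}
	}
	fmt.Println("logged", logged, "of 1000 quorum checks")
}
```

Demoting the logs to Debug remains the simpler change. Sampling only makes sense when some Info-level visibility has to be kept.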



Comment thread protocol/rpcsmartrouter/websocket_config_test.go Outdated
Comment thread protocol/rpcsmartrouter/direct_ws_subscription_manager_test.go Outdated
Comment thread protocol/rpcsmartrouter/rpcsmartrouter.go
Comment on lines +476 to +519
	// Proper error classification (don't treat all 4xx as node errors)
	var isNodeError bool
	switch {
	case response.StatusCode >= 500:
		isNodeError = true // Server error
	case response.StatusCode == 429:
		isNodeError = false // Rate limit (not node issue)
	case response.StatusCode >= 400:
		isNodeError = false // Client error
	default:
		isNodeError = false // Success
	}

	// Let the chain message parse domain-specific REST errors (e.g. Cosmos tx errors on HTTP 200).
	// NOTE: This should NOT be treated as "node error" by default; it is typically a request/application error.
	hasError, errorMessage := chainMessage.CheckResponseError(response.Body, response.StatusCode)
	if hasError && errorMessage != "" {
		utils.LavaFormatDebug("REST response contains error",
			utils.LogAttr("endpoint", d.endpointName),
			utils.LogAttr("error", errorMessage),
		)
	}

	// Convert response headers to metadata
	responseMetadata := convertHTTPHeadersToMetadata(response.Headers)

	// Build result (include body even for 4xx/5xx!)
	providerAddress := d.endpointName
	if providerAddress == "" {
		providerAddress = sanitizeEndpointURL(d.directConnection.GetURL())
	}

	result := &common.RelayResult{
		Reply: &pairingtypes.RelayReply{
			Data:     response.Body,    // Include body even for errors!
			Metadata: responseMetadata, // Include headers
		},
		Finalized:  true,
		StatusCode: response.StatusCode,
		ProviderInfo: common.ProviderInfo{
			ProviderAddress: providerAddress,
		},
		IsNodeError: isNodeError, // Correct transport-level classification
	}

Action required

4. REST node-error mismatch 🐞 Bug ✓ Correctness

DirectRPCRelaySender.sendRESTRelay sets RelayResult.IsNodeError from the HTTP status alone, but the
system's authoritative node-error classification is ProtocolMessage.CheckResponseError. The two can
disagree: relaycore treats some responses (e.g., HTTP 429, or Cosmos tx errors on HTTP 200) as node
errors while the returned RelayResult still reports IsNodeError=false. As a result,
RPCSmartRouterServer may omit the node-error header and apply success-path behaviors while relaycore
is tracking node errors and applying retry logic.
Agent Prompt
### Issue description
`DirectRPCRelaySender.sendRESTRelay()` computes `isNodeError` purely from HTTP status codes (including treating HTTP 429 as non-node-error), but relaycore classifies node errors using `ProtocolMessage.CheckResponseError`. For REST, `RestMessage.CheckResponseError()` treats 429 (and Cosmos tx_response.code != 0 on HTTP 200) as node errors, so the smart router can end up with relaycore tracking a node error while `RelayResult.IsNodeError` remains false.

This causes inconsistent behavior: node-error headers and any IsNodeError-based logic in `RPCSmartRouterServer` diverge from relaycore’s node-error handling/retry decisions.

### Issue Context
- REST direct relay currently logs `hasError` but does not use it to set `IsNodeError`.
- relaycore’s ResultsManager treats `hasError==true` as a node error.

### Fix Focus Areas
- Align REST direct-relay `IsNodeError` with `chainMessage.CheckResponseError` (or change one side so both agree).
- Ensure smart-router headers/metrics reflect relaycore’s node-error decision.
- Update/adjust REST integration tests if their 429 expectation changes.

- file/path references:
  - protocol/rpcsmartrouter/direct_rpc_relay.go[476-519]
  - protocol/chainlib/chainproxy/rpcInterfaceMessages/restMessage.go[52-71]
  - protocol/relaycore/results_manager.go[109-145]
  - protocol/rpcsmartrouter/rpcsmartrouter_server.go[1870-1875]
  - protocol/rpcsmartrouter/rpcsmartrouter_server.go[2184-2191]
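
One way to read the first fix focus area is to derive `IsNodeError` from the same checker that relaycore consults. The sketch below contrasts the status-only rule from the diff with that aligned rule. Everything here is illustrative: `responseErrorChecker`, `stubRestChecker`, `statusOnlyNodeError`, and `alignedNodeError` are hypothetical names, and the stub only imitates the behavior this review attributes to `RestMessage.CheckResponseError` (429 and a non-zero `tx_response.code` count as errors); treating 5xx as an error in the stub is an added assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// responseErrorChecker mirrors the call shape seen in the diff:
// CheckResponseError(body, statusCode) returns (hasError, message).
// The interface name itself is hypothetical.
type responseErrorChecker interface {
	CheckResponseError(data []byte, statusCode int) (bool, string)
}

// stubRestChecker is an illustrative stand-in, NOT the real RestMessage.
type stubRestChecker struct{}

func (stubRestChecker) CheckResponseError(data []byte, statusCode int) (bool, string) {
	// Assumption: rate limits and server errors are reported as errors.
	if statusCode == 429 || statusCode >= 500 {
		return true, fmt.Sprintf("HTTP %d", statusCode)
	}
	// Cosmos-style transaction failure carried inside an otherwise OK response.
	var body struct {
		TxResponse struct {
			Code   int    `json:"code"`
			RawLog string `json:"raw_log"`
		} `json:"tx_response"`
	}
	if err := json.Unmarshal(data, &body); err == nil && body.TxResponse.Code != 0 {
		return true, body.TxResponse.RawLog
	}
	return false, ""
}

// statusOnlyNodeError reproduces the outcome of the switch in the diff:
// only 5xx responses are flagged.
func statusOnlyNodeError(statusCode int) bool {
	return statusCode >= 500
}

// alignedNodeError defers to the checker, so the flag on the result and
// the decision made from CheckResponseError cannot disagree.
func alignedNodeError(c responseErrorChecker, body []byte, statusCode int) bool {
	hasError, _ := c.CheckResponseError(body, statusCode)
	return hasError
}

func main() {
	cases := []struct {
		status int
		body   string
	}{
		{200, `{"tx_response":{"code":0}}`},
		{200, `{"tx_response":{"code":5,"raw_log":"insufficient funds"}}`},
		{429, `{}`},
		{503, `{}`},
	}
	for _, tc := range cases {
		fmt.Printf("status=%d statusOnly=%t aligned=%t\n",
			tc.status,
			statusOnlyNodeError(tc.status),
			alignedNodeError(stubRestChecker{}, []byte(tc.body), tc.status))
	}
}
```

The two middle cases are where the rules diverge, which is the mismatch described above. Whether the checker or the status rule should win is a design decision for the PR authors; the point is only that both sides should use one source.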

@NadavLevi NadavLevi force-pushed the feat/direct-rpc-clean branch from e973ab7 to 569b6fc Compare March 9, 2026 13:20
Introduces direct RPC mode for RPCSmartRouter, bypassing the Lava
provider relay path and routing requests directly to configured RPC
endpoints. Key changes:

- Direct RPC mode: RPCSmartRouter can now connect and relay directly
  to RPC endpoints (HTTP/WS/gRPC) with provider selection via the
  existing optimizer (QoS, latency, sync, stake weights)
- Backup provider selection: endpoints are probed and verified at
  setup; QoS-based backup selection for failover
- Heavy request handling: skip latestBlock unmarshalling and response
  unmarshalling above 1MB to avoid memory pressure
- Endpoint health tracking: mark endpoints unhealthy on 5xx/connection
  errors; emit health state-transition metrics only on actual changes
- Metrics overhaul (SmartRouterMetricsManager):
  - Per-endpoint and router-scoped Prometheus metrics
  - Remove endpoint URLs from metric labels to avoid cardinality explosion
  - Node error recovery metrics
  - Router end-to-end latency now reflects true client-visible latency
    (measured from SendParsedRelay entry to result return, capturing
    provider selection overhead) rather than network-hop only
  - Per-endpoint latency retains network-only measurement
- Block tracking: set endpoint latest block from relay response in
  addition to ChainTracker
- Init script: remove legacy provider configs, adjust for direct RPC mode
- Spec: add health verification to AVAXP spec
- Relay core: use max(blockLagForQosSync*2, blockDistanceToFinalization)
  for EndpointLagThreshold

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@NadavLevi NadavLevi force-pushed the feat/direct-rpc-clean branch from 569b6fc to 85b48d8 Compare March 9, 2026 13:29
@nimrod-teich nimrod-teich merged commit e3055cf into main Mar 11, 2026
30 checks passed
@nimrod-teich nimrod-teich deleted the feat/direct-rpc-clean branch March 11, 2026 19:13