
feat: provider resource limiter #2117

Merged
nimrod-teich merged 16 commits into main from provider_resource_limiter on Nov 30, 2025

@avitenzer

Collaborator

Provider Resource Limiter

Summary

Implements a semaphore-based resource limiter for RPC providers to prevent Out-of-Memory (OOM) crashes caused by concurrent high-resource requests. The limiter uses a two-tier bucket system ("heavy" vs "normal") with different concurrency limits and queuing strategies.

Problem

Provider servers experience OOM crashes when handling multiple concurrent resource-intensive requests (e.g., debug_traceTransaction, high-CU eth_call).

Solution

Method Classification

  1. Primary: CU-based - Methods with CU ≥ threshold (default: 100) → "heavy"
  2. Secondary: Name-based - debug_* and trace_* methods → "heavy"
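
The two classification rules can be sketched as a single function. This is a minimal illustration, not the code in resource_limiter.go: the names `classifyMethod`, `BucketHeavy`, and `BucketNormal` are hypothetical, and the sketch assumes the CU check runs first with the name prefix as a fallback.

```go
package main

import "strings"

// Bucket names mirror the "heavy" / "normal" tiers from the PR description.
// The constant names themselves are assumptions made for this sketch.
const (
	BucketHeavy  = "heavy"
	BucketNormal = "normal"
)

// classifyMethod is a hypothetical helper. It applies the CU-based rule
// first and falls back to the method-name prefix rule.
func classifyMethod(method string, computeUnits, cuThreshold uint64) string {
	// Primary rule: CU at or above the threshold (default 100) is heavy.
	if computeUnits >= cuThreshold {
		return BucketHeavy
	}
	// Secondary rule: debug_* and trace_* methods are heavy regardless of CU.
	if strings.HasPrefix(method, "debug_") || strings.HasPrefix(method, "trace_") {
		return BucketHeavy
	}
	return BucketNormal
}
```

With the default threshold, `classifyMethod("debug_traceTransaction", 10, 100)` lands in the heavy bucket through the name rule even though its CU is low.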

Resource Limits

Bucket   Max Concurrent   Queue Size   Memory/Call   Timeout
Heavy    2                5            512 MB        30s
Normal   100              0            1 MB          -

Memory Protection

  • Tracks estimated memory usage across requests
  • Monitors actual heap allocation
  • Rejects requests when memory exceeds 80% of threshold
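
A minimal sketch of how such a memory check could work, assuming the 80% cutoff applies both to the measured heap and to the sum of per-call estimates. The `memoryGuard` type and its methods are hypothetical names; only `runtime.ReadMemStats` and `sync/atomic` are real standard-library APIs.

```go
package main

import (
	"runtime"
	"sync/atomic"
)

// memoryGuard is a hypothetical stand-in for the limiter's memory tracking.
type memoryGuard struct {
	thresholdBytes uint64
	reserved       atomic.Uint64 // sum of per-call memory estimates in flight
	heapInUse      func() uint64 // injectable so tests can fake heap readings
}

// readHeapAlloc reports bytes of allocated heap objects.
// runtime.ReadMemStats briefly stops the world, so a real limiter would
// likely sample it periodically rather than on every request.
func readHeapAlloc() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func newMemoryGuard(thresholdBytes uint64) *memoryGuard {
	return &memoryGuard{thresholdBytes: thresholdBytes, heapInUse: readHeapAlloc}
}

// tryReserve admits a request only if both the measured heap and the
// estimated usage stay at or below 80% of the threshold.
func (g *memoryGuard) tryReserve(estimate uint64) bool {
	limit := g.thresholdBytes / 10 * 8
	if g.heapInUse() > limit {
		return false
	}
	for {
		cur := g.reserved.Load()
		if cur+estimate > limit {
			return false
		}
		if g.reserved.CompareAndSwap(cur, cur+estimate) {
			return true
		}
	}
}

// release returns a reservation; adding ^(n-1) subtracts n from a Uint64.
func (g *memoryGuard) release(estimate uint64) {
	g.reserved.Add(^(estimate - 1))
}
```

Each successful `tryReserve` must be paired with a `release` of the same estimate once the request finishes, otherwise the estimated usage only grows.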

Changes

New Files

  • protocol/rpcprovider/resource_limiter.go (548 lines) - Core implementation
  • protocol/rpcprovider/resource_limiter_test.go (731 lines) - 14 tests + 3 benchmarks

Modified Files

  • protocol/rpcprovider/rpcprovider.go - Added 6 CLI flags and initialization
  • protocol/rpcprovider/rpcprovider_server.go - Integrated limiter into relay execution

Configuration

lavap rpcprovider config.yml \
  --enable-resource-limiter=true \
  --resource-limiter-memory-gb=8 \
  --resource-limiter-cu-threshold=100 \
  --heavy-max-concurrent=2 \
  --heavy-queue-size=5 \
  --normal-max-concurrent=100 \
  --from mykey

Metrics

  • lava_provider_resource_limiter_rejections_total{bucket, reason}
  • lava_provider_resource_limiter_queued_total{bucket}
  • lava_provider_resource_limiter_timeouts_total{bucket}
  • lava_provider_resource_limiter_in_flight{bucket}
  • lava_provider_resource_limiter_memory_bytes
  • lava_provider_resource_limiter_queue_wait_seconds{bucket}

- Introduced a new ResourceLimiter to manage concurrent executions based on method type and memory usage.
- Added configuration options for heavy and normal request buckets, including max concurrent executions, memory per call, and queue sizes.
- Integrated resource limiting into the RPCProviderServer to prevent OOM errors from high-CU requests.
- Added Prometheus metrics for monitoring request rejections, queue sizes, and memory usage.
- Enhanced setup to allow enabling/disabling the resource limiter via configuration flags.
- Introduced a new test file for ResourceLimiter, covering various scenarios including disabled state, bucket selection based on compute units, concurrency limits, queue timeouts, context cancellation, and memory limits.
- Implemented tests for execution errors and metrics tracking to ensure accurate monitoring of request handling.
- Added benchmarks for performance evaluation under different load conditions.
- Enhanced test coverage for mixed request types and memory reservation release.
- Verified error messages for queue full and max concurrent scenarios.
- Added debug logging for queue depth upon request enqueuing and rejection due to full queue.
- Improved memory monitoring logs to include current queue depths for heavy and normal buckets.
- These enhancements facilitate better tracking and debugging of resource management behavior in the system.
@avitenzer avitenzer changed the title Provider resource limiter feat: provider resource limiter Nov 26, 2025
- Modified the call to ServeRPCRequests by adding an additional nil parameter to improve compatibility with recent changes in the RPC provider server implementation.
- This adjustment ensures that the function signature aligns with the latest updates, maintaining the integrity of the test suite.
- Updated goroutine closures in the ResourceLimiter tests to use the loop variable directly, ensuring proper indexing for concurrent requests.
- Simplified the memory threshold check in the ResourceLimiter implementation for clarity and efficiency.
@github-actions

github-actions Bot commented Nov 26, 2025


Test Results

3 089 tests +23:  3 087 ✅ +22,  1 💤 ±0,  1 ❌ +1
126 suites +1,  7 files +1
Duration: 26m 59s ⏱️ +3m 21s

For more details on these failures, see this check.

Results for commit 8e45a1a. ± Comparison against base commit 1bf05bc.

♻️ This comment has been updated with latest results.

avitenzer and others added 10 commits November 26, 2025 18:48
- Updated goroutine closures in the ResourceLimiter tests to pass the loop index as a parameter, ensuring accurate indexing for concurrent requests.
- This change prevents data races and ensures that results are stored correctly in the results slices during concurrent execution.
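
The pattern this commit describes looks roughly like the sketch below. `runConcurrent` is a hypothetical helper, not a function from resource_limiter_test.go. Passing the index as an argument gives each goroutine its own copy, which matters on Go versions before 1.22, where the loop variable was shared across iterations.

```go
package main

import "sync"

// runConcurrent is a hypothetical test helper: it runs work(i) for
// i in [0, n) concurrently and stores each result at its own index.
func runConcurrent(n int, work func(i int) int) []int {
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		// idx is a per-goroutine copy of the loop index.
		go func(idx int) {
			defer wg.Done()
			results[idx] = work(idx) // each goroutine writes a distinct element
		}(i)
	}
	wg.Wait()
	return results
}
```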
- Updated the Acquire method in ResourceLimiter to include a nil check for the ResourceLimiter instance, ensuring robustness when the instance is not initialized.
- This change prevents potential nil pointer dereference errors and maintains the intended functionality when the resource limiter is disabled.
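
Go allows methods to be called on a nil pointer receiver, which is what makes this kind of guard possible. The sketch below uses a simplified, hypothetical `limiter` type and `Acquire` signature; the real method takes more parameters.

```go
package main

// limiter is a simplified stand-in for the PR's ResourceLimiter.
type limiter struct {
	slots chan struct{} // counting semaphore
}

// Acquire treats a nil limiter as "limiting disabled" and admits the call.
// It returns a release function and whether the call was admitted.
func (l *limiter) Acquire() (release func(), ok bool) {
	if l == nil {
		// Disabled or uninitialized: admit with a no-op release.
		return func() {}, true
	}
	select {
	case l.slots <- struct{}{}:
		return func() { <-l.slots }, true
	default:
		return nil, false
	}
}
```

Callers can then hold a `*limiter` that is nil when the feature flag is off and call `Acquire` unconditionally.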
…ency and queue size

- Updated the NewResourceLimiter function to accept additional parameters for heavy and normal bucket configurations, including maximum concurrent requests and queue sizes.
- Enhanced the SetupEndpoint method to retrieve these new configuration values from viper, ensuring flexibility in resource management settings.
- This change improves the adaptability of the resource limiter to varying load conditions and usage patterns.
…ation

- Added a new struct for resource limiter options to encapsulate configuration parameters such as memory threshold, CU threshold, and concurrency settings.
- Updated the RPCProvider to utilize the new resource limiter options, enhancing the flexibility of resource management.
- Modified the SetupEndpoint method to retrieve resource limiter settings from the new struct, streamlining the initialization process.
- Enhanced the Relay method in RPCProviderServer to ensure proper session cleanup when a request is rejected by the resource limiter.
- Added a call to OnSessionFailure to unlock the session, rollback CU deltas, and prevent session leaks.
- Improved error logging for cleanup failures to aid in debugging and monitoring.
- Updated the ResourceLimiter to accept an endpoint name during initialization, allowing for differentiated metrics per endpoint.
- Modified the NewResourceLimiter and related functions to incorporate the endpoint name, improving metric tracking and clarity.
- Adjusted tests to reflect the new parameter, ensuring comprehensive coverage of the updated functionality.
- Added a check in the ResourceLimiter's processQueue method to skip requests that have already been canceled or timed out.
- Implemented logging for skipped requests to aid in debugging and monitoring of request handling.
- This change improves the robustness of the resource limiter by preventing the execution of invalid requests.
@nimrod-teich nimrod-teich merged commit 1087ad5 into main Nov 30, 2025
117 of 124 checks passed
@nimrod-teich nimrod-teich deleted the provider_resource_limiter branch November 30, 2025 08:55