fix: prevent session leaks and exhaustion in smart router and consumer #2158

Merged
nimrod-teich merged 4 commits into main from fix/session-leak-prevention on Dec 31, 2025
Conversation

@nimrod-teich nimrod-teich commented Dec 30, 2025

This commit fixes session exhaustion issues that occur under high load (1000+ r/s) with a single provider in smart router mode.

Root causes fixed:

  1. Sessions held for 3 seconds during backoff (rpcconsumer + rpcsmartrouter)

    • Sessions now released immediately on relay failure
    • Backoff no longer holds session locked
  2. OnSessionFailure early return without Free() (consumer_session_manager)

    • When the session was already blocklisted, OnSessionFailure returned without unlocking it
    • The session stayed locked forever, causing session exhaustion
    • OnSessionFailure now calls Free() before returning in the blocklisted case
  3. Data reliability failures with single provider

    • DR needs at least 2 providers to compare results
    • Added early exit when < 2 providers available
  4. Defer-based cleanup for all exit paths

    • sessionHandled flag prevents double-free
    • Panic recovery ensures session release
  5. Configurable max sessions per provider

    • Added --max-sessions-per-provider flag (default: 1000)
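
Root cause 2 above can be illustrated with a minimal sketch. The `session` type, the `onSessionFailure` function, and the `errAlreadyBlocklisted` error are hypothetical stand-ins and not the actual `lavasession` types; the sketch only assumes that a session is guarded by a lock and that `Free()` releases it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// session is a hypothetical stand-in for a consumer session guarded by a lock.
type session struct {
	mu          sync.Mutex
	blocklisted bool
}

// Free releases the session lock so another relay can use the session.
func (s *session) Free() { s.mu.Unlock() }

// errAlreadyBlocklisted is an illustrative error, not the project's own.
var errAlreadyBlocklisted = errors.New("session already blocklisted")

// onSessionFailure must be called with s.mu held. Every return path,
// including the early return for an already blocklisted session, unlocks it.
func onSessionFailure(s *session) error {
	if s.blocklisted {
		s.Free() // the fix: release before the early return
		return errAlreadyBlocklisted
	}
	s.blocklisted = true
	s.Free()
	return nil
}

func main() {
	s := &session{}

	s.mu.Lock()
	fmt.Println("first failure:", onSessionFailure(s))

	s.mu.Lock()
	fmt.Println("second failure:", onSessionFailure(s))

	// TryLock succeeds only if the early-return path released the lock.
	fmt.Println("lock free after early return:", s.mu.TryLock())
}
```

Without the `Free()` call in the blocklisted branch, the final `TryLock` would fail, which is the single-session version of the exhaustion described above.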

Files modified:

  • protocol/lavasession/consumer_session_manager.go
  • protocol/lavasession/consumer_types.go
  • protocol/lavasession/common.go
  • protocol/common/cobra_common.go
  • protocol/rpcconsumer/rpcconsumer.go
  • protocol/rpcconsumer/rpcconsumer_server.go
  • protocol/rpcsmartrouter/rpcsmartrouter_server.go

Tests added:

  • TestSessionLeakPrevention_* in rpcconsumer

Description

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • read the contribution guide
  • included the correct type prefix in the PR title, you can find examples of the prefixes below:
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the main branch
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

Note

Resolves session exhaustion under load by making session handling robust and configurable.

  • Critical session lifecycle fixes
    • Add defer-based cleanup with sessionHandled and panic recovery in sendRelayToProvider (consumer + smart router)
    • Call OnSessionFailure immediately on relay errors (removed backoff-held sessions)
    • In OnSessionFailure, free session when already blocklisted to avoid deadlocks
  • Data reliability
    • Skip DR when fewer than 2 providers are available
  • Configurability
    • New flags: --max-sessions-per-provider (backs MaxSessionsAllowedPerProvider) and --maximum-streams-per-connection
    • Introduce GetMaxAllowedBlockListedSessionPerProvider; compute blocked-session thresholds using it
  • Connection/session selection
    • Use MaximumStreamsOverASingleConnection in connection reuse checks
  • CLI wiring
    • Wire new flags in rpcconsumer and rpcsmartrouter; centralize flag names in common
  • Tests
    • Add comprehensive TestSessionLeakPrevention_* suites for consumer and smart router (panic, early-return, concurrency, subscriptions)

Written by Cursor Bugbot for commit 3779858. This will update automatically on new commits.

@cursor cursor Bot left a comment

This PR is being reviewed by Cursor Bugbot

Comment thread protocol/rpcconsumer/rpcconsumer_server.go Outdated
Comment thread protocol/rpcconsumer/rpcconsumer_server.go Outdated
Comment thread protocol/rpcsmartrouter/rpcsmartrouter_server.go Outdated
@nimrod-teich nimrod-teich force-pushed the fix/session-leak-prevention branch 2 times, most recently from c00ae8c to af7f82a on December 30, 2025 14:41
Comment thread debug-base-latency.sh Outdated
Comment thread protocol/rpcconsumer/rpcconsumer_server.go Outdated
Comment thread protocol/lavasession/common.go
github-actions Bot commented Dec 30, 2025

Test Results

    6 files  ± 0    129 suites  ±0   33m 35s ⏱️ - 1m 19s
3 031 tests +30  3 022 ✅ +30  1 💤 ±0  2 ❌ ±0  6 🔥 ±0 
3 111 runs  +30  3 102 ✅ +30  1 💤 ±0  2 ❌ ±0  6 🔥 ±0 

For more details on these failures and errors, see this check.

Results for commit 42872f0. ± Comparison against base commit ac492c9.

♻️ This comment has been updated with latest results.

@nimrod-teich nimrod-teich force-pushed the fix/session-leak-prevention branch 2 times, most recently from 1c7759d to eb63eb6 on December 30, 2025 15:13
Comment thread protocol/rpcconsumer/rpcconsumer_server.go Outdated
@nimrod-teich nimrod-teich force-pushed the fix/session-leak-prevention branch 2 times, most recently from c406f55 to 5bd7018 on December 30, 2025 15:47
…d (1000+ r/s) with a single provider in smart router mode.

Root causes fixed:

1. Sessions held for 3 seconds during backoff (rpcconsumer + rpcsmartrouter)

   - Sessions now released immediately on relay failure
   - Backoff no longer holds session locked

2. OnSessionFailure early return without Free() (consumer_session_manager)

   - When the session was already blocklisted, OnSessionFailure returned without unlocking it
   - The session stayed locked forever, causing session exhaustion
   - OnSessionFailure now calls Free() before returning in the blocklisted case

3. Data reliability failures with single provider

   - DR needs at least 2 providers to compare results
   - Added early exit when < 2 providers available

4. Defer-based cleanup for all exit paths

   - sessionHandled flag prevents double-free
   - Panic recovery ensures session release

5. Configurable max sessions per provider

   - Added --max-sessions-per-provider flag (default: 1000)

6. Configurable max streams per connection

   - Added --maximum-streams-per-connection flag (default: 100)
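
As an illustration of items 5 and 6, the two flags and their stated defaults can be wired as in the sketch below. It uses the standard library `flag` package as a stand-in for the project's actual CLI wiring; the `sessionLimits` struct, the `parseLimits` function, and the usage strings are assumptions made for the example.

```go
package main

import (
	"flag"
	"fmt"
)

// sessionLimits is a hypothetical holder for the two configurable limits.
type sessionLimits struct {
	MaxSessionsPerProvider      uint
	MaximumStreamsPerConnection uint
}

// parseLimits registers both flags with the defaults from the commit
// message (1000 and 100) and parses the given arguments.
func parseLimits(args []string) (sessionLimits, error) {
	var l sessionLimits
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.UintVar(&l.MaxSessionsPerProvider, "max-sessions-per-provider", 1000,
		"maximum sessions allowed per provider")
	fs.UintVar(&l.MaximumStreamsPerConnection, "maximum-streams-per-connection", 100,
		"maximum streams over a single connection")
	err := fs.Parse(args)
	return l, err
}

func main() {
	limits, err := parseLimits([]string{"--max-sessions-per-provider=5000"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", limits)
}
```

Any flag left unset keeps its default, so a deployment with a single provider can raise only the session limit.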

Files modified:

- protocol/lavasession/consumer_session_manager.go
- protocol/lavasession/consumer_types.go
- protocol/lavasession/common.go
- protocol/common/cobra_common.go
- protocol/rpcconsumer/rpcconsumer.go
- protocol/rpcconsumer/rpcconsumer_server.go
- protocol/rpcsmartrouter/rpcsmartrouter.go
- protocol/rpcsmartrouter/rpcsmartrouter_server.go

Tests added:

- TestSessionLeakPrevention_* in rpcconsumer
@nimrod-teich nimrod-teich force-pushed the fix/session-leak-prevention branch from 0acd93e to f6887da on December 31, 2025 10:02
nimrod-teich and others added 3 commits December 31, 2025 12:35
…safety in RPC session management

Updated the session handling mechanism in both rpcconsumer and rpcsmartrouter servers to use atomic.Bool instead of a regular boolean. This change ensures thread-safe access to the sessionHandled flag, preventing potential race conditions during session cleanup and management. The updates include initializing the atomic variable and modifying all relevant checks and updates to use the atomic methods.
@nimrod-teich nimrod-teich merged commit 0063b52 into main Dec 31, 2025
17 of 19 checks passed
@nimrod-teich nimrod-teich deleted the fix/session-leak-prevention branch December 31, 2025 12:34