Skip to content

feat(stability): startup-grace fast reconnect for Passive Mode (#3122)#3128

Merged
Yeraze merged 1 commit into
mainfrom
fix/issue-3122-fast-startup-reconnect
May 21, 2026
Merged

feat(stability): startup-grace fast reconnect for Passive Mode (#3122)#3128
Yeraze merged 1 commit into
mainfrom
fix/issue-3122-fast-startup-reconnect

Conversation

@Yeraze

@Yeraze Yeraze commented May 21, 2026

Copy link
Copy Markdown
Owner

Follow-up to #3125 / #3126. Closes item 3 of the reporter's feedback on #3122 (fast initial reconnect during the startup window).

Why

The reporter observed that a large infrastructure node consistently closes the first config-sync session — but recovers cleanly on the very next attempt. Under normal exponential backoff, MeshMonitor would wait 8–16 s before retrying, which translates to a long user-visible "stuck reconnecting" gap right at app start. The grace window collapses that gap on the path the reporter actually walks (passive-mode sources, fresh process start, first reconnect after the initial sync drops).

What

  • `TcpTransport.setStartupGraceReconnect(graceMs, fastDelayMs)` — for the next `graceMs` after the call, `scheduleReconnect` uses `fastDelayMs` instead of the exponential backoff. After the window expires the legacy backoff resumes automatically. Pass `graceMs=0` to explicitly disable.
  • `MeshtasticManager` opt-in — passive-mode sources call `setStartupGraceReconnect(2 * 60_000, 3_000)` at transport-construct time. Non-passive sources are unchanged; this is strictly additive for the opt-in path.

Constants live as class statics on `MeshtasticManager` (`STARTUP_GRACE_WINDOW_MS`, `STARTUP_GRACE_FAST_DELAY_MS`) — picked the reporter's exact recommended values: 2 min window, 3 s delay.

What's not in this PR

  • A configurable staleness window as an advanced per-source setting (still item 2 from the reporter's feedback). Worth its own follow-up if anyone hits a node where 4 h doesn't fit.

Test plan

  • `npx tsc --noEmit` clean
  • 6 new `tcpTransport.test.ts` tests: default-disabled uses exponential, inside grace uses fast delay regardless of attempt count, post-window falls back to exponential, `graceMs=0` disables, sanity check on tiny fast delays, reconnect actually fires after the configured delay
  • Full suite: 5281 pass / 3 pre-existing fails (`mqttBrokerManager` zero-hop encode failures; reproduce on clean main, unrelated)
  • Manual smoke against a real passive-mode TCP node — verify the recovery reconnect kicks in at ~3 s rather than ~8 s

🤖 Generated with Claude Code

Follow-up to #3125 / #3126. Closes the third item from the reporter's
feedback on #3122.

On large/fragile TCP nodes the *first* config-sync session frequently
closes mid-stream but the second attempt works cleanly. Under normal
exponential backoff that recovery would wait 8–16s before retrying;
this short startup-grace window cuts that gap.

* TcpTransport.setStartupGraceReconnect(graceMs, fastDelayMs): for the
  next graceMs after the call, scheduleReconnect uses fastDelayMs
  instead of the exponential backoff. After the window expires, normal
  backoff resumes automatically. Disabled by default (graceMs=0).
* MeshtasticManager opt-in: passive-mode sources get a 2-minute grace
  window with a 3-second reconnect delay during initial startup. Other
  sources keep the legacy backoff.

Tests: 6 new tcpTransport tests cover default-disabled, in-window
fast delay regardless of attempt count, post-window exponential
fallback, explicit disable, sanity, and that the reconnect fires
after the configured delay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented May 21, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


Pull Request Review

Reviewing the startup-grace fast reconnect implementation for Passive Mode...

Todo List:

  • Read and analyze changed files
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Assess security concerns
  • Analyze test coverage
  • Provide comprehensive feedback

Code Quality & Best Practices ✅

The implementation follows excellent coding practices:

  • Well-documented: Comprehensive JSDoc comments explaining the feature purpose and usage
  • Type safety: Proper TypeScript typing throughout
  • Consistent naming: Constants follow the established STARTUP_GRACE_* pattern
  • Separation of concerns: Logic cleanly separated between transport layer and manager layer
  • Backwards compatible: Grace window disabled by default, only opt-in for passive mode

Implementation Analysis ✅

TcpTransport Changes (src/server/tcpTransport.ts:52-120):

  • New private fields track grace window state: startupGraceUntil, startupGraceFastDelayMs
  • setStartupGraceReconnect() method provides clean API for configuration
  • Logic in scheduleReconnect() (lines 268-271) correctly prioritizes grace delay over exponential backoff
  • Grace window automatically expires based on timestamp comparison

MeshtasticManager Integration (src/server/meshtasticManager.ts:422-423, 1068-1071):

  • Constants defined as class statics for easy configuration
  • Only passive-mode sources opt into the feature
  • Values match reporter's recommendations (2min window, 3s delay)

Bug Analysis ✅

No bugs identified. The implementation is solid:

  1. Grace window expiry: Correctly uses Date.now() comparison to determine if window is active
  2. Attempt counter independence: Grace delay works regardless of reconnect attempt count
  3. Proper fallback: After grace expires, exponential backoff resumes normally
  4. Edge case handling: graceMs=0 properly disables the feature

Performance Considerations ✅

Positive impacts:

  • Reduces user-visible reconnect delays from 8-16s to 3s during startup
  • Only affects passive-mode sources (targeted optimization)
  • Grace window auto-expires, preventing permanent behavior changes

No performance concerns:

  • Minimal memory overhead (2 additional instance variables)
  • No additional computational complexity
  • No impact on steady-state reconnect behavior

Security Assessment ✅

No security concerns:

  • Feature doesn't modify authentication, encryption, or data handling
  • Only affects reconnect timing, not protocol behavior
  • Cannot be exploited to bypass rate limiting (grace window is time-bound)
  • No new attack vectors introduced

Test Coverage Analysis ✅

Excellent test coverage with 6 comprehensive tests (src/server/tcpTransport.test.ts:190-275):

  1. Default behavior: Verifies grace disabled by default
  2. Grace window active: Tests fast delay usage during window
  3. Grace expiry: Confirms fallback to exponential backoff
  4. Disable functionality: Tests graceMs=0 disabling
  5. Edge cases: Handles tiny fast delays appropriately
  6. Integration: Verifies actual reconnect firing

Tests use proper mocking and time manipulation, covering all code paths.

Minor Suggestions

  1. Configuration validation: Consider adding bounds checking in setStartupGraceReconnect():

    if (fastDelayMs < 100 || fastDelayMs > 60000) {
      logger.warn(`⚠️ Startup grace fast delay ${fastDelayMs}ms may be too aggressive/slow`);
    }
  2. Metrics: Could add debug logging to track grace window usage in production

Summary

This is a high-quality implementation that addresses a real user pain point. The code is well-structured, thoroughly tested, and follows established patterns. The feature is appropriately scoped to passive-mode sources where the issue was observed, and the auto-expiring grace window ensures it doesn't affect long-term behavior.

Recommendation: APPROVE

The PR successfully implements the requested fast reconnect feature with excellent code quality, comprehensive testing, and no identified issues.


@Yeraze Yeraze merged commit 6097354 into main May 21, 2026
16 checks passed
Yeraze added a commit that referenced this pull request May 21, 2026
#3122) (#3129)

Follow-up to #3125 / #3126 / #3128. Closes item 2 of the reporter's
feedback on #3122 (advanced per-source setting for the staleness
window).

* SourceConfig gains optional passiveResyncStaleMs (ms). Falls back to
  the 4h class default when absent.
* MeshtasticManager.effectivePassiveResyncStaleMs() resolves the active
  threshold, clamping out-of-range overrides ([1 min, 7 days]) so a 0
  or astronomical value can't disable the safeguard.
* Plumbed through all 5 MeshtasticManager construction sites in
  sourceRoutes; a change in the value triggers a transport restart
  alongside the existing passiveMode toggle.
* UI: numeric "Resync staleness window (hours)" input appears under
  the Passive Mode toggle when enabled. Blank = use 4h default.

Tests: 9 new tests cover default, constructor override, configureSource
override + clear, below-floor / above-ceiling rejection, exact
boundary acceptance, and NaN rejection.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant