
fix(stability): per-source Passive Mode + createdAt chat ordering (#3122) #3125

Merged
Yeraze merged 1 commit into main from fix/issue-3122-passive-mode
May 21, 2026

Conversation

@Yeraze (Owner) commented May 21, 2026

Fixes #3122.

Two related fixes for large/fragile Meshtastic TCP nodes, in line with the plan agreed in the issue discussion and refined by the reporter (TheWISPRer).

Track A — channel chat orders by createdAt (global, no flag)

Channel chat sort + cursor now use server DB arrival time (createdAt) instead of device-reported rxTime/timestamp. A node with a future-skewed clock can no longer pin an old message at the visible "newest" slot and hide subsequent traffic.

Touched:

  • MessagesRepository.getMessages, getMessagesByChannel, getMessagesBeforeInChannel (cursor + ORDER BY), getDirectMessages, getMessagesSqlite, getMessagesByChannelSqlite — six sites.
  • /api/unified/messages: response now includes createdAt, sort + before cursor use it. Doc-comment updated.
  • UnifiedMessagesPage: client sort + infinite-query cursor switched to createdAt.
  • searchMessages date-range filtering intentionally left on rxTime — searching by date implies the user means sent time.

Trade-off (noted by the reporter and acceptable for a live monitoring UI): store-and-forward messages that arrive long after they were sent display at receive time, not original send time.
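To illustrate why the sort key matters, here is a minimal sketch rather than the repository code. The `ChatMessage` shape, the helper names, and the millisecond units are assumptions made for the example; only the field names `rxTime`, `timestamp`, and `createdAt` come from the PR. It compares the old device-time key with the new arrival-time key for one future-skewed message, and shows a `before` cursor filtered on `createdAt`.

```typescript
// Hypothetical message shape; only the field names come from the PR text.
interface ChatMessage {
  id: string;
  rxTime: number | null; // device-reported receive time, may be skewed
  timestamp: number;     // device-reported send time
  createdAt: number;     // server DB arrival time
}

// Old key: device-reported time, equivalent to COALESCE(rxTime, timestamp).
const deviceTime = (m: ChatMessage): number => m.rxTime ?? m.timestamp;

// New key: server arrival time.
const arrivalTime = (m: ChatMessage): number => m.createdAt;

// Newest-first sort by an arbitrary key, without mutating the input.
function newestFirst(
  msgs: ChatMessage[],
  key: (m: ChatMessage) => number,
): ChatMessage[] {
  return [...msgs].sort((a, b) => key(b) - key(a));
}

// Cursor page: messages that arrived strictly before `before`, newest first.
function pageBefore(
  msgs: ChatMessage[],
  before: number,
  limit: number,
): ChatMessage[] {
  const older = msgs.filter((m) => m.createdAt < before);
  return newestFirst(older, arrivalTime).slice(0, limit);
}

const HOUR = 3_600_000;
const now = 1_700_000_000_000;

const messages: ChatMessage[] = [
  // Sent by a node whose clock runs 48 hours ahead; arrived two hours ago.
  { id: "skewed", rxTime: now + 48 * HOUR, timestamp: now + 48 * HOUR, createdAt: now - 2 * HOUR },
  { id: "later-1", rxTime: now - HOUR, timestamp: now - HOUR, createdAt: now - HOUR },
  { id: "later-2", rxTime: now, timestamp: now, createdAt: now },
];

// Under the old key the skewed message stays pinned in the newest slot.
const byDevice = newestFirst(messages, deviceTime).map((m) => m.id);
// Under the new key it sorts by when the server actually received it.
const byArrival = newestFirst(messages, arrivalTime).map((m) => m.id);

console.log(byDevice, byArrival);
```

With the device-time key, every later message sorts below the skewed one until real time catches up with its clock. With `createdAt`, the skewed message cannot outrank traffic that arrived after it.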

Track B — opt-in per-source Passive Mode

A new passiveMode flag on Meshtastic TCP sources (UI toggle on the source edit dialog). Default off — small nodes keep the existing handshake. When on:

  • handleDisconnected preserves localNodeInfo / actualDeviceConfig / actualModuleConfig / initConfigCache across socket bounces. With a Virtual Node attached, the init capture buffer is still cleared so VN replay stays fresh.
  • On reconnect, if a cached snapshot exists and the disconnect was less than 4 hours ago, skip sendWantConfigId() — the source of the repeated NodeDB resync loops the reporter observed on a ~1183-node router. Mesh traffic flows without it.
  • Skip the post-config outbound burst when passive: requestConfig(LoRa), requestAllModuleConfigs, startRemoteAdminScanner, startTimeSyncScheduler. Receive-only schedulers (geofence, local stats, auto-favorite sweep, etc.) keep running.
  • Tooltip in the UI: "Reduces outbound requests to large or fragile TCP nodes. Preserves cached config across reconnects and skips post-config device requests. Recommended for router-class nodes with large NodeDBs."

The 4-hour staleness window matches the reporter's recommendation.
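The reconnect decision described above can be sketched as a small pure function. This is an illustration under stated assumptions, not the code in meshtasticManager.ts: `ReconnectState`, `shouldSkipWantConfig`, and `PASSIVE_RESYNC_STALE_MS` are hypothetical names, and only the 4-hour window, the passiveMode flag, and the `lastDisconnectAt` watermark come from the PR text.

```typescript
// Hypothetical constant name; the 4-hour value is from the PR description.
const PASSIVE_RESYNC_STALE_MS = 4 * 60 * 60 * 1000;

// Hypothetical snapshot of the state the manager would consult on reconnect.
interface ReconnectState {
  passiveMode: boolean;
  hasCachedSnapshot: boolean;      // config retained from the previous session
  lastDisconnectAt: number | null; // epoch ms of the most recent disconnect
}

// Returns true when the want_config_id handshake can be skipped.
function shouldSkipWantConfig(
  state: ReconnectState,
  now: number,
  staleMs: number = PASSIVE_RESYNC_STALE_MS,
): boolean {
  if (!state.passiveMode) return false;              // default sources keep the handshake
  if (!state.hasCachedSnapshot) return false;        // nothing cached, full sync needed
  if (state.lastDisconnectAt === null) return false; // first connect, no watermark yet
  return now - state.lastDisconnectAt < staleMs;     // fresh only inside the window
}

const t0 = 1_700_000_000_000;
const passive: ReconnectState = {
  passiveMode: true,
  hasCachedSnapshot: true,
  lastDisconnectAt: t0,
};

console.log(shouldSkipWantConfig(passive, t0 + 60_000));         // brief socket bounce
console.log(shouldSkipWantConfig(passive, t0 + 5 * 3_600_000));  // five hours later
```

Keeping the check as a pure function of state and time makes the window boundary easy to unit-test without a live socket.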

Out of scope (follow-ups suggested by the reporter)

  • Manual Resync button with single-flight guard, ~30 s cooldown, and a watchdog timeout. Reporter asked for this so an operator can force want_config_id even when the cache is fresh; recovery after the forced sync should not auto-repeat. Worth a separate PR.
  • Configurable staleness window as an advanced per-source setting (currently a static 4 h class constant).
  • Fast initial reconnect during a startup grace window (2 min, 3–5 s delay). Removed from this PR after a clean reconnect-timing wiring proved too noisy to fit alongside the rest of the change.

Test plan

  • npx tsc --noEmit — clean
  • New tests: src/db/repositories/messages.createdAt-ordering.test.ts (5 tests covering both async + sync entry points + the cursor filter), src/server/meshtasticManager.passiveMode.test.ts (8 tests covering config wiring + state retention + the staleness-window constant + the lastDisconnectAt watermark)
  • Updated mkMsg test helper in unifiedRoutes.test.ts to default createdAt from rxTime/timestamp so the ~30 existing fixture-based ordering tests keep their intent under the new sort key.
  • Updated messages.exclude-portnums.test.ts insert helper to use rxTime as createdAt for deterministic ordering.
  • npx vitest run — 5264 pass / 3 pre-existing fails (mqttBrokerManager.test.ts zero-hop injection — encode failed, reproduces on clean main with no diff applied; unrelated to this change).
  • Manual smoke against a real large TCP node (reporter has a working POC and offered to soak this PR against the same node).
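The fixture-helper change in the test plan can be pictured roughly as follows. The real `mkMsg` in unifiedRoutes.test.ts is not shown in this conversation, so the shape and signature here are assumptions. The point is only that `createdAt` falls back to `rxTime`, then `timestamp`, when a fixture does not set it, which keeps older ordering tests meaningful under the new sort key.

```typescript
// Hypothetical fixture shape for illustration only.
interface FixtureMessage {
  id: string;
  rxTime: number | null;
  timestamp: number;
  createdAt: number;
}

// Builds a fixture; createdAt defaults to rxTime, then timestamp.
function mkMsg(
  overrides: Partial<FixtureMessage> & { id: string },
): FixtureMessage {
  const timestamp = overrides.timestamp ?? 0;
  const rxTime = overrides.rxTime ?? null;
  return {
    id: overrides.id,
    timestamp,
    rxTime,
    createdAt: overrides.createdAt ?? rxTime ?? timestamp,
  };
}

const fromRx = mkMsg({ id: "a", rxTime: 200, timestamp: 100 });
const fromTs = mkMsg({ id: "b", timestamp: 100 });
const explicit = mkMsg({ id: "c", rxTime: 200, timestamp: 100, createdAt: 300 });

console.log(fromRx.createdAt, fromTs.createdAt, explicit.createdAt);
```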

🤖 Generated with Claude Code


Two related fixes for large/fragile Meshtastic TCP nodes reported in #3122.

Track A — channel chat orders by server arrival time (global):
* Replace COALESCE(rxTime, timestamp) with createdAt in all message
  repository ORDER BY sites and the unified channel cursor, so a
  future-skewed device clock can no longer pin an old message at the
  visible "newest" slot and hide subsequent traffic. Updates the unified
  API + client to carry createdAt and use it for sort + pagination.

Track B — opt-in per-source Passive Mode:
* New passiveMode flag on Meshtastic TCP sources (UI toggle in the source
  edit dialog) for router-class nodes with large NodeDBs.
* When ON, handleDisconnected preserves localNodeInfo / actualDeviceConfig
  / actualModuleConfig / initConfigCache across socket bounces.
* On reconnect with a fresh (<4h) cached snapshot, skip the want_config_id
  handshake — the source of repeated NodeDB resync loops that the reporter
  observed on a ~1183-node router.
* Skip the post-config outbound burst (LoRa config request,
  all-module-configs request, remote admin scanner, time sync scheduler)
  when passiveMode is on. Receive-side processing is untouched.
* Default remains the existing behavior; large nodes opt in.

Tests: createdAt ordering tests across all repo entry points (including
the sync SQLite variants); passive-mode regression tests for config
wiring + state retention + the 4h staleness window default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
github-actions Bot commented May 21, 2026

Claude finished @Yeraze's task


PR Review: Passive Mode & Message Ordering

Comprehensive review of this PR covering code quality, potential issues, performance, security, and test coverage.

Review Tasks

  • Read repository guidelines and context
  • Examine message ordering changes (Track A)
  • Review passive mode implementation (Track B)
  • Analyze new test coverage
  • Check API and UI changes
  • Provide final recommendations

Track A Review: Message Ordering (createdAt vs rxTime)

✅ Architecture Alignment: The change from rxTime/timestamp to createdAt ordering follows the multi-source architecture principles correctly. All queries properly scope by sourceId and use Drizzle ORM consistently.

✅ Code Quality: The implementation is clean and well-documented. Comments clearly explain the rationale for the change (messages.ts:97-104). The affected methods are consistently updated across both async and SQLite-only sync variants.

✅ Performance: Ordering by createdAt sorts on a single column that is set at insertion time, rather than evaluating COALESCE(rxTime, timestamp) for every row.

⚠️ Search Inconsistency: The searchMessages method still uses COALESCE(rxTime, timestamp) for date filtering (messages.ts:856-861), creating a subtle inconsistency where search by date range filters by sent time while ordering is by received time. This is noted as intentional in the PR description but could be confusing for users.

Track B Review: Passive Mode Implementation

✅ Configuration Handling: The passive mode flag is properly threaded through:

  • Constructor parameter in MeshtasticManager (meshtasticManager.ts:790)
  • configureSource() method (meshtasticManager.ts:573-582)
  • UI form state management (DashboardPage.tsx:122)
  • Proper default to false when not specified

✅ State Preservation Logic: The handleDisconnected method correctly preserves critical state when passiveMode is true (lines 1393+):

  • localNodeInfo, actualDeviceConfig, actualModuleConfig, initConfigCache
  • Clears Virtual Node buffer only when VN is enabled (good separation of concerns)

✅ Staleness Window: 4-hour threshold is reasonable and matches issue discussion. Logic correctly skips sendWantConfigId() when cache is fresh (meshtasticManager.ts:1164-1169).

✅ Selective Outbound Suppression: Lines 1275-1279 correctly skip the outbound burst while preserving receive-only schedulers.

✅ Manager Restart Logic: Source route properly restarts managers when passiveMode changes (sourceRoutes.ts:428), ensuring the new policy takes effect cleanly.

UI Implementation

✅ Clear User Interface: The passive mode toggle in DashboardPage.tsx includes:

  • Clear checkbox label ("Passive Mode")
  • Helpful tooltip explaining the feature's purpose and use case
  • Proper state management with form persistence

Test Coverage Analysis

✅ Comprehensive New Tests:

  1. messages.createdAt-ordering.test.ts: 5 focused tests covering:

    • Async and sync entry points
    • Cursor-based pagination behavior
    • Proper ordering under future-skewed clocks
  2. meshtasticManager.passiveMode.test.ts: 8 tests covering:

    • Configuration wiring through constructor and configureSource()
    • State retention vs clearing behavior
    • Staleness window constants and logic
    • Disconnect timestamp tracking

✅ Existing Test Compatibility: The mkMsg helper update ensures existing tests maintain their intent under the new ordering.

Security & Performance Considerations

✅ No Security Issues: The changes don't expose new attack vectors or sensitive data.

✅ Performance Improvements:

  • createdAt ordering is more efficient than rxTime/timestamp
  • Passive mode reduces network chatter for large nodes

Minor Issues

⚠️ Code Readability: The passiveResyncFresh boolean logic (meshtasticManager.ts:1164-1169) is complex and could benefit from extraction to a helper method.

Final Recommendations

✅ Approve with Minor Suggestions:

  1. Consider documentation: Add a brief note in user-facing docs about the search date filtering vs display ordering difference.

  2. Optional refactor: Extract the passive mode cache freshness check into a named helper method for better readability.

  3. Manual testing: The PR description notes that the reporter offered to soak this PR against a real large TCP node; that run would be valuable validation.

Overall Assessment: This is a well-implemented, thoroughly tested fix that addresses a real operational issue with large Meshtastic nodes. The changes are architecturally sound, follow project conventions, and include comprehensive test coverage. The trade-offs are clearly documented and acceptable for the target use case.


@Yeraze Yeraze merged commit 64afa54 into main May 21, 2026
19 checks passed
@Yeraze Yeraze deleted the fix/issue-3122-passive-mode branch May 21, 2026 18:06
Yeraze added a commit that referenced this pull request May 21, 2026
#3128)

Follow-up to #3125 / #3126. Closes the third item from the reporter's
feedback on #3122.

On large/fragile TCP nodes the *first* config-sync session frequently
closes mid-stream but the second attempt works cleanly. Under normal
exponential backoff that recovery would wait 8–16s before retrying;
this short startup-grace window cuts that gap.

* TcpTransport.setStartupGraceReconnect(graceMs, fastDelayMs): for the
  next graceMs after the call, scheduleReconnect uses fastDelayMs
  instead of the exponential backoff. After the window expires, normal
  backoff resumes automatically. Disabled by default (graceMs=0).
* MeshtasticManager opt-in: passive-mode sources get a 2-minute grace
  window with a 3-second reconnect delay during initial startup. Other
  sources keep the legacy backoff.

Tests: 6 new tcpTransport tests cover default-disabled, in-window
fast delay regardless of attempt count, post-window exponential
fallback, explicit disable, sanity, and that the reconnect fires
after the configured delay.
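The delay selection described in this commit message could look roughly like the sketch below. It is not the TcpTransport implementation: the class name, the explicit `now` arguments (added so the example is deterministic), and the 1 s base and 60 s cap for the exponential backoff are all assumptions. Only the grace-window and fast-delay behavior is taken from the text.

```typescript
// Hypothetical policy object; the real logic lives inside TcpTransport.
class ReconnectPolicy {
  private graceUntil = 0;  // epoch ms when the grace window ends; 0 = disabled
  private fastDelayMs = 0;
  private readonly baseMs: number;
  private readonly maxMs: number;

  // Base and cap values are illustrative assumptions.
  constructor(baseMs = 1_000, maxMs = 60_000) {
    this.baseMs = baseMs;
    this.maxMs = maxMs;
  }

  // Opens a grace window of graceMs starting at `now`; graceMs <= 0 disables it.
  setStartupGrace(graceMs: number, fastDelayMs: number, now: number): void {
    this.graceUntil = graceMs > 0 ? now + graceMs : 0;
    this.fastDelayMs = fastDelayMs;
  }

  // Delay before the next reconnect attempt (attempt counts from 0).
  nextDelayMs(attempt: number, now: number): number {
    if (now < this.graceUntil) return this.fastDelayMs; // inside the window
    return Math.min(this.baseMs * 2 ** attempt, this.maxMs); // normal backoff
  }
}

const policy = new ReconnectPolicy();
policy.setStartupGrace(120_000, 3_000, 0); // 2-minute window, 3-second delay
console.log(policy.nextDelayMs(4, 30_000));  // inside the window
console.log(policy.nextDelayMs(4, 130_000)); // after the window has expired
```

Inside the window the attempt count is ignored, which matches the "fast delay regardless of attempt count" test listed above. Once the window passes, the same call falls back to exponential backoff with no extra state to reset.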

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Yeraze added a commit that referenced this pull request May 21, 2026
#3122) (#3129)

Follow-up to #3125 / #3126 / #3128. Closes item 2 of the reporter's
feedback on #3122 (advanced per-source setting for the staleness
window).

* SourceConfig gains optional passiveResyncStaleMs (ms). Falls back to
  the 4h class default when absent.
* MeshtasticManager.effectivePassiveResyncStaleMs() resolves the active
  threshold, clamping out-of-range overrides ([1 min, 7 days]) so a 0
  or astronomical value can't disable the safeguard.
* Plumbed through all 5 MeshtasticManager construction sites in
  sourceRoutes; a change in the value triggers a transport restart
  alongside the existing passiveMode toggle.
* UI: numeric "Resync staleness window (hours)" input appears under
  the Passive Mode toggle when enabled. Blank = use 4h default.

Tests: 9 new tests cover default, constructor override, configureSource
override + clear, below-floor / above-ceiling rejection, exact
boundary acceptance, and NaN rejection.
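A minimal sketch of how such a threshold resolver might behave, with hypothetical names throughout. The commit message says out-of-range overrides are clamped, while the test list speaks of rejection. This sketch assumes rejected values fall back to the 4h default; the actual effectivePassiveResyncStaleMs() may instead clamp to the nearest bound.

```typescript
// Values from the commit message; constant names are hypothetical.
const DEFAULT_STALE_MS = 4 * 60 * 60 * 1000;  // 4 hours
const MIN_STALE_MS = 60 * 1000;               // 1 minute
const MAX_STALE_MS = 7 * 24 * 60 * 60 * 1000; // 7 days

// Resolves the active staleness window from an optional per-source override.
// Assumption: invalid overrides fall back to the default rather than clamping.
function effectiveStaleMs(override?: number | null): number {
  if (override === undefined || override === null) return DEFAULT_STALE_MS;
  if (!Number.isFinite(override)) return DEFAULT_STALE_MS; // NaN, Infinity
  if (override < MIN_STALE_MS || override > MAX_STALE_MS) {
    return DEFAULT_STALE_MS; // out of range, so the safeguard stays active
  }
  return override; // exact boundary values are accepted
}

console.log(effectiveStaleMs());              // no override
console.log(effectiveStaleMs(0));             // below the floor
console.log(effectiveStaleMs(2 * 3_600_000)); // a valid 2-hour override
```

Either policy keeps a 0 or an astronomically large value from disabling the resync safeguard. The two differ only in what a misconfigured source ends up with.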

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

[BUG] Large Meshtastic TCP node repeatedly disconnects during/after full config sync; passive per-source mode improves stability
