Skip to content

fix(meshcore): stop virtual node server during transport teardown#3706

Merged
Yeraze merged 2 commits into
mainfrom
claude/great-dijkstra-5nz4pv
Jun 24, 2026
Merged

fix(meshcore): stop virtual node server during transport teardown#3706
Yeraze merged 2 commits into
mainfrom
claude/great-dijkstra-5nz4pv

Conversation

@Yeraze

@Yeraze Yeraze commented Jun 24, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes #3705 — MeshCore virtual node stuck in "Connecting" when the real node connection drops.

  • When the real MeshCore node loses connectivity, teardownTransportOnly() sets isConnected() = false but left the virtual node server running. Every AppStart from the mobile app got BadState → app looped in "Connecting".
  • Added await this.stopVirtualNodeServer() at the top of teardownTransportOnly(), mirroring what the full disconnect() path already does. The server restarts in connect() once the real node is back.

Test plan

  • Verify existing virtual node unit tests still pass (src/server/meshcoreVirtualNodeServer.test.ts)
  • Manual: connect mobile app to virtual node with a stable real-node connection → should work as before
  • Manual: simulate real-node disconnect (unplug/stop the device) → virtual node server stops (app drops); reconnect real node → virtual node restarts; app can connect again

🤖 Generated with Claude Code

https://claude.ai/code/session_01LBEuSU2sHGZS3xNiXshTxm


Generated by Claude Code

)

When the real MeshCore node connection drops and the manager enters
reconnect state via teardownTransportOnly(), the virtual node server was
left running while isConnected() returned false. Any AppStart from a
connecting app client got a BadState error, causing the app to loop in
"Connecting" indefinitely.

Mirror what disconnect() already does: stop the virtual node server
before marking the transport as down. The server is restarted in
connect() once the real node is back.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LBEuSU2sHGZS3xNiXshTxm
@github-actions

github-actions Bot commented Jun 24, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review

  • Gather context
  • Read changed files
  • Analyze fix logic
  • Provide review feedback

Summary

This is a clean, minimal, and well-targeted fix. The root cause is clear: teardownTransportOnly() mirrored the native backend teardown from disconnect() but omitted the stopVirtualNodeServer() call, leaving the virtual node server accepting mobile app connections while isConnected() was false. The fix correctly mirrors the disconnect() pattern.

Code Quality

src/server/meshcoreManager.ts:4064-4087 — The fix is correct and idiomatic:

  • Placement at the top of teardownTransportOnly() mirrors disconnect() (line 841) exactly.
  • The inline comment (lines 4065–4068) clearly explains the "why" — the BadState loop — which will help future readers.
  • stopVirtualNodeServer() is already null-guarded (if (!this.virtualNodeServer) return; at line 796), so calling it during a reconnect where no virtual node is configured is harmless.

Potential Issues / Edge Cases

1. Reconnect gap vs. virtual node restart

When beginReconnect() fires (line 4057), the sequence is:

  1. teardownTransportOnly() → stops virtual node server
  2. scheduleNextReconnect() → waits delay ms (exponential backoff, up to 60 s by default)
  3. attemptReconnect() → calls connect() → calls startVirtualNodeServer()

During the backoff window the virtual node server is down, so any mobile app that tries to connect will get a TCP connection refused rather than a BadState loop. This is the correct UX — it's actually an improvement over the pre-fix behavior (infinite BadState loop). Worth noting in release notes, but not a blocker.

2. void this.teardownTransportOnly().then(...) (line 4057)

The void here suppresses the promise. If stopVirtualNodeServer() throws an unhandled error, it will be a silent failure that won't prevent scheduleNextReconnect() from being called (since the .then() is attached). The stopVirtualNodeServer() implementation (lines 795–802) has a try/catch that swallows errors, so this is already handled — no issue.

3. disconnect() redundancy

disconnect() (line 841) also calls stopVirtualNodeServer(). Since teardownTransportOnly() now stops it first, calling disconnect() on a node already in the reconnecting state would call stopVirtualNodeServer() twice. The null-guard at line 796 (if (!this.virtualNodeServer) return;) makes this idempotent — no issue.

Performance / Security

No performance or security concerns. This is a state-machine correctness fix with no new I/O paths or privilege changes.

Test Coverage

The PR description lists manual test steps but no automated test changes. The existing meshcoreVirtualNodeServer.test.ts suite validates the server's own start/stop behavior. A unit test for teardownTransportOnly() verifying that stopVirtualNodeServer() is called (e.g. via a spy) would strengthen the regression coverage, but given the simplicity and risk profile of this change, absence of a new test is not a blocker.

Verdict

Looks good to merge. The fix is correct, minimal, well-commented, and consistent with the existing disconnect() pattern. No blocking issues found.


Adds a focused regression test for the PR #3706 fix (issue #3705): when the
real node connection drops, teardownTransportOnly() must stop the Virtual Node
server. Without this, the VN server kept accepting MeshCore mobile-app
connections while isConnected() was false, returning BadState to every AppStart
and trapping the app in a "Connecting" loop.

The test spies on stopVirtualNodeServer and asserts it is invoked during
teardownTransportOnly(), verifies it runs before the native backend teardown
(ordering matters — the VN server reads connection state to answer AppStart),
and exercises the real stop() delegation. Confirmed to fail without the
one-line fix and pass with it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4
@github-actions

github-actions Bot commented Jun 24, 2026

Copy link
Copy Markdown

Claude finished @Yeraze's task —— View job


PR Review

  • Gather context
  • Read changed files
  • Analyze fix logic and tests
  • Provide review feedback

Summary

This PR adds 6 lines to fix the root cause (stopping the VN server in teardownTransportOnly()) and then backs it with a dedicated test file covering all meaningful paths. Both changes are clean and correct.


Fix (src/server/meshcoreManager.ts)

The one-liner fix at meshcoreManager.ts:4069 is exactly right:

await this.stopVirtualNodeServer();
  • Mirrors the disconnect() code path identically.
  • stopVirtualNodeServer() (lines 795–803) is null-guarded and swallows stop errors, so calling it when no VN server is configured is a no-op.
  • The inline comment (lines 4065–4068) clearly explains the BadState loop — good for future readers.
  • No performance or security concerns.

Tests (src/server/meshcoreManager.virtualNode.test.ts)

The new test file adds three focused regression tests. All three are well-structured:

Test 1 — "stops the Virtual Node server when tearing down transport"
Verifies the primary regression: stopVirtualNodeServer() is called exactly once. Clean and minimal.

Test 2 — "tears down native backend after stopping VN server"
This is the strongest test — it verifies ordering using invocationCallOrder, ensuring stopVirtualNodeServer runs before nativeBackend.disconnect. It also validates post-teardown state (connected === false, nativeBackend === null). Good depth.

Test 3 — "delegates to VN server stop() through stopVirtualNodeServer()"
Reaches through to the actual stop() call on the underlying server object and verifies virtualNodeServer is nulled out afterward. This tests the concrete teardown contract, not just the spy boundary.

One minor observation on test 3: The test sets isRunning: () => true on the mock but stopVirtualNodeServer() doesn't actually check isRunning() before calling stop() (it only checks if (!this.virtualNodeServer)). This is harmless — the property is ignored — but it could slightly mislead future readers. Not a blocker.


Verdict

Looks good to merge. The fix is correct, minimal, and well-documented. The new test file addresses the gap noted in the prior review and validates both the call order and the post-teardown state. No blocking issues.


@Yeraze Yeraze merged commit b41594e into main Jun 24, 2026
25 of 29 checks passed
@Yeraze Yeraze deleted the claude/great-dijkstra-5nz4pv branch June 24, 2026 15:06
Yeraze added a commit that referenced this pull request Jun 26, 2026
…ter-disconnect (#3705) (#3794)

* fix(meshcore): detect socket drops + add user-configurable heartbeat (#3705)

Two gaps left a MeshCore source unable to recover from a dropped real-node
link without a manual reconnect or container restart:

1. The manager ignored the native backend's own 'disconnected' event (it only
   logged it), so a socket/serial-level drop left `connected = true`.
   isConnected() kept returning true, so the Virtual Node server answered
   AppStart with a stale SelfInfo while real sends silently failed, and nothing
   recovered. handleUnexpectedDisconnect() now reflects reality (drops to
   disconnected + stops the VN server) or, when auto-reconnect is enabled, hands
   off to the existing backoff machinery. An intentionalTeardown guard plus a
   stale-backend-instance check keep it from firing on our own teardowns.

2. Heartbeat/auto-reconnect was unreachable: neither the source config type nor
   meshcoreConfigFromSource() carried heartbeatIntervalSeconds, and there was no
   UI for it — so startHeartbeat() always early-returned and the reconnect path
   (incl. PR #3706's fix) never ran. Added a heartbeat field to the MeshCore
   source form (mirroring Meshtastic), plumbed heartbeatIntervalSeconds through
   MeshCoreSourceConfig and meshcoreConfigFromSource(). The update route already
   restarts the manager on any config change, so editing it reconnects.

Regression tests cover the drop-detection branches and the config passthrough.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4

* fix(meshcore): keep manager registered on manual disconnect (#3705)

The UI Disconnect (POST /api/sources/:id/disconnect) called
meshcoreManagerRegistry.remove(), deleting the manager from the registry.
Every /api/sources/:id/meshcore/* route is behind a guard that 404s with
"No MeshCore manager for source <id>" when the manager is absent — so after a
manual disconnect the page's status/read polling errored out and the source
couldn't be driven again without a container restart.

Disconnect now tears down the device link via manager.disconnect() (which
already stops the VN server, heartbeat and schedulers) while leaving the
manager registered, so /meshcore/* keeps serving a clean disconnected state and
/connect re-establishes against the existing manager. Mirrors the intent of the
autoConnect=false stop/start workflow without making the source unaddressable.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4

* docs(meshcore): document source heartbeat field + changelog (#3705)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4

* fix(meshcore): release dead backend on unrecovered drop (review #3794)

Addresses Claude review finding #2: when a socket drop is handled with
auto-reconnect disabled, handleUnexpectedDisconnect() left this.nativeBackend
pointing at the closed connection, so sendBridgeCommand()'s `!nativeBackend`
guard never tripped and callers got a write-to-closed error instead of a clean
"disconnected" until /connect ran. Now tears down and nulls the dead backend
(connectionState is set to 'disconnected' first so any re-emitted event
short-circuits). Added a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_011JEaCGwY9Wz8jeV4e22GW4

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] MeshCore: Cannot connect to virtual node

2 participants