
fix(network): preserve TCP reconnect backoff on short sessions #5893

Merged: jamesarich merged 2 commits into meshtastic:main from jeremiah-k:bugfix/tcp-reconnect-backoff on Jun 21, 2026

jeremiah-k commented Jun 21, 2026


Overview

This PR fixes TCP reconnect behavior for radios that briefly accept a PhoneAPI TCP connection, send initial/config data, and then close the socket before the session is actually stable. Previously, any received data reset reconnect backoff to the 1-second floor, so repeated short sessions could hammer a sleeping or recovering radio at roughly 1 Hz. The fix only resets reconnect backoff after a data-bearing session has remained connected long enough to be treated as stable.

Key Changes

Fixes

  • Preserved growing TCP reconnect backoff after short data-bearing sessions.
  • Added a minimum stable-session threshold before resetting reconnect delay.
  • Prevented config-dump-then-EOF sessions from repeatedly resetting backoff to 1 second.
  • Kept fast recovery for genuinely stable sessions by still resetting backoff after data is exchanged and the session uptime meets the threshold.
  • Left no-data failures on the existing exponential backoff path.
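The effect of these fixes on the reconnect delay can be sketched as follows. This is a minimal illustration, not the code from the PR: `nextBackoffMs`, `simulateShortSessions`, `MIN_BACKOFF_MS`, and `MAX_BACKOFF_MS` are hypothetical names, and the 60-second ceiling is an assumption. The 1-second floor, the doubling, and the 30-second threshold come from the description.

```kotlin
// The 1 s floor and 30 s threshold come from the PR text.
// MAX_BACKOFF_MS is an assumed ceiling, used here for illustration only.
const val MIN_BACKOFF_MS = 1_000L
const val MAX_BACKOFF_MS = 60_000L
const val SHORT_SESSION_THRESHOLD_MS = 30_000L

/**
 * Hypothetical helper: returns the delay to use before the next reconnect
 * attempt, given the current delay and how the last session went.
 */
fun nextBackoffMs(currentMs: Long, dataReceived: Boolean, sessionUptimeMs: Long): Long {
    // A session counts as stable only if data arrived AND it stayed up long enough.
    val stable = dataReceived && sessionUptimeMs >= SHORT_SESSION_THRESHOLD_MS
    // Stable sessions drop back to the floor; everything else keeps doubling up to the cap.
    return if (stable) MIN_BACKOFF_MS else minOf(currentMs * 2, MAX_BACKOFF_MS)
}

/** Delays applied before each of `count` consecutive config-dump-then-EOF sessions (about 2 s each). */
fun simulateShortSessions(count: Int): List<Long> {
    var delayMs = MIN_BACKOFF_MS
    val delays = mutableListOf<Long>()
    repeat(count) {
        delays += delayMs
        delayMs = nextBackoffMs(delayMs, dataReceived = true, sessionUptimeMs = 2_000L)
    }
    return delays
}
```

Under these assumptions, `simulateShortSessions(4)` yields `[1000, 2000, 4000, 8000]`, whereas the old "any data resets" rule would have produced 1000 for every attempt.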

Refactors

  • Added SHORT_SESSION_THRESHOLD_MS to document the stable-session cutoff.
  • Extracted the reconnect reset decision into shouldResetBackoff(...).
  • Moved reconnect reset logic from “any data received” to “data received and session uptime meets the stability threshold.”
  • Expanded reconnect logging to distinguish stable sessions that reset backoff from short sessions that keep the current backoff.
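A rough sketch of what the extracted decision and the three log cases might look like. The parameter list of `shouldResetBackoff` is elided in this description, so the signature below is an assumption. `describeReconnectDecision` and its message wording are hypothetical and only illustrate the stable/short/no-data distinction. The 30-second value is taken from the commit message.

```kotlin
// Threshold value from the commit message (30 seconds).
const val SHORT_SESSION_THRESHOLD_MS = 30_000L

/**
 * Assumed signature: pure function with no socket or clock access,
 * so it can be tested on the JVM in isolation.
 */
internal fun shouldResetBackoff(dataReceived: Boolean, sessionUptimeMs: Long): Boolean =
    dataReceived && sessionUptimeMs >= SHORT_SESSION_THRESHOLD_MS

/** Hypothetical log text; the real messages in TcpTransport may differ. */
internal fun describeReconnectDecision(dataReceived: Boolean, sessionUptimeMs: Long): String = when {
    shouldResetBackoff(dataReceived, sessionUptimeMs) ->
        "stable session ($sessionUptimeMs ms): resetting backoff"
    dataReceived ->
        "short session ($sessionUptimeMs ms): keeping current backoff"
    else ->
        "no data received: continuing exponential backoff"
}
```

Keeping the decision free of I/O is what allows the policy to be checked without standing up a TCP server.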

Testing

  • Added JVM unit coverage for the reconnect backoff reset policy via ShouldResetBackoffTest.
  • Covered the reset decision without socket/integration setup:
    • no data => no reset
    • data + uptime below threshold => no reset
    • data + uptime at threshold => reset
    • data + uptime above threshold => reset
  • No socket/integration tests are included; the branch keeps transport-loop behavior unchanged apart from the tested reset decision.
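The four cases above can be written as a small table-driven check. This is a framework-free sketch rather than the actual `ShouldResetBackoffTest`: the helper is restated with an assumed signature so the snippet stands alone, `Case`, `boundaryCases`, and `failingCases` are hypothetical names, and the uptime used for the no-data case is an arbitrary choice.

```kotlin
const val SHORT_SESSION_THRESHOLD_MS = 30_000L

// Assumed shape of the helper under test.
fun shouldResetBackoff(dataReceived: Boolean, sessionUptimeMs: Long): Boolean =
    dataReceived && sessionUptimeMs >= SHORT_SESSION_THRESHOLD_MS

data class Case(
    val name: String,
    val dataReceived: Boolean,
    val uptimeMs: Long,
    val expected: Boolean,
)

val boundaryCases = listOf(
    // A long uptime is used here so the case isolates the "no data" condition.
    Case("no data", dataReceived = false, uptimeMs = SHORT_SESSION_THRESHOLD_MS + 1, expected = false),
    Case("data, below threshold", dataReceived = true, uptimeMs = SHORT_SESSION_THRESHOLD_MS - 1, expected = false),
    Case("data, at threshold", dataReceived = true, uptimeMs = SHORT_SESSION_THRESHOLD_MS, expected = true),
    Case("data, above threshold", dataReceived = true, uptimeMs = SHORT_SESSION_THRESHOLD_MS + 1, expected = true),
)

/** Names of the cases whose actual result differs from the expected one. */
fun failingCases(): List<String> =
    boundaryCases
        .filter { shouldResetBackoff(it.dataReceived, it.uptimeMs) != it.expected }
        .map { it.name }
```

An empty result from `failingCases()` means all four policy boundaries hold. In a real test class each case would more likely be its own assertion so failures are reported individually.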

Breaking changes / migration

No breaking API or user-facing migration steps are required. Changes are internal to TCP transport reconnect behavior.

TcpTransport.connectWithRetry reset the reconnect backoff to the minimum
(1 second) whenever any data was received during a session, even if the
session lasted only 1-2 seconds before the radio closed the connection
(EOF). This is common with ESP32 radios that enter light sleep without a
BLE session: the firmware dumps its Stage-1 config, then immediately
closes the TCP PhoneAPI session. Treating this as a successful data
exchange caused the app to hammer the sleeping radio at ~1 Hz, which
prevented the firmware's WiFi/PhoneAPI stack from recovering and
eventually made the radio stop responding to new connections entirely.

Only reset backoff when data was received and the session lasted at least 30 seconds
(SHORT_SESSION_THRESHOLD_MS). Short sessions that ended in peer-EOF keep
the backoff growing (1s -> 2s -> 4s -> 8s -> ...), giving the radio time
to settle between reconnect attempts.

Extract the reconnect-backoff reset logic into a pure internal helper
(shouldResetBackoff) so it can be unit-tested without sockets. Covers
the four policy boundaries: no data, data below threshold, data at
threshold, and data above threshold.
github-actions bot added the bugfix PR tag label Jun 21, 2026
jeremiah-k marked this pull request as ready for review June 21, 2026 22:04
jamesarich added this pull request to the merge queue Jun 21, 2026
Merged via the queue into meshtastic:main with commit bf2338c Jun 21, 2026
23 checks passed
jeremiah-k deleted the bugfix/tcp-reconnect-backoff branch June 21, 2026 22:46
