Skip to content

fix(ble): Restore bounded bonded reconnect fallback#5960

Merged
jamesarich merged 4 commits into
meshtastic:mainfrom
jeremiah-k:bugfix/ble-bonded-autoconnect-fallback
Jun 26, 2026
Merged

fix(ble): Restore bounded bonded reconnect fallback#5960
jamesarich merged 4 commits into
meshtastic:mainfrom
jeremiah-k:bugfix/ble-bonded-autoconnect-fallback

Conversation

@jeremiah-k

@jeremiah-k jeremiah-k commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Overview

This pull request restores the bonded-device BLE reconnect fallback used when a selected bonded device misses the short fresh-advertisement scan windows.

Before this change, the reconnect path preferred a fresh advertisement before connecting, but treated a bonded device as unreachable if both bounded scans missed. That could push devices into repeated reconnect backoff even though Android still had a viable bonded reconnect path.

This change keeps the fresh-advertisement preference from #5912, but restores the bonded handle as a final bounded fallback. When both fresh scans miss, the transport may connect through the bonded handle so Android's autoConnect path can patiently wait for the device to advertise again.

The trade-off is deliberate: a genuinely unavailable device may spend longer in an individual reconnect attempt, but unlike the pre-#5912 behavior, this fallback is still limited by CONNECTION_TIMEOUT before reconnect policy resumes. In testing, that bounded wait was preferable for real-world reliability on slower devices and older Android Bluetooth stacks, while still avoiding the unbounded hangs that motivated #5912.

Key Changes

  • Kept fresh scanned advertisements preferred for bonded BLE reconnects.
  • Kept the short targeted scan followed by the bounded escalated scan.
  • Restored bonded-handle fallback when both fresh scans miss.
  • Preserved bounded behavior through the existing CONNECTION_TIMEOUT used by connectAndAwait.
  • Left non-bonded BLE discovery behavior unchanged.
  • Updated BLE device documentation for bonded-only fallback behavior.
  • Updated BLE transport tests for the bonded fallback path.
  • Kept coverage proving a fresh scanned device wins when the escalated scan finds the selected bonded device.

Testing

  • Updated BleRadioTransportTest coverage for bonded-device scan misses.
  • Updated BleRadioTransportReconnectCrashTest coverage for targeted-scan miss, escalated-scan miss, and bonded fallback.
  • Verified the fresh scanned device is still preferred when available.

Migration Notes

  • No user data migration is required.
  • Existing BLE addresses and OS bonds are unchanged.
  • Non-bonded BLE discovery behavior is unchanged.
  • The outer BLE reconnect/backoff policy remains unchanged.

@github-actions github-actions Bot added the bugfix PR tag label Jun 25, 2026
@jeremiah-k jeremiah-k marked this pull request as ready for review June 26, 2026 01:28

@jamesarich jamesarich left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verdict: clean PR, no correctness defects found. I traced findDevice → attemptConnection → connectAndAwait → KableBleConnection.connect: the bonded handle's advertisement == null sets autoConnect = true, and the attempt is wrapped in withTimeout(CONNECTION_TIMEOUT = 15s) inside connectAndAwait, so the "bounded by CONNECTION_TIMEOUT" claim holds. Tests are internally consistent (bondedReconnectWindowMillis = settle + 2s + 5s + 1s buffer; the 11_000ms in BleRadioTransportTest matches 3+2+5+1).

Two non-blocking points:

  1. This deliberately reverses #5912's premise — confirm that's intended. #5912 treated this exact bonded-handle fallback as a defect ("forces autoConnect=true and hangs for CONNECTION_TIMEOUT (15s)"); this PR restores it. The cost: a device that is genuinely gone now runs ~10s scan + 15s autoConnect timeout ≈ 25s per reconnect iteration before BleReconnectPolicy backoff re-iterates, versus ~10s before. That's the stated goal (let Android wait patiently via autoConnect), not a bug, but it re-trades the reconnect cadence #5912 optimized. Worth a sentence in the PR body confirming the trade is accepted.

  2. The central guarantee is untested. Both updated tests use the RadioInterfaceService fake whose connectAndAwait returns synchronously. They prove findDevice returns the bonded handle and that connectAndAwait is called once, but the headline safety property — "remains bounded by CONNECTION_TIMEOUT" — is asserted only in a comment, never exercised. Fine for unit scope; just flagging that the PR's central claim has no test behind it.

Bottom line: well-documented, no correctness issues. The real question is the design trade — is re-accepting ~25s/iteration autoConnect park the intended behavior over #5912?

@jeremiah-k

jeremiah-k commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

This deliberately reverses #5912's premise — confirm that's intended.

Yes, that's intentional. After additional testing, particularly on older Android phones, it became clear that removing the bonded-device fallback was too aggressive.

The bounded scan window introduced in #5912 works well when advertisements are seen reliably, but in practice there are legitimate cases where the application simply misses advertisements during that window. On older Android Bluetooth stacks especially, scan results can be delayed or missed even though the device is still nearby and advertising.

Restoring the bonded-device fallback lets Android's native autoConnect path take over only after both explicit scan attempts have failed. In testing, that proved substantially more reliable on those devices because the Bluetooth stack continues waiting for the peripheral instead of requiring the application to rediscover it within a fixed scan budget.

The important difference from the behavior before #5912 is that this is still bounded. The fallback is limited by CONNECTION_TIMEOUT (15 seconds) before reconnect policy resumes. So the trade-off is deliberate: we accept a longer individual reconnect attempt in exchange for significantly better real-world reliability on slower devices and older Android stacks, while still avoiding the unbounded hangs that motivated #5912.

The central guarantee is untested.

That's a fair point. The current tests verify that we fall back to the bonded handle after the bounded scan window and exercise that path, but they don't directly prove the 15-second timeout because the fake transport completes synchronously. Exercising that guarantee would require a different style of test that can control the asynchronous connection lifecycle rather than the current fake implementation.

Edit: fixing that lint-check

@jeremiah-k jeremiah-k marked this pull request as draft June 26, 2026 01:57
@jeremiah-k jeremiah-k marked this pull request as ready for review June 26, 2026 02:41
@jamesarich jamesarich added this pull request to the merge queue Jun 26, 2026
Merged via the queue into meshtastic:main with commit 780bec0 Jun 26, 2026
24 checks passed
@jeremiah-k jeremiah-k deleted the bugfix/ble-bonded-autoconnect-fallback branch June 26, 2026 11:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix PR tag

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants