Skip to content

fix(ble): unblock reconnect + kable audit (logging, priority, backoff, StateFlow)#5222

Merged
jamesarich merged 7 commits into
mainfrom
chore/ble-logging-config
Apr 22, 2026
Merged

fix(ble): unblock reconnect + kable audit (logging, priority, backoff, StateFlow)#5222
jamesarich merged 7 commits into
mainfrom
chore/ble-logging-config

Conversation

@jamesarich

@jamesarich jamesarich commented Apr 22, 2026

Copy link
Copy Markdown
Collaborator

Summary

A focused pass over our Kable BLE integration. Configurable logging, perf/stability wins from a deep audit of KableBleConnection + KableMeshtasticRadioProfile, and a fix for a long-standing reconnect hang that prevented recovery from node power-cycles.

Empirically the handshake (post-connect nodeinfo dump) completes noticeably faster on real hardware, and reconnect now reliably recovers within ~8 s of a remote drop.

Reconnect hang fix (18d1c0c8d)

attemptConnection wrapped its disconnect-watcher in coroutineScope { connectionState.onEach{...}.launchIn(this); connectionState.first { Disconnected } }. coroutineScope waits for all launched children before returning, but a launched collector on a hot StateFlow never completes naturally. Once .first returned, the scope hung forever, blocking BleReconnectPolicy from issuing the next attempt — so a node power-cycle (the canonical "stable connection then remote drop" path) silently never reconnected.

Replaced with a plain connectionState.filterIsInstance<Disconnected>().first() plus an explicit first { Connected } gate beforehand to defeat the StateFlow stale-value race on attempt #2+. New regression test transport reconnects after a stable connection is dropped remotely exercises the full happy-path reconnect cycle that no existing test covered.

Logging (c22e23be6)

  • BleLoggingConfig wrapper around Kable's Logging.Level / Logging.Format so DI consumers don't depend on Kable's API surface.
  • DI provides Debug (Events) in debug builds and Release (Warnings) in release; previously every build emitted Kable's per-event INFO/DEBUG spam.
  • Switched from Multiline (5 lines per event) to Compact (one line) — much friendlier for adb logcat / grep / bug reports.
  • Address-tagged MTU log so multi-device sessions are unambiguous.
  • Per-packet hex dumps are now a one-line flip: BleLoggingConfig.Debug.copy(level = BleLogLevel.Data).

Connection quality (fe9b77550)

  • requestHighConnectionPriority() right after MTU exchange. Drops the connection interval to ~7.5–15 ms during the nodeinfo burst — usually the single biggest win on handshake latency.
  • BleRetry rewritten: exponential backoff (×2, cap 2 s) with ±25 % jitter; default initial delay 500 ms → 250 ms. Avoids thundering-herd reconnects after firmware drops.
  • LE / 1M PHY default documented in KablePlatformSetup.

Reactive surface (fe9b77550)

  • BleConnection.deviceFlow and connectionState migrated SharedFlowStateFlow. Subscribers get the current value immediately on collect — no more missed-edge races between connect and observer registration.
  • Distinct-equals semantics avoid spurious re-emits.
  • FakeBleConnection and CancellingProfileBleConnection updated to match.

Hot-path micro-perf (fe9b77550)

  • Cache toRadioWriteType once per service instead of recomputing per packet.
  • triggerDrain latched to capacity = 1 (was 64) — drain is idempotent so one pending tick is sufficient.
  • Swallowed logRadio errors now logged at debug instead of silently dropped.
  • Clearer autoConnect "giving up" log message.

Code-review fixes (687e1e3b9)

  • BleRetry: dropped a hand-rolled pow() shim and use kotlin.math.pow (the shim's "JVM-only" justification was wrong — it's part of the common stdlib and already used elsewhere in commonMain).

Reverted

  • 2edb98baa (Peripheral reuse on reconnect) and the activeDevice field that supported it (5c744fcc6). Real-hardware testing showed that after a node power-cycle, calling peripheral.connect() on a reused Kable Peripheral whose underlying BluetoothGatt lost its remote does not recover the link. Kable's SensorTag sample creates a fresh Peripheral per reconnect for exactly this reason — peripheral reuse is only safe for cooperative drops, not hard remote teardowns, which is the case we most need to recover from.

Verification

  • ./gradlew :core:ble:allTests :core:network:allTests :core:testing:allTests
  • ./gradlew :core:ble:detekt :core:network:detekt :core:testing:detekt
  • ./gradlew :core:ble:spotlessApply :core:network:spotlessApply :core:testing:spotlessApply
  • ./gradlew :app:testFdroidDebugUnitTest --tests "*KoinVerificationTest*" ✅ (Koin full-graph A3 VerifyModule)
  • Verified on real hardware (Pixel 9 Pro + ESP32 node): handshake / nodeinfo dump completes faster post-connect; reconnect after node power-cycle recovers cleanly within ~8 s.

…agged

- Introduce BleLoggingConfig (BleLogLevel + BleLogFormat) decoupled from
  Kable's API surface so Koin/test classpaths don't need Kable types.
- Provide via Koin from BuildConfigProvider.isDebug: Events in debug,
  Warnings in release. Format defaults to Compact (single-line) which is
  much friendlier for adb logcat / grep than Kable's default Multiline.
- Inject the config into KableBleConnectionFactory and KableBleScanner
  instead of the previously hard-coded Events/Multiline pair.
- Tag the Android MTU negotiation log with the device address so
  multi-device sessions are unambiguous.
- KoinVerificationTest: declare the wrapping enums as known extra types.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions Bot added the enhancement New feature or request label Apr 22, 2026
@codecov

codecov Bot commented Apr 22, 2026

Copy link
Copy Markdown

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
1690 1 1689 0
View the top 1 failed test(s) by shortest run time
org.meshtastic.desktop.di.DesktopKoinTest::verify desktop koin modules
Stack Traces | 8.13s run time
org.koin.test.verify.MissingKoinDefinitionException: Missing definition for '[field:'level' - type:'org.meshtastic.core.ble.BleLogLevel']' in definition '[Singleton: 'org.meshtastic.core.ble.BleLoggingConfig']'.
	at org.koin.test.verify.Verification.verifyFactory(Verification.kt:99)
	at org.koin.test.verify.Verification.verify(Verification.kt:59)
	at org.koin.test.verify.Verify.verify(VerifyModule.kt:81)
	at org.koin.test.verify.VerifyModuleKt.verify(VerifyModule.kt:30)
	at org.koin.test.verify.VerifyModuleKt.verify$default(VerifyModule.kt:30)
	at org.meshtastic.desktop.di.DesktopKoinTest.verify desktop koin modules(DesktopKoinTest.kt:37)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:115)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestExecutor.runRequest(JUnitTestExecutor.java:175)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestExecutor.accept(JUnitTestExecutor.java:84)
	at org.gradle.api.internal.tasks.testing.junit.JUnitTestExecutor.accept(JUnitTestExecutor.java:47)
	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestDefinitionProcessor.processTestDefinition(AbstractJUnitTestDefinitionProcessor.java:65)
	at org.gradle.api.internal.tasks.testing.SuiteTestDefinitionProcessor.processTestDefinition(SuiteTestDefinitionProcessor.java:53)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at org.gradle.internal.dispatch.MethodInvocation.invokeOn(MethodInvocation.java:77)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:28)
	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:19)
	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:88)
	at jdk.proxy1/jdk.proxy1.$Proxy4.processTestDefinition(Unknown Source)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:178)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:126)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:103)
	at org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:63)
	at org.gradle.process.internal.worker.child.ActionExecutionWorker.execute(ActionExecutionWorker.java:56)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:122)
	at org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:72)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
	at worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

jamesarich and others added 2 commits April 22, 2026 16:57
Kable optimization audit improvements (Phase A+B of full audit).

Connection quality:
- Request high connection priority (CONNECTION_PRIORITY_HIGH) right
  after the MTU exchange — meaningful throughput uplift on Android.
- Document the rationale for the LE / 1M PHY default in
  KablePlatformSetup so future tuners know the constraints.

Retry behavior:
- BleRetry now does exponential backoff (factor 2, cap 2000ms) with
  ±25% jitter; default initial delay lowered 500ms→250ms. Avoids
  thundering-herd reconnects after firmware drops.

Reactive surface:
- BleConnection.deviceFlow / connectionState are now StateFlow, with
  distinct-equals semantics. Subscribers get the current value
  immediately on collect — no more missed-edge races on connect.
- FakeBleConnection / CancellingProfileBleConnection updated to match.

Profile / write path:
- Cache toRadioWriteType once per service instead of recomputing per
  packet; halves the property reads on the hot send path.
- Latch triggerDrain to capacity=1 (was 64) — drain is idempotent so
  one pending tick is sufficient.
- Log swallowed logRadio errors at debug instead of silently dropping.

Misc:
- Clarify the autoConnect 'giving up' log message so it's clear we
  fall back to scan-and-connect on Android's autoConnect ceiling.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When firmware drops the link, BleRadioTransport's outer loop calls
connectAndAwait() again with the same address. Previously, every call
recreated the Kable Peripheral — tearing down its per-peripheral
coroutine scope, Bluetooth-state observer, and (on Android) the
underlying BluetoothGatt — only to immediately rebuild it.

Now we reuse the existing peripheral when:
  - we already own one,
  - the caller didn't supply a fresh advertisement (a fresh adv carries
    scan timing/RSSI that we'd rather feed into a new peripheral), and
  - the address matches the cached peripheralAddress.

The state observer job is left running in this path since the peripheral
keeps emitting through it; we only restart the observer when we install
a new peripheral. disconnect() still does a full close (peripheral +
ActiveBleConnection clear + address unset), preserving the existing
contract that BleRadioTransport.close() and the profile-error paths
fully release GATT resources.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesarich jamesarich changed the title chore(ble): make Kable log level configurable, single-line, address-tagged perf(ble): kable audit — connection priority, peripheral reuse, backoff, StateFlow Apr 22, 2026
jamesarich and others added 2 commits April 22, 2026 17:13
- KableBleConnection: when reusing a peripheral, the long-lived stateJob
  observer kept a reference to the MeshtasticBleDevice instance captured
  at the previous connect() call. If the bonded-device cache swapped the
  instance for the same address (e.g. firmware-reported name change),
  every subsequent state transition was dispatched to the stale device,
  leaving the new instance's _state StateFlow stuck at Disconnected.
  Capture the active device in a @volatile field that the observer reads
  on each emission instead.

- BleRetry: drop the hand-rolled pow() shim and use kotlin.math.pow.
  The shim's justifying comment was wrong — kotlin.math.pow is part of
  the common stdlib and is already used elsewhere in commonMain
  (NumberFormatter, LocationUtils, PositionPrecisionPreference).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Manually reverts 2edb98b and the activeDevice field added in
687e1e3 to support it.

Real-hardware test: power-cycling the node leaves the reused Kable
Peripheral holding a stale BluetoothGatt whose remote is gone, and
peripheral.connect() on it fails to recover the link. Kable's
SensorTag sample creates a fresh Peripheral per reconnect for exactly
this reason — reuse is only safe for cooperative drops, not hard
remote teardowns, which is the case we most need to recover from.

The other audit wins (high connection priority, exponential backoff
with jitter, StateFlow migration, cached writeType, latched
triggerDrain, configurable logging) are independent and stay.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesarich jamesarich changed the title perf(ble): kable audit — connection priority, peripheral reuse, backoff, StateFlow perf(ble): kable audit — connection priority, backoff, StateFlow Apr 22, 2026
attemptConnection wrapped its disconnect-watcher in
`coroutineScope { onEach{}.launchIn(this); first { Disconnected } }`.
coroutineScope waits for ALL launched children before returning, but a
launched collector on a hot StateFlow never completes naturally. Once
.first returned, the scope hung forever, blocking BleReconnectPolicy
from issuing the next attempt — so a node power-cycle (the canonical
'stable connection then remote drop' path) silently never reconnected.

Replace the coroutineScope wrapper with a plain
`connectionState.filterIsInstance<Disconnected>().first()` and call
onDisconnected() inline. Add a `first { Connected }` gate beforehand
to defeat the StateFlow stale-value race on attempt #2+, where the
previous cycle's Disconnected value would otherwise match immediately.

Add regression test `transport reconnects after a stable connection
is dropped remotely` exercising the full happy-path reconnect cycle,
plus `simulateRemoteDisconnect`/`connectAndAwaitCalls` plumbing on
FakeBleConnection.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesarich jamesarich changed the title perf(ble): kable audit — connection priority, backoff, StateFlow fix(ble): unblock reconnect + kable audit (logging, priority, backoff, StateFlow) Apr 22, 2026
DesktopKoinTest now declares BleLogLevel/BleLogFormat as extraTypes (Koin
Verify introspects the BleLoggingConfig data class constructor even though
the factory builds it from constants), mirroring the Android verifier.

The DFU transport tests' AutoRespondingBleConnection fake now exposes
deviceFlow/connectionState as StateFlow to match the tightened
BleConnection interface.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesarich jamesarich merged commit 0b873be into main Apr 22, 2026
11 checks passed
@jamesarich jamesarich deleted the chore/ble-logging-config branch April 22, 2026 23:35
Copilot AI pushed a commit that referenced this pull request Apr 24, 2026
…, StateFlow) (#5222)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: garthvh <1795163+garthvh@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants