feat(network): on-device capture-replay transport + ingestion fuzzing/hardening #5846

Merged
jamesarich merged 3 commits into main from claude/upbeat-leakey-252594 on Jun 18, 2026

@jamesarich

Collaborator

On-device packet-ingestion work needs a deterministic, radio-free way to drive realistic mesh traffic into the app, and assurance that the untrusted-input path can't be crashed by a malformed packet. This PR adds an on-device capture-replay transport, a seeded fuzz harness over the ingestion boundary, and the per-packet exception isolation that the fuzzing surfaced as missing.

🌟 New Features

  • InterfaceId.REPLAY — on-device capture-replay transport. A debug/test-only "Demo Mode (Replay)" connection entry replays a pre-captured FromRadio frame stream entirely on-device — no network, no paired radio. It honours the two-phase want_config handshake (config → node DB → packet stream) and paces packets at a steady rate, giving a deterministic, self-contained traffic source for benchmarks, populated-UI tests, and manual QA. When the (gitignored) capture asset is absent it falls back to the existing synthetic MockRadioTransport, so the entry always works.
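The two-phase handshake described above can be sketched roughly as follows. This is a minimal model under stated assumptions, not the PR's ReplayRadioTransport: the names `Frame`, `ReplaySession`, and `onWantConfig` are hypothetical, frames are plain labels rather than FromRadio protobufs, the phase is inferred from request order rather than from the nonce value, and pacing is left out.

```kotlin
// Hypothetical model: none of these names come from the actual ReplayRadioTransport.
sealed interface Frame {
    data class Payload(val label: String) : Frame
    data class ConfigComplete(val nonce: Int) : Frame
}

class ReplaySession(
    private val configFrames: List<Frame.Payload>,
    private val nodeFrames: List<Frame.Payload>,
    private val packetFrames: List<Frame.Payload>,
) {
    // 0 = awaiting first want_config, 1 = awaiting second want_config, 2 = streaming
    private var phase = 0

    /** Returns the frames to emit in response to a want_config request carrying [nonce]. */
    fun onWantConfig(nonce: Int): List<Frame> = when (phase) {
        0 -> {
            phase = 1
            buildList<Frame> {
                addAll(configFrames)
                add(Frame.ConfigComplete(nonce)) // echo the app's own nonce for phase 1
            }
        }
        1 -> {
            phase = 2
            buildList<Frame> {
                addAll(nodeFrames)
                add(Frame.ConfigComplete(nonce)) // echo the app's own nonce for phase 2
                addAll(packetFrames) // the captured packet stream follows the node DB
            }
        }
        else -> emptyList()
    }
}

fun main() {
    val session = ReplaySession(
        configFrames = listOf(Frame.Payload("config"), Frame.Payload("channel")),
        nodeFrames = listOf(Frame.Payload("node-1"), Frame.Payload("node-2")),
        packetFrames = listOf(Frame.Payload("packet-1")),
    )
    println(session.onWantConfig(nonce = 111))
    println(session.onWantConfig(nonce = 222))
}
```

In a coroutine-based transport the packet frames would presumably be emitted with a fixed delay between them instead of being returned as one list; that detail is omitted here to keep the sketch dependency-free.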

🛠️ Refactoring & Architecture

  • Hardened the replay interface. The asset parser now validates every frame length against the bytes remaining and fails fast with IllegalArgumentException instead of an okio underflow or an oversized allocation; handleSendToRadio tolerates arbitrary inbound bytes (undecodable ToRadio is dropped, not thrown). The on-wire asset format is documented authoritatively in the ReplayRadioTransport KDoc.
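To illustrate the fail-fast length validation, here is a small sketch of a length-prefixed parser. The layout used (a big-endian u32 frame count followed by u32-length-prefixed frames) is an assumption made for the example and is not the format documented in the ReplayRadioTransport KDoc; `parseFrames` is likewise a hypothetical name.

```kotlin
// Hypothetical asset layout, NOT the format documented in the ReplayRadioTransport KDoc:
// [u32 frameCount] then frameCount x ([u32 length][length bytes]), all big-endian.
fun parseFrames(asset: ByteArray): List<ByteArray> {
    var pos = 0

    fun readU32(): Long {
        require(asset.size - pos >= 4) { "truncated u32 at offset $pos" }
        var value = 0L
        repeat(4) { value = (value shl 8) or (asset[pos++].toLong() and 0xFF) }
        return value
    }

    val count = readU32()
    // Every frame needs at least a 4-byte length prefix, so bound the count before allocating.
    require(count <= (asset.size - pos) / 4) { "frame count $count exceeds remaining bytes" }

    val frames = ArrayList<ByteArray>(count.toInt())
    repeat(count.toInt()) {
        val length = readU32()
        val remaining = asset.size - pos
        // Validate against the bytes actually left, so an oversized length cannot trigger
        // a huge allocation or an out-of-bounds read.
        require(length <= remaining) { "frame length $length exceeds $remaining remaining bytes" }
        frames += asset.copyOfRange(pos, pos + length.toInt())
        pos += length.toInt()
    }
    return frames
}

fun main() {
    val good = byteArrayOf(0, 0, 0, 1, 0, 0, 0, 2, 0x0A, 0x0B)
    println(parseFrames(good).map { it.toList() })

    val oversized = byteArrayOf(0, 0, 0, 1, 0x7F, -1, -1, -1)
    try {
        parseFrames(oversized)
    } catch (e: IllegalArgumentException) {
        println("rejected: ${e.message}")
    }
}
```

Because every check goes through `require`, a corrupt input can only surface as IllegalArgumentException, which is the property the fuzz suite asserts for the real parser.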

🐛 Bug Fixes

  • Isolate per-packet failures in the receive pipeline. The FromRadio decode boundary was already guarded, but the post-decode handler chain (processFromRadio → the per-variant handlers) and the orchestrator's receive loop (receivedData.onEach { handleFromRadio }) were not. A single malformed-but-decodable packet from any peer in radio range whose handler threw would cancel the receivedData collection and silently deafen the radio for the rest of the session — a remote DoS, and likely a crash on Android's default coroutine exception handler. Both layers now contain handler failures with safeCatching (which re-throws CancellationException, preserving structured concurrency), logging and dropping the offending packet.
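The containment pattern can be sketched as below. `safeCatchingSketch` is a stand-in written for this example; the project's real safeCatching helper may have a different signature. `processAll` is a hypothetical, non-coroutine simplification of the receive loop that only shows how one failing packet is dropped while later packets are still handled.

```kotlin
import kotlin.coroutines.cancellation.CancellationException

// Stand-in for the project's safeCatching helper; the real signature may differ.
inline fun <T> safeCatchingSketch(block: () -> T): Result<T> =
    try {
        Result.success(block())
    } catch (e: CancellationException) {
        throw e // never swallow cancellation, so structured concurrency keeps working
    } catch (e: Exception) {
        Result.failure(e) // Errors such as OutOfMemoryError are deliberately not caught
    }

// Hypothetical receive step: handles every packet and returns the ones whose handler threw.
fun processAll(packets: List<String>, handler: (String) -> Unit): List<String> {
    val dropped = mutableListOf<String>()
    for (packet in packets) {
        safeCatchingSketch { handler(packet) }
            .onFailure { dropped += packet } // a real pipeline would also log the failure
    }
    return dropped
}

fun main() {
    val handled = mutableListOf<String>()
    val dropped = processAll(listOf("a", "bad", "b")) { packet ->
        if (packet == "bad") error("handler blew up")
        handled.add(packet)
    }
    println("handled=$handled dropped=$dropped")
}
```

The order of the catch clauses matters: CancellationException is itself an Exception, so it has to be matched and re-thrown before the general Exception clause.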

Testing Performed

  • ReplayFuzz + ReplayFuzzTest (new, core:network commonTest): a seeded, deterministic fuzz harness over the ingestion boundary (random / mutated / structurally-valid-but-hostile inputs), asserting four invariants — corrupt assets fail only with IllegalArgumentException; handleSendToRadio never throws; a bit-flipped frame decodes to a catchable Exception, never an Error; adversarial-but-valid content replays intact. Runs under allTests at zero app-size cost, and every failure reproduces from its integer seed.
  • ReplayRadioTransportTest (extended): added malformed-asset cases (truncated counts, oversized lengths, short/empty sections) alongside the handshake/stream tests — 9 total.
  • MeshMessageProcessorImplTest (extended): added the test "a throwing handler does not propagate out of handleFromRadio", which forces a downstream handler to throw and asserts that the call returns normally; it fails without the fix above.
  • The full local gate is green on the current main (detekt 2.0.0-alpha.5): spotlessCheck, detekt, assembleGoogleDebug, and kmpSmokeCompile, plus allTests for :core:network, :core:data, and :core:service.
  • The replay transport itself was verified on-device in earlier work: the two-phase handshake completes, a ~200-node DB lands, and packets decode through the normal ingestion pipeline.
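As a rough illustration of how a seeded harness makes every failure reproducible, the sketch below flips one bit per seed and checks the "catchable Exception, never an Error" invariant. It is not the PR's ReplayFuzz: `decodeToy`, `flipOneBit`, and `fuzz` are hypothetical names, and the toy decoder stands in for the real FromRadio protobuf decode.

```kotlin
import kotlin.random.Random

// Toy decoder standing in for the real FromRadio decode: [u8 length][payload].
fun decodeToy(frame: ByteArray): ByteArray {
    require(frame.isNotEmpty()) { "empty frame" }
    val length = frame[0].toInt() and 0xFF
    require(length == frame.size - 1) { "length $length does not match payload ${frame.size - 1}" }
    return frame.copyOfRange(1, frame.size)
}

// Flips one pseudo-randomly chosen bit; the same seed always flips the same bit.
// Assumes a non-empty frame.
fun flipOneBit(frame: ByteArray, seed: Int): ByteArray {
    val rng = Random(seed)
    val copy = frame.copyOf()
    val index = rng.nextInt(copy.size)
    copy[index] = (copy[index].toInt() xor (1 shl rng.nextInt(8))).toByte()
    return copy
}

// Returns the seeds whose mutated input escaped as something other than an Exception.
fun fuzz(valid: ByteArray, seeds: IntRange): List<Int> =
    seeds.filter { seed ->
        try {
            decodeToy(flipOneBit(valid, seed))
            false // the mutation happened to stay decodable
        } catch (e: Exception) {
            false // a catchable failure is an acceptable outcome
        } catch (t: Throwable) {
            true // an Error (for example OutOfMemoryError) would violate the invariant
        }
    }

fun main() {
    val valid = byteArrayOf(3, 1, 2, 3)
    val failingSeeds = fuzz(valid, 0 until 1000)
    println("seeds violating the invariant: $failingSeeds")
}
```

Since the mutation depends only on the seed, any seed reported by `fuzz` can be replayed in isolation with `flipOneBit(valid, seed)` to reproduce the exact failing input.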

The capture asset (burningmesh.fromradio) is derived from a private mesh capture and is gitignored — it is never committed. Without it, the "Demo Mode (Replay)" entry runs against the synthetic 2-node mock.

jamesarich and others added 3 commits June 18, 2026 08:15
Adds ReplayRadioTransport: a RadioTransport that replays a pre-captured
FromRadio frame stream (exported from the burningmesh-replay tool via
"replay_server.py --export") entirely on-device - no network, no paired
radio. It honours the two-phase want_config handshake (config nonce ->
config/channels; node-info nonce -> node DB, then streams packets at a
steady configurable rate), injecting config_complete_id with the app's
own nonce per phase.

Selected via the new InterfaceId.REPLAY ('r') and a debug-only
"Demo Mode (Replay)" entry beside Demo Mode in Connections. When the
bundled asset (androidApp/src/debug/assets/burningmesh.fromradio,
gitignored - generated from a private capture) is absent, it falls back
to the synthetic MockRadioTransport so the entry always works.

Purpose: a deterministic, realistic (200-node / 1500-packet) traffic
source for Macrobenchmark / Baseline Profile journeys and populated-UI
tests - the "fake transport" follow-up requested in #5735.

Verified on-device: two-phase handshake completes, 202 NodeInfos accepted
in Stage 2, packets decode through the regular ingestion pipeline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds ReplayFuzz, a seeded fuzz toolkit for the on-device FromRadio
ingestion path (the same RadioTransportCallback.handleFromRadio entry
every BLE/TCP/serial peer feeds), plus a suite asserting the
untrusted-input boundary's invariants. It runs under allTests, so the
parse/decode boundary is fuzzed on every CI run at zero app-size and zero
runtime cost; any failure reproduces from its integer seed alone.

Invariants (1000 seeds each):
- a corrupt asset fails only with IllegalArgumentException (never OOM,
  hang, or a leaked okio error) - guards the hardened parser
- handleSendToRadio absorbs arbitrary outbound bytes (undecodable ToRadio
  is dropped, not thrown)
- a bit-flipped frame decodes to a catchable Exception, never an Error
  (an OOM/StackOverflow on the decode path would be a remote DoS)
- adversarial-but-valid content replays through the transport intact

Scope: covers the self-contained layers with clean oracles (asset parser,
handshake input, protobuf decode). The adversarialFromRadio corpus is
built to feed a future MeshMessageProcessor integration fuzzer - the
post-decode handlers are not yet exception-isolated - while end-to-end
framing/UI fuzzing belongs in the capture tool's live --fuzz mode.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The FromRadio decode boundary was guarded (runCatching + LogRecord
fallback), but the post-decode handler chain (processFromRadio ->
FromRadioPacketHandlerImpl.handleX) and the orchestrator's receive loop
(receivedData.onEach { handleFromRadio }) were not. So a single
malformed-but-decodable packet from any peer in radio range that made a
handler throw would cancel the receivedData collection and silently
deafen the radio for the rest of the session - a remote DoS, and likely a
crash on Android's default coroutine exception handler.

Contain handler failures at both layers with safeCatching (catches
Exception, re-throws CancellationException so structured concurrency is
preserved), logging and dropping the offending packet:
- MeshMessageProcessorImpl.processFromRadio - makes handleFromRadio total
  (the root-cause fix).
- MeshServiceOrchestrator receive loop - backstops the single lifeline
  that feeds all inbound traffic.

Adds a regression test (a throwing handler does not propagate out of
handleFromRadio) that forces a handler to throw and asserts the call
returns normally; it fails without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions bot added the enhancement (New feature or request) label on Jun 18, 2026
@github-actions

Contributor

🖼️ Preview staleness check — advisory

This PR modifies UI composables but does not update any *Previews.kt files.

Previews power screenshot tests and in-app docs screenshots. Keeping them current ensures visual regression coverage stays accurate.

Changed UI files:

feature/connections/src/commonMain/kotlin/org/meshtastic/feature/connections/ui/ConnectionsScreen.kt
feature/connections/src/commonMain/kotlin/org/meshtastic/feature/connections/ui/components/DeviceListItem.kt

What to check:

| Pattern | Preview file convention |
| --- | --- |
| feature/{name}/…/ui/ or component/ | feature/{name}/…/*Previews.kt |
| core/ui/…/ | core/ui/…/ (previews colocated) |

Adding previews checklist:

  1. Create or update a *Previews.kt file in the same module with @PreviewLightDark
  2. Add @Suppress("PreviewPublic") if the preview is consumed by screenshot-tests
  3. Add corresponding @PreviewTest function in screenshot-tests/src/screenshotTest/
  4. Run ./gradlew :screenshot-tests:updateDebugScreenshotTest to generate reference images

If this PR does not require preview updates (e.g., logic-only change, non-visual refactor), add the skip-preview-check label to dismiss.

@github-actions

Contributor

📄 Docs staleness check — advisory

This PR modifies user-facing UI source files but does not update any page under docs/en/user/ or docs/en/developer/.

⚠️ Doc changes propagate to 3 consumers: in-app docs browser, Jekyll site (GitHub Pages), and meshtastic.org (Docusaurus sync). Updating a page in docs/en/ automatically flows to all three.

Changed source files:

feature/connections/src/commonMain/kotlin/org/meshtastic/feature/connections/ui/ConnectionsScreen.kt
feature/connections/src/commonMain/kotlin/org/meshtastic/feature/connections/ui/components/DeviceListItem.kt

What to check:

| Changed area | Likely doc page |
| --- | --- |
| feature/messaging/ | docs/en/user/messages-and-channels.md |
| feature/node/ | docs/en/user/nodes.md or docs/en/user/node-metrics.md |
| feature/map/ | docs/en/user/map-and-waypoints.md |
| feature/connections/ | docs/en/user/connections.md |
| feature/settings/ | docs/en/user/settings-radio-user.md or docs/en/user/settings-module-admin.md |
| feature/firmware/ | docs/en/user/firmware.md |
| feature/intro/ | docs/en/user/onboarding.md |
| feature/discovery/ | docs/en/user/discovery.md |
| feature/docs/ | Internal docs infrastructure |
| core/ui/ | docs/en/developer/codebase.md or component-specific user pages |

New page checklist (if adding a new doc page):

  1. Create the .md file in docs/en/user/ or docs/en/developer/ with last_updated frontmatter
  2. Register in DocBundleLoader.kt with string resources (in-app browser)
  3. Jekyll and Docusaurus sync pick up new pages automatically — no config change needed

If this PR does not require a doc update (e.g., internal refactor, bug fix, test change), add the skip-docs-check label to dismiss this check.

Cross-platform note: This check is advisory while doc coverage matures. Both Android and Apple repos use the same skip-docs-check label and advisory severity. See meshtastic/design standards for shared conventions.

jamesarich marked this pull request as ready for review on June 18, 2026 14:04
jamesarich added this pull request to the merge queue on Jun 18, 2026
Merged via the queue into main with commit 1125172 on Jun 18, 2026
18 checks passed
jamesarich deleted the claude/upbeat-leakey-252594 branch on June 18, 2026 14:13