Skip to content

fix(phone-api): skip manifest scan for node-info-only config requests#10754

Merged
thebentern merged 5 commits into
meshtastic:masterfrom
jeremiah-k:bugfix/phone-api-skip-manifest-scan
Jun 23, 2026
Merged

fix(phone-api): skip manifest scan for node-info-only config requests#10754
thebentern merged 5 commits into
meshtastic:masterfrom
jeremiah-k:bugfix/phone-api-skip-manifest-scan

Conversation

@jeremiah-k

@jeremiah-k jeremiah-k commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Overview

This change fixes a firmware crash I hit while testing the Android WiFi/TCP handshake recovery work that landed in meshtastic/Meshtastic-Android#5856.

During Android Stage 2, the app sends a node-info-only config request. On my tbeam running v2.7.25.104df5f, the firmware logged that the client only wanted node info, then crashed/rebooted before completing the node-info response.

The main fix is to skip filesystem manifest loading entirely for the SPECIAL_NONCE_ONLY_NODES path. That path does not use the file manifest, so avoiding the SPI lock and filesystem walk removes unnecessary memory pressure during node-info-only config starts.

This also hardens filesystem manifest enumeration by making it bounded and less allocation-heavy.

Key Changes

PhoneAPI

  • Skips filesystem manifest loading for SPECIAL_NONCE_ONLY_NODES.
  • Releases any existing manifest data when starting the node-info-only path.
  • Keeps the node-info-only flow intact: send node info, then send config complete.
  • Uses explicit manifest scan limits for normal config requests:
    • depth: 3
    • max entries: 64
  • Logs when manifest enumeration was limited.

Filesystem Enumeration

  • Extends getFiles() with optional maxCount and wasLimited parameters.
  • Defaults getFiles() to at most 64 entries.
  • Adds bounded recursive traversal so enumeration stops once enough entries are collected.
  • Uses one accumulator vector instead of building temporary vectors for recursive calls.
  • Uses strlcpy instead of strcpy for file names.
  • Avoids temporary String allocation for dot-suffix filtering.
  • Compile-gates the std::bad_alloc / std::length_error reserve fallback so no-exception targets still build.

Validation

Firmware built from 86f062657c95a988492d35979e972ed8d3a4a350 has been running well for about two days across multiple tbeam, rak4631, and T1000-E devices.

With that firmware, Android can connect without triggering the crash/reboot loop I was seeing before.

The latest branch includes a few additional review-suggested filesystem traversal hardening changes, which I am loading up for more testing now.

This was tested together with the latest Android-side WiFi/TCP recovery changes in meshtastic/Meshtastic-Android#5856.

Review Notes

This patch deserves careful maintainer review. I have done the best I can to review the C/C++ side with AI tooling and device testing, but I do not have much experience with C yet.

The main behavior change to call out is that getFiles() now limits results to 64 entries by default. Existing callers still compile because the new parameters have defaults, but callers that need a larger or complete listing should pass a larger maxCount and can use wasLimited to detect truncation.

Breaking Changes / Migration

getFiles() now limits results by default to at most 64 entries.

Existing callers continue to compile due to default arguments, but they may now receive a truncated listing. Callers that require more entries should pass a larger maxCount and optionally check wasLimited.

Attestations

  • I have tested that my proposed changes behave as described.
  • I have tested that my proposed changes do not cause any obvious regressions on the following devices:
    • Heltec (Lora32) V3
    • LilyGo T-Deck
    • LilyGo T-Beam
    • RAK WisBlock 4631
    • Seeed Studio T-1000E tracker card
    • Other

The recursive filesystem traversal in getFiles() could grow its result
vector without limit, contributing to heap pressure on memory-constrained
devices during client config starts. The implementation also relied on
unbounded strcpy and per-entry String() allocations for dot-suffix
filtering, and leaked the directory handle when the opened path was not
a directory.

Rework the traversal to address all of those at once:

- Add optional maxCount (default 64) and wasLimited out-param. The walk
  stops once maxCount entries have been collected and wasLimited is set,
  so callers can distinguish a complete listing from a truncated one.
  Default arguments preserve backward compatibility for existing callers.

- Extract the recursive walk into a collectFiles() helper in an
  anonymous namespace, accumulating into a single caller-owned vector
  instead of building and inserting temporary vectors at every recursion
  level.

- Replace strcpy with strlcpy so file_name writes are bounded by the
  destination buffer.

- Replace String(name).endsWith(".") with a stack-based
  pathEndsWithDot() helper to avoid a heap allocation per entry.

- Close the directory handle when the opened path is not a directory,
  and close the file in both branches of recursion instead of only the
  leaf branch (the previous code leaked the handle when depth was
  exhausted).

- Compile-gate the std::bad_alloc reserve probe-down behind
  __cpp_exceptions / __EXCEPTIONS so the fallback only compiles where
  bad_alloc is actually catchable. No-exception targets (nRF52, RP2040,
  STM32, ESP32-C6) skip the probe and rely on the bounded maxCount
  alone, which is safe because the default cap is small enough to fit
  comfortably in available RAM on every supported platform.
A client can request a node-info-only config response by sending
want_config_id with the SPECIAL_NONCE_ONLY_NODES nonce. The manifest
walk in handleStartConfig() ran unconditionally even though
STATE_SEND_FILEMANIFEST short-circuits past the manifest on that path,
so the SPI lock and recursive filesystem traversal were wasted work.
On memory-constrained devices this extra heap pressure was enough to
trigger allocation aborts and spontaneous reboots before the node-info
response could complete.

Gate the manifest call behind config_nonce != SPECIAL_NONCE_ONLY_NODES
so the node-info-only path no longer touches the filesystem at all, and
release any stale manifest storage left over from a previous run.

For normal full-config requests, also tighten the scan limits from
depth 10 / unbounded to depth 3 / max 64 entries via the new getFiles()
parameters. A 10-level recursive listing was never useful for a client
file picker and amplified heap pressure for no benefit. Log a warning
when the listing was truncated so operators can distinguish a bounded
result from a genuinely empty filesystem.

A few small cleanups fall out of the above:

- Replace raw spiLock->lock()/unlock() with a LockGuard so the lock is
  released even if getFiles() throws on exception-enabled targets.

- Use a releaseFilesManifest() helper (swap-with-empty idiom) in
  close() and the STATE_SEND_FILEMANIFEST drain path instead of the
  previous clear()+shrink_to_fit() pair. Equivalent behavior, but the
  swap idiom is the canonical vector-freeing pattern and centralizing
  it avoids divergence between call sites.

- Use %zu for the size_t log arguments.
Close each child directory entry before recursive descent so bounded manifest scans do not stack file handles on constrained filesystems.

Treat null or overlong paths as truncated listings, handle reserve length failures like allocation failures, and keep manifest constants aligned with repository naming.
@github-actions

github-actions Bot commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

⚡ Try this PR in the Web Flasher

Flash this PR in the Web Flasher

firmware commit boards expires

Warning

This is an automated, unreviewed CI test build. Back up your device configuration
before flashing, and only flash devices you are able to recover.

Supported boards built by this PR (22)
Device Board Platform
Crowpanel Adv 3.5 TFT elecrow-adv-35-tft esp32-s3
Heltec HT62 heltec-ht62-esp32c3-sx1262 esp32-c3
Heltec Mesh Node 096 heltec-mesh-node-t096 nrf52840
Heltec Mesh Node T1 heltec-mesh-node-t1 nrf52840
Heltec Mesh Node T114 heltec-mesh-node-t114 nrf52840
Heltec V3 heltec-v3 esp32-s3
Raspberry Pi Pico pico rp2040
Raspberry Pi Pico W picow rp2040
RAK WisBlock 11200 rak11200 esp32
RAK WisBlock 11310 rak11310 rp2040
RAK3312 rak3312 esp32-s3
RAK WisBlock 4631 rak4631 nrf52840
Seeed Xiao NRF52840 Kit seeed_xiao_nrf52840_kit nrf52840
Seeed SenseCAP Indicator seeed-sensecap-indicator-tft esp32-s3
Seeed Xiao ESP32-S3 seeed-xiao-s3 esp32-s3
Station G2 station-g2 esp32-s3
Station G3 station-g3 esp32-s3
LILYGO T-Deck t-deck-tft esp32-s3
LILYGO T-Echo t-echo nrf52840
LILYGO T-Echo Plus t-echo-plus nrf52840
LILYGO T-Impulse Plus t-impulse-plus nrf52840
Seeed SenseCAP T1000-E tracker-t1000-e nrf52840

Build artifacts expire on 2026-07-21. Updated for a039730.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a reported firmware crash during Android WiFi/TCP handshake recovery by avoiding filesystem manifest enumeration on the node-info-only config path, and by making filesystem manifest enumeration bounded and less allocation-heavy.

Changes:

  • Skip filesystem manifest loading for SPECIAL_NONCE_ONLY_NODES in PhoneAPI and aggressively release any existing manifest buffer on that path.
  • Add bounded manifest scanning controls (depth + max entries) and log when the scan was limited.
  • Extend getFiles() to support bounded recursive traversal with fewer allocations and safer path copying.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/mesh/PhoneAPI.cpp Skips manifest scan for node-info-only config requests; bounds normal scans and releases manifest storage more explicitly.
src/FSCommon.h Extends getFiles() API to accept maxCount and an optional wasLimited out-param (with a default max of 64).
src/FSCommon.cpp Reworks getFiles() to use a single accumulator vector, bounded recursion, safer copies, and optional reserve fallback under exceptions.

Comment thread src/FSCommon.cpp Outdated
Comment thread src/FSCommon.cpp Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread src/FSCommon.cpp
@thebentern

Copy link
Copy Markdown
Contributor

@copilot resolve the merge conflicts in this pull request

…ip-manifest-scan

# Conflicts:
#	src/FSCommon.cpp

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@thebentern thebentern merged commit 54e0d8d into meshtastic:master Jun 23, 2026
47 of 72 checks passed
@jeremiah-k jeremiah-k deleted the bugfix/phone-api-skip-manifest-scan branch June 24, 2026 12:40
thebentern added a commit that referenced this pull request Jun 24, 2026
… getFiles (#10778)

Forward-port of #10754 and #10757 from master (2.7) into develop, so the
develop->master 2.8 promotion (#10777) doesn't drop them.

#10754: PhoneAPI no longer walks the filesystem to build the file manifest on
node-info-only config requests (SPECIAL_NONCE_ONLY_NODES), which never consume
it. getFiles() is now bounded (default 64 entries, depth 3) via collectFiles(),
takes an optional wasLimited out-param, and reserves capacity with a bad_alloc/
length_error fallback. The manifest vector is freed via swap (releaseFilesManifest).

#10757: getFiles()/collectFiles() now guard against empty file names returned by
the Adafruit LittleFS nRF52 glue (issue 4395).

Ported by hand rather than cherry-picked: master had reflowed FSCommon.cpp to a
different brace style (every line conflicted), #10754 already subsumes #10757,
and develop carries a MESHTASTIC_EXCLUDE_FILES_MANIFEST path (nRF54L15) that
master lacks. The exclude path is preserved and now also short-circuits + frees
the manifest. Verified: native Docker suite 448/448, clang-format clean.
raghumad pushed a commit to raghumad/mezulla-firmware that referenced this pull request Jun 25, 2026
… getFiles (meshtastic#10778)

Forward-port of meshtastic#10754 and meshtastic#10757 from master (2.7) into develop, so the
develop->master 2.8 promotion (meshtastic#10777) doesn't drop them.

meshtastic#10754: PhoneAPI no longer walks the filesystem to build the file manifest on
node-info-only config requests (SPECIAL_NONCE_ONLY_NODES), which never consume
it. getFiles() is now bounded (default 64 entries, depth 3) via collectFiles(),
takes an optional wasLimited out-param, and reserves capacity with a bad_alloc/
length_error fallback. The manifest vector is freed via swap (releaseFilesManifest).

meshtastic#10757: getFiles()/collectFiles() now guard against empty file names returned by
the Adafruit LittleFS nRF52 glue (issue 4395).

Ported by hand rather than cherry-picked: master had reflowed FSCommon.cpp to a
different brace style (every line conflicted), meshtastic#10754 already subsumes meshtastic#10757,
and develop carries a MESHTASTIC_EXCLUDE_FILES_MANIFEST path (nRF54L15) that
master lacks. The exclude path is preserved and now also short-circuits + frees
the manifest. Verified: native Docker suite 448/448, clang-format clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Pull request that fixes bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants