Skip to content

Fix W5100S socket exhaustion blocking MQTT and additional TCP clients#9770

Merged
thebentern merged 7 commits into
meshtastic:developfrom
PhilipLykov:fix/w5100s-socket-exhaustion
Mar 30, 2026
Merged

Fix W5100S socket exhaustion blocking MQTT and additional TCP clients#9770
thebentern merged 7 commits into
meshtastic:developfrom
PhilipLykov:fix/w5100s-socket-exhaustion

Conversation

@PhilipLykov

@PhilipLykov PhilipLykov commented Feb 27, 2026

Copy link
Copy Markdown
Contributor

Summary

Fix W5100S socket exhaustion on RAK4631 Ethernet gateways that prevents MQTT connections, DHCP lease renewal, and additional TCP API clients.

The W5100S chip has only 4 hardware sockets. With NTP and Syslog enabled, all 4 sockets were permanently consumed:

Socket Service Lifecycle
0 NTP UDP Permanent (held from boot)
1 Syslog UDP Permanent (held from boot)
2 TCP API Server (LISTEN) Permanent
3 TCP API Client (e.g. MeshMonitor) While connected

This left zero sockets for MQTT, DHCP lease renewal, or NTP updates — causing MQTT to silently fail with no packets sent on the wire.

Changes

  • NTP on-demand (ethClient.cpp): Remove permanent timeClient.begin() at startup. NTPClient::update() auto-initializes when _udpSetup is false. Add timeClient.end() after each query to release the socket immediately.
  • Syslog on-demand (DebugConfiguration.cpp): Remove socket allocation from Syslog::enable(). Open and close the UDP socket in _sendLog() around each message send. Syslog overhead is ~320μs per message for socket open/close, well under 1% CPU at typical log rates.
  • MQTT validation socket leak (MQTT.cpp): Fix isValidConfig() where a successful test connection was never closed — PubSubClient::~PubSubClient() only frees buffers, it does not call disconnect() or _client->stop(). The leaked socket in ESTABLISHED state permanently consumed a W5100S slot.

After fix

Socket Service Lifecycle
0 TCP API Server (LISTEN) Permanent
1 TCP API Client While connected
2 MQTT Client While connected
3 NTP / Syslog / DHCP (shared) On-demand, brief

Test plan

  • RAK4631 Ethernet gateway with syslog, NTP, MQTT, and API client all enabled simultaneously
  • Verify MQTT connects successfully while MeshMonitor is connected
  • Verify syslog messages still appear on the syslog server
  • Verify NTP time sync works after the change
  • Verify DHCP lease renewal works (check over 24+ hours)
  • Verify no regressions on ESP32 WiFi builds (syslog change affects all HAS_NETWORKING platforms)

The W5100S Ethernet chip has only 4 hardware sockets. On RAK4631
Ethernet gateways with syslog and NTP enabled, all 4 sockets were
permanently consumed (NTP UDP + Syslog UDP + TCP API listener + TCP
API client), leaving none for MQTT, DHCP lease renewal, or additional
TCP connections.

- NTP: Remove permanent timeClient.begin() at startup; NTPClient::update()
  auto-initializes when needed. Add timeClient.end() after each query to
  release the UDP socket immediately.
- Syslog: Remove socket allocation from Syslog::enable(). Open and close
  the UDP socket on-demand in _sendLog() around each message send.
- MQTT: Fix socket leak in isValidConfig() where a successful test
  connection was never closed (PubSubClient destructor does not call
  disconnect). Add explicit pubSub->disconnect() before returning.

Made-with: Cursor
@github-actions github-actions Bot added the bugfix Pull request that fixes bugs label Feb 27, 2026
Comment thread src/DebugConfiguration.cpp
Comment thread src/DebugConfiguration.cpp
Comment thread src/mqtt/MQTT.cpp Outdated
return connectPubSub(parsed, *pubSub, (client != nullptr) ? *client : *clientConnection);
bool result = connectPubSub(parsed, *pubSub, (client != nullptr) ? *client : *clientConnection);
if (client == nullptr) {
pubSub->disconnect();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Socket cleanup is conditional on client == nullptr; when a non-null client is passed, successful validation can still leave the connection open. Since this function is a validator, that side effect can leak scarce sockets on Ethernet-class devices. Consider ensuring cleanup for all validation paths (or explicitly documenting and renaming if persistent connection is intended).

@thebentern

Copy link
Copy Markdown
Contributor

@robekl please stop using LLMs to review random PRs. We have CoPilot for this. Additionally the reviews are also commenting on existing unchanged code. This is your one and only warning.

@PhilipLykov

Copy link
Copy Markdown
Contributor Author

@robekl Thanks for the review.

  1. va_copy() — This is pre-existing code in vlogf() that we did not modify. Valid observation, but out of scope for this PR.

  2. endPacket() return value — Also pre-existing behavior in _sendLog(). Our change only added the begin()/stop() wrapping around it.

  3. Conditional disconnect() — Intentional. When client != nullptr, a caller-provided mock is in use (unit tests assert client.connected_ stays true after isValidConfig()). In production client is always nullptr, so disconnect() always runs.

Resolve conflict in MQTT.cpp: take upstream's rewritten isValidConfig()
which replaced connectPubSub() validation with a lightweight TCP-only
MQTTClient check. The upstream approach inherently fixes the socket
leak our PR addressed (testClient.stop() is always called), and adds
user-visible warnings via clientNotification.

Made-with: Cursor
@PhilipLykov

Copy link
Copy Markdown
Contributor Author

Resolved the merge conflict in MQTT.cpp — upstream rewrote isValidConfig() to use a local MQTTClient testClient with an explicit testClient.stop(), which inherently fixes the socket leak our PR addressed in that function. Took the upstream version.

The other two fixes in this PR are still needed and unaffected by upstream:

  • NTP on-demand (ethClient.cpp): timeClient.end() after each NTP query to release the UDP socket — upstream still holds it permanently from boot.
  • Syslog on-demand (DebugConfiguration.cpp): Open/close the UDP socket in _sendLog() per message — upstream still holds it permanently from enable().

Without these two changes, the W5100S still exhausts all 4 hardware sockets (NTP UDP + Syslog UDP + TCP listener + TCP client), leaving none for MQTT, DHCP renewal, or additional connections.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Addresses W5100S (4-socket) exhaustion on Ethernet gateways by reducing long-lived UDP socket usage so MQTT, DHCP renewal, and additional TCP clients can coexist reliably.

Changes:

  • Ethernet NTP client: stop allocating a UDP socket permanently; release the socket after each NTP query.
  • Syslog client: acquire/release the UDP socket on-demand per log message rather than at enable() time.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/mesh/eth/ethClient.cpp Removes startup NTP socket allocation and releases the NTP UDP socket after each update attempt.
src/DebugConfiguration.cpp Makes syslog UDP socket usage on-demand (begin/stop per message) to avoid consuming a hardware socket indefinitely.

Comment thread src/DebugConfiguration.cpp
@thebentern thebentern merged commit 5319bc7 into meshtastic:develop Mar 30, 2026
82 checks passed
@PhilipLykov PhilipLykov deleted the fix/w5100s-socket-exhaustion branch March 30, 2026 18:52
thebentern added a commit that referenced this pull request Mar 31, 2026
…#9770)

The W5100S Ethernet chip has only 4 hardware sockets. On RAK4631
Ethernet gateways with syslog and NTP enabled, all 4 sockets were
permanently consumed (NTP UDP + Syslog UDP + TCP API listener + TCP
API client), leaving none for MQTT, DHCP lease renewal, or additional
TCP connections.

- NTP: Remove permanent timeClient.begin() at startup; NTPClient::update()
  auto-initializes when needed. Add timeClient.end() after each query to
  release the UDP socket immediately.
- Syslog: Remove socket allocation from Syslog::enable(). Open and close
  the UDP socket on-demand in _sendLog() around each message send.
- MQTT: Fix socket leak in isValidConfig() where a successful test
  connection was never closed (PubSubClient destructor does not call
  disconnect). Add explicit pubSub->disconnect() before returning.

Made-with: Cursor

Co-authored-by: Ben Meadors <benmmeadors@gmail.com>
skrashevich pushed a commit to skrashevich/meshtastic-firmware that referenced this pull request Apr 4, 2026
…meshtastic#9770)

The W5100S Ethernet chip has only 4 hardware sockets. On RAK4631
Ethernet gateways with syslog and NTP enabled, all 4 sockets were
permanently consumed (NTP UDP + Syslog UDP + TCP API listener + TCP
API client), leaving none for MQTT, DHCP lease renewal, or additional
TCP connections.

- NTP: Remove permanent timeClient.begin() at startup; NTPClient::update()
  auto-initializes when needed. Add timeClient.end() after each query to
  release the UDP socket immediately.
- Syslog: Remove socket allocation from Syslog::enable(). Open and close
  the UDP socket on-demand in _sendLog() around each message send.
- MQTT: Fix socket leak in isValidConfig() where a successful test
  connection was never closed (PubSubClient destructor does not call
  disconnect). Add explicit pubSub->disconnect() before returning.

Made-with: Cursor

Co-authored-by: Ben Meadors <benmmeadors@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Pull request that fixes bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants