Fix W5100S socket exhaustion blocking MQTT and additional TCP clients#9770
Conversation
The W5100S Ethernet chip has only 4 hardware sockets. On RAK4631 Ethernet gateways with syslog and NTP enabled, all 4 sockets were permanently consumed (NTP UDP + Syslog UDP + TCP API listener + TCP API client), leaving none for MQTT, DHCP lease renewal, or additional TCP connections. - NTP: Remove permanent timeClient.begin() at startup; NTPClient::update() auto-initializes when needed. Add timeClient.end() after each query to release the UDP socket immediately. - Syslog: Remove socket allocation from Syslog::enable(). Open and close the UDP socket on-demand in _sendLog() around each message send. - MQTT: Fix socket leak in isValidConfig() where a successful test connection was never closed (PubSubClient destructor does not call disconnect). Add explicit pubSub->disconnect() before returning. Made-with: Cursor
| return connectPubSub(parsed, *pubSub, (client != nullptr) ? *client : *clientConnection); | ||
| bool result = connectPubSub(parsed, *pubSub, (client != nullptr) ? *client : *clientConnection); | ||
| if (client == nullptr) { | ||
| pubSub->disconnect(); |
There was a problem hiding this comment.
Socket cleanup is conditional on client == nullptr; when a non-null client is passed, successful validation can still leave the connection open. Since this function is a validator, that side effect can leak scarce sockets on Ethernet-class devices. Consider ensuring cleanup for all validation paths (or explicitly documenting and renaming if persistent connection is intended).
|
@robekl please stop using LLMs to review random PRs. We have CoPilot for this. Additionally the reviews are also commenting on existing unchanged code. This is your one and only warning. |
|
@robekl Thanks for the review.
|
Resolve conflict in MQTT.cpp: take upstream's rewritten isValidConfig() which replaced connectPubSub() validation with a lightweight TCP-only MQTTClient check. The upstream approach inherently fixes the socket leak our PR addressed (testClient.stop() is always called), and adds user-visible warnings via clientNotification. Made-with: Cursor
|
Resolved the merge conflict in The other two fixes in this PR are still needed and unaffected by upstream:
Without these two changes, the W5100S still exhausts all 4 hardware sockets (NTP UDP + Syslog UDP + TCP listener + TCP client), leaving none for MQTT, DHCP renewal, or additional connections. |
There was a problem hiding this comment.
Pull request overview
Addresses W5100S (4-socket) exhaustion on Ethernet gateways by reducing long-lived UDP socket usage so MQTT, DHCP renewal, and additional TCP clients can coexist reliably.
Changes:
- Ethernet NTP client: stop allocating a UDP socket permanently; release the socket after each NTP query.
- Syslog client: acquire/release the UDP socket on-demand per log message rather than at
enable()time.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/mesh/eth/ethClient.cpp | Removes startup NTP socket allocation and releases the NTP UDP socket after each update attempt. |
| src/DebugConfiguration.cpp | Makes syslog UDP socket usage on-demand (begin/stop per message) to avoid consuming a hardware socket indefinitely. |
…#9770) The W5100S Ethernet chip has only 4 hardware sockets. On RAK4631 Ethernet gateways with syslog and NTP enabled, all 4 sockets were permanently consumed (NTP UDP + Syslog UDP + TCP API listener + TCP API client), leaving none for MQTT, DHCP lease renewal, or additional TCP connections. - NTP: Remove permanent timeClient.begin() at startup; NTPClient::update() auto-initializes when needed. Add timeClient.end() after each query to release the UDP socket immediately. - Syslog: Remove socket allocation from Syslog::enable(). Open and close the UDP socket on-demand in _sendLog() around each message send. - MQTT: Fix socket leak in isValidConfig() where a successful test connection was never closed (PubSubClient destructor does not call disconnect). Add explicit pubSub->disconnect() before returning. Made-with: Cursor Co-authored-by: Ben Meadors <benmmeadors@gmail.com>
…meshtastic#9770) The W5100S Ethernet chip has only 4 hardware sockets. On RAK4631 Ethernet gateways with syslog and NTP enabled, all 4 sockets were permanently consumed (NTP UDP + Syslog UDP + TCP API listener + TCP API client), leaving none for MQTT, DHCP lease renewal, or additional TCP connections. - NTP: Remove permanent timeClient.begin() at startup; NTPClient::update() auto-initializes when needed. Add timeClient.end() after each query to release the UDP socket immediately. - Syslog: Remove socket allocation from Syslog::enable(). Open and close the UDP socket on-demand in _sendLog() around each message send. - MQTT: Fix socket leak in isValidConfig() where a successful test connection was never closed (PubSubClient destructor does not call disconnect). Add explicit pubSub->disconnect() before returning. Made-with: Cursor Co-authored-by: Ben Meadors <benmmeadors@gmail.com>
Summary
Fix W5100S socket exhaustion on RAK4631 Ethernet gateways that prevents MQTT connections, DHCP lease renewal, and additional TCP API clients.
The W5100S chip has only 4 hardware sockets. With NTP and Syslog enabled, all 4 sockets were permanently consumed:
This left zero sockets for MQTT, DHCP lease renewal, or NTP updates — causing MQTT to silently fail with no packets sent on the wire.
Changes
ethClient.cpp): Remove permanenttimeClient.begin()at startup.NTPClient::update()auto-initializes when_udpSetupis false. AddtimeClient.end()after each query to release the socket immediately.DebugConfiguration.cpp): Remove socket allocation fromSyslog::enable(). Open and close the UDP socket in_sendLog()around each message send. Syslog overhead is ~320μs per message for socket open/close, well under 1% CPU at typical log rates.MQTT.cpp): FixisValidConfig()where a successful test connection was never closed —PubSubClient::~PubSubClient()only frees buffers, it does not calldisconnect()or_client->stop(). The leaked socket in ESTABLISHED state permanently consumed a W5100S slot.After fix
Test plan