Skip to content

Conversation

@willmmiles
Copy link

The closed slot system was a solution to manage the possiblity that a pcb might have been closed by LwIP while operations were still pending on the async thread. When doing load testing on an ESP32-S2 last year, I discovered a possible race condition with this code:

  • the async thread attempts a LwIP call on an AsyncClient such as _tcp_recved, copying the tcp_pcb* and closed_slot index in to a tcpip_api_call_data structure, then blocking in tcpip_api_call waiting for the LwIP thread.
  • Next, the LwIP thread processes an incoming packet for that connection, but cannot allocate a pbuf (or has some other issue); it calls tcp_error, unbinding the closed_slot, queuing the error event for the AsyncClient, and disposing of the tcp_pcb*. This frees some memory.
  • The LwIP thread then processes a new connection, calls tcp_accept, ultimately re-assigning the closed_slot to a new tcp_pcb*.
  • Finally, with no more inbound packets to process, the LwIP thread services the tcpip_api_call, which finds that the closed_slot is valid -- and attempts to write through the discarded tcp_pcb*.
  • Kaboom!

It turns out that the underlying problem that closed_slots was invented to solve admits a simpler solution. Since only the LwIP thread can invalidate a tcp_pcb* (either by notifiying us it's doing so via tcp_error, or by client calls to tcp_close or tcp_abort), we can treat AsyncClient::_pcb as "owned" under the LwIP mutex. Instead of the tcp_api_call_t storing the handle directly, we can instead pass a pointer to AsyncClient::_pcb; and arrange that any invalidation operation sets it to null while the mutex is still held. This permits us to safely check for validity inside the _tcp_x_api calls, as they also hold the mutex. Essentially, instead of managing separate life cycle tracking storage, we leverage the memory in AsyncClient itself to keep a safe view whenever any operation takes place in the LwIP context.

The closed slot system was a solution to manage the possiblity that a
pcb might have been closed by LwIP while operations were still pending
on the async thread. Replace this system with double indirection over
the pcb pointers, now owned solely by LwIP. This closes a possible hole
where a closed_slot may have been re-used while still held by an
AsyncClient when under heavy load.

This comment was marked as outdated.

@me-no-dev
Copy link
Member

LGTM just remove the asserts :)

@mathieucarbou
Copy link
Member

@me-no-dev : could you please push a commit in another pr to fix the esp-idf CI ? thanks :-)

@me-no-dev
Copy link
Member

@mathieucarbou CI will be fixed on next Arduino release. IDF made breaking changes mid-release

@mathieucarbou mathieucarbou requested a review from Copilot June 19, 2025 16:51
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes the closed slot management mechanism and instead uses double indirection on the TCP PCB pointer to safely track its state under the LwIP mutex. The key changes include:

  • Eliminating the closed slot variable and its allocation/free logic from AsyncTCP.h.
  • Refactoring various tcp* API call functions in AsyncTCP.cpp to use a pointer-to-pointer for the PCB.
  • Adjusting higher-level AsyncClient functions (e.g., connect, close, abort) to accommodate the new design.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/AsyncTCP.h Removed closed slot tracking members and related allocation logic.
src/AsyncTCP.cpp Updated TCP API calls to use double indirection for pcb management and removed obsolete closed slot usage.
Comments suppressed due to low confidence (2)

src/AsyncTCP.cpp:671

  • Consider using the AsyncClient's _pcb pointer (by passing its address) in _tcp_connect for consistency with other API calls; using a local copy might lead to divergence in connection state updates.
static esp_err_t _tcp_connect(tcp_pcb *pcb, ip_addr_t *addr, uint16_t port, tcp_connected_fn cb) {

src/AsyncTCP.cpp:830

  • For consistency with other functions that require the address of the pcb, consider passing the address of the client’s _pcb (i.e., &_pcb) to _tcp_connect instead of using a local variable copy.
  esp_err_t err = _tcp_connect(pcb, &addr, port, (tcp_connected_fn)&_tcp_connected);

@mathieucarbou mathieucarbou merged commit 7afc0cb into ESP32Async:main Jun 19, 2025
21 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants