Replace closed_slots with double indirection #73
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The closed slot system was a solution to manage the possiblity that a pcb might have been closed by LwIP while operations were still pending on the async thread. When doing load testing on an ESP32-S2 last year, I discovered a possible race condition with this code:
_tcp_recved, copying thetcp_pcb*andclosed_slotindex in to atcpip_api_call_datastructure, then blocking intcpip_api_callwaiting for the LwIP thread.pbuf(or has some other issue); it callstcp_error, unbinding theclosed_slot, queuing the error event for the AsyncClient, and disposing of thetcp_pcb*. This frees some memory.tcp_accept, ultimately re-assigning theclosed_slotto a newtcp_pcb*.tcpip_api_call, which finds that theclosed_slotis valid -- and attempts to write through the discardedtcp_pcb*.It turns out that the underlying problem that
closed_slotswas invented to solve admits a simpler solution. Since only the LwIP thread can invalidate atcp_pcb*(either by notifiying us it's doing so viatcp_error, or by client calls totcp_closeortcp_abort), we can treatAsyncClient::_pcbas "owned" under the LwIP mutex. Instead of thetcp_api_call_tstoring the handle directly, we can instead pass a pointer toAsyncClient::_pcb; and arrange that any invalidation operation sets it to null while the mutex is still held. This permits us to safely check for validity inside the_tcp_x_apicalls, as they also hold the mutex. Essentially, instead of managing separate life cycle tracking storage, we leverage the memory inAsyncClientitself to keep a safe view whenever any operation takes place in the LwIP context.