fix(feishu): await HTTP server close before clearing state references#48588
fix(feishu): await HTTP server close before clearing state references#48588alex-xuweilong wants to merge 3 commits intoopenclaw:mainfrom
Conversation
stopFeishuMonitorState() calls server.close() then immediately deletes the server reference from the httpServers Map. Since server.close() is asynchronous, the port may still be bound when the reference is removed — causing EADDRINUSE on restart and leaking server objects in memory. Await server.close() with a 5-second timeout before clearing Map entries. Also call closeAllConnections() (Node 18.2+) to force-close keep-alive connections. Fixes openclaw#48183
Greptile SummaryThis PR fixes a real and well-diagnosed race condition: However,
Confidence Score: 2/5
|
| const cleanup = async () => { | ||
| (server.closeAllConnections as (() => void) | undefined)?.(); | ||
| await new Promise<void>((cb) => server.close(() => cb())); | ||
| httpServers.delete(accountId); | ||
| botOpenIds.delete(accountId); | ||
| botNames.delete(accountId); | ||
| }; |
There was a problem hiding this comment.
Cleanup can hang indefinitely — missing timeout
closeHttpServer() in monitor.state.ts protects against a stuck server with a 5-second Promise.race timeout, but the cleanup() function here skips that guard. If closeAllConnections is unavailable (Node.js < 18.2) and there are persistent keep-alive connections, server.close() will never invoke its callback — it only stops accepting new connections, not closing existing ones. That means cleanup() hangs forever, void cleanup().then(() => resolve()) never calls resolve(), and monitorWebhook()'s outer Promise never settles.
The simplest fix is to export closeHttpServer from monitor.state.ts and reuse it here, so both paths share the same timeout behaviour:
// In monitor.state.ts — add `export` keyword
export function closeHttpServer(server: http.Server): Promise<void> { ... }
| const cleanup = async () => { | |
| (server.closeAllConnections as (() => void) | undefined)?.(); | |
| await new Promise<void>((cb) => server.close(() => cb())); | |
| httpServers.delete(accountId); | |
| botOpenIds.delete(accountId); | |
| botNames.delete(accountId); | |
| }; | |
| const cleanup = async () => { | |
| await closeHttpServer(server); | |
| httpServers.delete(accountId); | |
| botOpenIds.delete(accountId); | |
| botNames.delete(accountId); | |
| }; |
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/feishu/src/monitor.transport.ts
Line: 233-239
Comment:
**Cleanup can hang indefinitely — missing timeout**
`closeHttpServer()` in `monitor.state.ts` protects against a stuck server with a 5-second `Promise.race` timeout, but the `cleanup()` function here skips that guard. If `closeAllConnections` is unavailable (Node.js < 18.2) and there are persistent keep-alive connections, `server.close()` will never invoke its callback — it only stops *accepting* new connections, not closing existing ones. That means `cleanup()` hangs forever, `void cleanup().then(() => resolve())` never calls `resolve()`, and `monitorWebhook()`'s outer Promise never settles.
The simplest fix is to export `closeHttpServer` from `monitor.state.ts` and reuse it here, so both paths share the same timeout behaviour:
```
// In monitor.state.ts — add `export` keyword
export function closeHttpServer(server: http.Server): Promise<void> { ... }
```
```suggestion
const cleanup = async () => {
await closeHttpServer(server);
httpServers.delete(accountId);
botOpenIds.delete(accountId);
botNames.delete(accountId);
};
```
How can I resolve this? If you propose a fix, please make it concise.| void cleanup().then(() => resolve()); | ||
| }; | ||
|
|
||
| if (abortSignal?.aborted) { | ||
| cleanup(); | ||
| resolve(); | ||
| void cleanup().then(() => resolve()); | ||
| return; |
There was a problem hiding this comment.
Cleanup rejections silently swallowed
Using void cleanup().then(() => resolve()) discards any rejection from cleanup(). If cleanup ever rejects (e.g., due to a future code change that can throw), the outer monitorWebhook() Promise will hang forever — resolve() is never called and reject() is never invoked.
Passing reject as the second argument to .then() ensures the outer Promise settles even on error:
| void cleanup().then(() => resolve()); | |
| }; | |
| if (abortSignal?.aborted) { | |
| cleanup(); | |
| resolve(); | |
| void cleanup().then(() => resolve()); | |
| return; | |
| void cleanup().then(() => resolve(), reject); |
And similarly for the already-aborted guard a few lines below:
| void cleanup().then(() => resolve()); | |
| }; | |
| if (abortSignal?.aborted) { | |
| cleanup(); | |
| resolve(); | |
| void cleanup().then(() => resolve()); | |
| return; | |
| void cleanup().then(() => resolve(), reject); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/feishu/src/monitor.transport.ts
Line: 243-248
Comment:
**Cleanup rejections silently swallowed**
Using `void cleanup().then(() => resolve())` discards any rejection from `cleanup()`. If cleanup ever rejects (e.g., due to a future code change that can throw), the outer `monitorWebhook()` Promise will hang forever — `resolve()` is never called and `reject()` is never invoked.
Passing `reject` as the second argument to `.then()` ensures the outer Promise settles even on error:
```suggestion
void cleanup().then(() => resolve(), reject);
```
And similarly for the already-aborted guard a few lines below:
```suggestion
void cleanup().then(() => resolve(), reject);
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8fc7e52923
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| new Promise<void>((resolve) => { | ||
| setTimeout(() => { | ||
| console.warn("feishu: HTTP server close timed out, continuing cleanup"); | ||
| resolve(); | ||
| }, SERVER_CLOSE_TIMEOUT_MS); |
There was a problem hiding this comment.
Cancel server-close fallback timer after successful close
The new closeHttpServer timeout branch is never canceled when server.close() finishes first, so every normal shutdown still leaves a live 5s timer that keeps the event loop open and eventually prints a false timeout warning. In practice, stopping Feishu monitoring now adds an avoidable shutdown delay/noise on every close path. Please clear the timeout when the close callback runs (or otherwise make the fallback timer non-blocking).
Useful? React with 👍 / 👎.
|
Thanks for the thorough reviews! All three issues addressed:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 284336a893
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| resolve(); | ||
| }, SERVER_CLOSE_TIMEOUT_MS); | ||
|
|
||
| (server.closeAllConnections as (() => void) | undefined)?.(); |
There was a problem hiding this comment.
Call
server.close() before destroying all connections
closeHttpServer currently invokes closeAllConnections() before server.close(), which leaves a race window where new sockets can still be accepted between those two calls. Under shutdown traffic, that can keep server.close() from completing until the 5s timeout path resolves, reintroducing the same premature-cleanup behavior this patch is trying to eliminate (state is cleared while the port may still be bound).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — reordered to call server.close() first (stop accepting new connections), then closeAllConnections() (destroy existing sockets). This eliminates the race window where new sockets could be accepted during shutdown.
|
Codex review: needs changes before merge. Summary Reproducibility: yes. at source level: current main calls Next step before merge Security Review findings
Review detailsBest possible solution: Land this PR or a maintainer-owned equivalent after adding the required changelog entry and running merge-preview validation; keep the broader ResourceManager proposal in #48793 separate. Do we have a high-confidence way to reproduce the issue? Yes, at source level: current main calls Is this the best way to solve the issue? Yes. A Feishu-owned awaitable HTTP close helper shared by state cleanup and webhook abort cleanup is the narrowest maintainable fix; the remaining repair is the required changelog entry. Full review comments:
Overall correctness: patch is correct Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 89db1e5440f5. |
Summary
Prevents port leaks (EADDRINUSE) and memory leaks caused by premature HTTP server reference cleanup during Feishu plugin shutdown/restart.
Root Cause
stopFeishuMonitorState()callsserver.close()then immediately deletes the server from thehttpServersMap. Sinceserver.close()is asynchronous, the port may still be bound when the reference is removed — causing EADDRINUSE on restart and leaking server objects. The same pattern exists inmonitor.transport.ts's cleanup function.Fix
closeHttpServer()helper: callscloseAllConnections()(Node 18.2+) then awaitsserver.close()callback, with a 5-second timeout fallbackstopFeishuMonitorState()to async: await server close before deleting Map entries (single-account and full-cleanup paths)monitor.transport.tscleanup: awaitserver.close()before clearing stateChanges
extensions/feishu/src/monitor.state.ts: 22 lines added, 5 removedextensions/feishu/src/monitor.transport.ts: 5 lines added, 6 removedFixes #48183