Skip to content

fix(feishu): await HTTP server close before clearing state references#48588

Open
alex-xuweilong wants to merge 3 commits intoopenclaw:mainfrom
alex-xuweilong:fix/feishu-monitor-state-cleanup
Open

fix(feishu): await HTTP server close before clearing state references#48588
alex-xuweilong wants to merge 3 commits intoopenclaw:mainfrom
alex-xuweilong:fix/feishu-monitor-state-cleanup

Conversation

@alex-xuweilong
Copy link
Copy Markdown

Summary

Prevents port leaks (EADDRINUSE) and memory leaks caused by premature HTTP server reference cleanup during Feishu plugin shutdown/restart.

Root Cause

stopFeishuMonitorState() calls server.close() then immediately deletes the server from the httpServers Map. Since server.close() is asynchronous, the port may still be bound when the reference is removed — causing EADDRINUSE on restart and leaking server objects. The same pattern exists in monitor.transport.ts's cleanup function.

Fix

  • Add closeHttpServer() helper: calls closeAllConnections() (Node 18.2+) then awaits server.close() callback, with a 5-second timeout fallback
  • Change stopFeishuMonitorState() to async: await server close before deleting Map entries (single-account and full-cleanup paths)
  • Fix monitor.transport.ts cleanup: await server.close() before clearing state
  • Backward compatible: callers can still fire-and-forget (no signature change needed at call sites)

Changes

  • extensions/feishu/src/monitor.state.ts: 22 lines added, 5 removed
  • extensions/feishu/src/monitor.transport.ts: 5 lines added, 6 removed

Fixes #48183

stopFeishuMonitorState() calls server.close() then immediately
deletes the server reference from the httpServers Map. Since
server.close() is asynchronous, the port may still be bound when
the reference is removed — causing EADDRINUSE on restart and
leaking server objects in memory.

Await server.close() with a 5-second timeout before clearing
Map entries. Also call closeAllConnections() (Node 18.2+) to
force-close keep-alive connections.

Fixes openclaw#48183
@openclaw-barnacle openclaw-barnacle Bot added channel: feishu Channel integration: feishu size: XS labels Mar 17, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 17, 2026

Greptile Summary

This PR fixes a real and well-diagnosed race condition: stopFeishuMonitorState() previously deleted HTTP server references from the httpServers Map before server.close() finished, causing EADDRINUSE on restart and memory leaks. The fix in monitor.state.ts is solid — the new closeHttpServer() helper combines closeAllConnections() with a 5-second Promise.race timeout, and the function is correctly made async to sequence state cleanup after server teardown.

However, monitor.transport.ts re-implements the same close logic in its local cleanup() without the timeout guard, and the void cleanup().then(() => resolve()) pattern silently discards any rejection. Key concerns:

  • Missing timeout in monitor.transport.ts cleanup — on Node.js < 18.2 with persistent connections, server.close() never invokes its callback, causing monitorWebhook()'s outer Promise to hang forever.
  • Unhandled rejectionsvoid cleanup().then(() => resolve()) (lines 243 and 247) swallows errors; reject should be passed as the second argument to .then().
  • stopFeishuMonitor discards the async result — the public wrapper in monitor.ts still returns void, meaning callers that fire-and-forget could still hit the original EADDRINUSE bug. Returning the Promise allows callers to opt in to awaiting shutdown.

Confidence Score: 2/5

  • The fix correctly addresses the root cause in monitor.state.ts, but monitor.transport.ts introduces a potential indefinite hang on cleanup and silently drops errors.
  • The state module changes are solid, but the transport cleanup is missing the timeout guard present in the new helper — on older Node.js or with persistent connections, the monitorWebhook Promise can hang forever. The void-then pattern also risks swallowing rejections permanently. These are genuine correctness issues that should be fixed before merging.
  • extensions/feishu/src/monitor.transport.ts — cleanup() lacks timeout protection and swallows rejections. extensions/feishu/src/monitor.ts — stopFeishuMonitor() discards the async result.

Comments Outside Diff (1)

  1. extensions/feishu/src/monitor.ts, line 93-95 (link)

    Public wrapper discards the async result

    stopFeishuMonitorState is now async, but stopFeishuMonitor drops the returned Promise. Any caller that invokes stopFeishuMonitor() and then immediately restarts the monitor could still hit EADDRINUSE because the port is freed asynchronously but nothing awaits it.

    Making the wrapper async (or at least returning the Promise) lets callers opt in to awaiting the shutdown:

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: extensions/feishu/src/monitor.ts
    Line: 93-95
    
    Comment:
    **Public wrapper discards the async result**
    
    `stopFeishuMonitorState` is now `async`, but `stopFeishuMonitor` drops the returned Promise. Any caller that invokes `stopFeishuMonitor()` and then immediately restarts the monitor could still hit `EADDRINUSE` because the port is freed asynchronously but nothing awaits it.
    
    Making the wrapper async (or at least returning the Promise) lets callers opt in to awaiting the shutdown:
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/feishu/src/monitor.transport.ts
Line: 233-239

Comment:
**Cleanup can hang indefinitely — missing timeout**

`closeHttpServer()` in `monitor.state.ts` protects against a stuck server with a 5-second `Promise.race` timeout, but the `cleanup()` function here skips that guard. If `closeAllConnections` is unavailable (Node.js < 18.2) and there are persistent keep-alive connections, `server.close()` will never invoke its callback — it only stops *accepting* new connections, not closing existing ones. That means `cleanup()` hangs forever, `void cleanup().then(() => resolve())` never calls `resolve()`, and `monitorWebhook()`'s outer Promise never settles.

The simplest fix is to export `closeHttpServer` from `monitor.state.ts` and reuse it here, so both paths share the same timeout behaviour:

```
// In monitor.state.ts — add `export` keyword
export function closeHttpServer(server: http.Server): Promise<void> { ... }
```

```suggestion
    const cleanup = async () => {
      await closeHttpServer(server);
      httpServers.delete(accountId);
      botOpenIds.delete(accountId);
      botNames.delete(accountId);
    };
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/feishu/src/monitor.transport.ts
Line: 243-248

Comment:
**Cleanup rejections silently swallowed**

Using `void cleanup().then(() => resolve())` discards any rejection from `cleanup()`. If cleanup ever rejects (e.g., due to a future code change that can throw), the outer `monitorWebhook()` Promise will hang forever — `resolve()` is never called and `reject()` is never invoked.

Passing `reject` as the second argument to `.then()` ensures the outer Promise settles even on error:

```suggestion
      void cleanup().then(() => resolve(), reject);
```

And similarly for the already-aborted guard a few lines below:
```suggestion
      void cleanup().then(() => resolve(), reject);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/feishu/src/monitor.ts
Line: 93-95

Comment:
**Public wrapper discards the async result**

`stopFeishuMonitorState` is now `async`, but `stopFeishuMonitor` drops the returned Promise. Any caller that invokes `stopFeishuMonitor()` and then immediately restarts the monitor could still hit `EADDRINUSE` because the port is freed asynchronously but nothing awaits it.

Making the wrapper async (or at least returning the Promise) lets callers opt in to awaiting the shutdown:

```suggestion
export function stopFeishuMonitor(accountId?: string): Promise<void> {
  return stopFeishuMonitorState(accountId);
}
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 8fc7e52

Comment on lines +233 to 239
const cleanup = async () => {
(server.closeAllConnections as (() => void) | undefined)?.();
await new Promise<void>((cb) => server.close(() => cb()));
httpServers.delete(accountId);
botOpenIds.delete(accountId);
botNames.delete(accountId);
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup can hang indefinitely — missing timeout

closeHttpServer() in monitor.state.ts protects against a stuck server with a 5-second Promise.race timeout, but the cleanup() function here skips that guard. If closeAllConnections is unavailable (Node.js < 18.2) and there are persistent keep-alive connections, server.close() will never invoke its callback — it only stops accepting new connections, not closing existing ones. That means cleanup() hangs forever, void cleanup().then(() => resolve()) never calls resolve(), and monitorWebhook()'s outer Promise never settles.

The simplest fix is to export closeHttpServer from monitor.state.ts and reuse it here, so both paths share the same timeout behaviour:

// In monitor.state.ts — add `export` keyword
export function closeHttpServer(server: http.Server): Promise<void> { ... }
Suggested change
const cleanup = async () => {
(server.closeAllConnections as (() => void) | undefined)?.();
await new Promise<void>((cb) => server.close(() => cb()));
httpServers.delete(accountId);
botOpenIds.delete(accountId);
botNames.delete(accountId);
};
const cleanup = async () => {
await closeHttpServer(server);
httpServers.delete(accountId);
botOpenIds.delete(accountId);
botNames.delete(accountId);
};
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/feishu/src/monitor.transport.ts
Line: 233-239

Comment:
**Cleanup can hang indefinitely — missing timeout**

`closeHttpServer()` in `monitor.state.ts` protects against a stuck server with a 5-second `Promise.race` timeout, but the `cleanup()` function here skips that guard. If `closeAllConnections` is unavailable (Node.js < 18.2) and there are persistent keep-alive connections, `server.close()` will never invoke its callback — it only stops *accepting* new connections, not closing existing ones. That means `cleanup()` hangs forever, `void cleanup().then(() => resolve())` never calls `resolve()`, and `monitorWebhook()`'s outer Promise never settles.

The simplest fix is to export `closeHttpServer` from `monitor.state.ts` and reuse it here, so both paths share the same timeout behaviour:

```
// In monitor.state.ts — add `export` keyword
export function closeHttpServer(server: http.Server): Promise<void> { ... }
```

```suggestion
    const cleanup = async () => {
      await closeHttpServer(server);
      httpServers.delete(accountId);
      botOpenIds.delete(accountId);
      botNames.delete(accountId);
    };
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 243 to 248
void cleanup().then(() => resolve());
};

if (abortSignal?.aborted) {
cleanup();
resolve();
void cleanup().then(() => resolve());
return;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup rejections silently swallowed

Using void cleanup().then(() => resolve()) discards any rejection from cleanup(). If cleanup ever rejects (e.g., due to a future code change that can throw), the outer monitorWebhook() Promise will hang forever — resolve() is never called and reject() is never invoked.

Passing reject as the second argument to .then() ensures the outer Promise settles even on error:

Suggested change
void cleanup().then(() => resolve());
};
if (abortSignal?.aborted) {
cleanup();
resolve();
void cleanup().then(() => resolve());
return;
void cleanup().then(() => resolve(), reject);

And similarly for the already-aborted guard a few lines below:

Suggested change
void cleanup().then(() => resolve());
};
if (abortSignal?.aborted) {
cleanup();
resolve();
void cleanup().then(() => resolve());
return;
void cleanup().then(() => resolve(), reject);
Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/feishu/src/monitor.transport.ts
Line: 243-248

Comment:
**Cleanup rejections silently swallowed**

Using `void cleanup().then(() => resolve())` discards any rejection from `cleanup()`. If cleanup ever rejects (e.g., due to a future code change that can throw), the outer `monitorWebhook()` Promise will hang forever — `resolve()` is never called and `reject()` is never invoked.

Passing `reject` as the second argument to `.then()` ensures the outer Promise settles even on error:

```suggestion
      void cleanup().then(() => resolve(), reject);
```

And similarly for the already-aborted guard a few lines below:
```suggestion
      void cleanup().then(() => resolve(), reject);
```

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8fc7e52923

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread extensions/feishu/src/monitor.state.ts Outdated
Comment on lines +143 to +147
new Promise<void>((resolve) => {
setTimeout(() => {
console.warn("feishu: HTTP server close timed out, continuing cleanup");
resolve();
}, SERVER_CLOSE_TIMEOUT_MS);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Cancel server-close fallback timer after successful close

The new closeHttpServer timeout branch is never canceled when server.close() finishes first, so every normal shutdown still leaves a live 5s timer that keeps the event loop open and eventually prints a false timeout warning. In practice, stopping Feishu monitoring now adds an avoidable shutdown delay/noise on every close path. Please clear the timeout when the close callback runs (or otherwise make the fallback timer non-blocking).

Useful? React with 👍 / 👎.

@alex-xuweilong
Copy link
Copy Markdown
Author

Thanks for the thorough reviews! All three issues addressed:

  1. Exported closeHttpServer and reused it in monitor.transport.ts — no more duplicated close logic
  2. Added clearTimeout(timer) on successful close to prevent stale 5s timers
  3. Changed both .then(() => resolve()) to .then(() => resolve(), reject) to propagate cleanup errors

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 284336a893

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread extensions/feishu/src/monitor.state.ts Outdated
resolve();
}, SERVER_CLOSE_TIMEOUT_MS);

(server.closeAllConnections as (() => void) | undefined)?.();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Call server.close() before destroying all connections

closeHttpServer currently invokes closeAllConnections() before server.close(), which leaves a race window where new sockets can still be accepted between those two calls. Under shutdown traffic, that can keep server.close() from completing until the 5s timeout path resolves, reintroducing the same premature-cleanup behavior this patch is trying to eliminate (state is cleared while the port may still be bound).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — reordered to call server.close() first (stop accepting new connections), then closeAllConnections() (destroy existing sockets). This eliminates the race window where new sockets could be accepted during shutdown.

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 28, 2026

Codex review: needs changes before merge.

Summary
The PR adds a shared Feishu HTTP server close helper and awaits it before deleting Feishu monitor state in stop and webhook-abort cleanup paths.

Reproducibility: yes. at source level: current main calls server.close() and clears Feishu HTTP server state before the close callback in both stop and webhook-abort cleanup paths. I did not run a live restart loop in this read-only pass.

Next step before merge
A repair worker can add the missing changelog line without product, ownership, or security judgment.

Security
Cleared: The diff is limited to Feishu HTTP server shutdown sequencing and does not change dependencies, workflows, package resolution, secrets handling, permissions, or artifact execution.

Review findings

  • [P3] Add the required changelog entry — extensions/feishu/src/monitor.state.ts:146
Review details

Best possible solution:

Land this PR or a maintainer-owned equivalent after adding the required changelog entry and running merge-preview validation; keep the broader ResourceManager proposal in #48793 separate.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: current main calls server.close() and clears Feishu HTTP server state before the close callback in both stop and webhook-abort cleanup paths. I did not run a live restart loop in this read-only pass.

Is this the best way to solve the issue?

Yes. A Feishu-owned awaitable HTTP close helper shared by state cleanup and webhook abort cleanup is the narrowest maintainable fix; the remaining repair is the required changelog entry.

Full review comments:

  • [P3] Add the required changelog entry — extensions/feishu/src/monitor.state.ts:146
    This is a user-facing Feishu fix, but the PR changes production monitor code without updating CHANGELOG.md. Add one single-line Unreleased Fixes entry referencing this PR and crediting the human contributor before merge.
    Confidence: 0.9

Overall correctness: patch is correct
Overall confidence: 0.82

Acceptance criteria:

  • git diff --check
  • node scripts/check-changelog-attributions.mjs CHANGELOG.md
  • If implementation files are touched, run: pnpm test extensions/feishu/src/monitor.cleanup.test.ts extensions/feishu/src/monitor.webhook-e2e.test.ts extensions/feishu/src/monitor.webhook-security.test.ts

What I checked:

Likely related people:

  • steipete: Feature history connects this maintainer to the split Feishu monitor files and nearby webhook/body handling around this cleanup path. (role: recent maintainer and major Feishu monitor refactor owner; confidence: high; commits: f46bd2e0cc17, 8be3a4466cbc, 164f0feddff5; files: extensions/feishu/src/monitor.ts, extensions/feishu/src/monitor.state.ts, extensions/feishu/src/monitor.transport.ts)
  • vincentkoc: Recent Feishu state seam, runtime helper, and reconnect lifecycle commits touch adjacent monitor ownership and cleanup behavior. (role: recent Feishu lifecycle and state-seam maintainer; confidence: high; commits: 2fff6be4c393, 319aa2f1fe67, d93e6f61580a; files: extensions/feishu/src/monitor.state.ts, extensions/feishu/src/monitor.transport.ts, extensions/feishu/src/monitor.cleanup.test.ts)

Remaining risk / open question:

  • No live restart loop or targeted Vitest run was executed in this read-only review, so merge still needs focused Feishu cleanup/webhook validation.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 89db1e5440f5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: feishu Channel integration: feishu size: XS triage: refactor-only Candidate: refactor/cleanup-only PR without maintainer context.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Feishu monitor state cleanup incomplete - potential memory leak in httpServers Map

2 participants