fix(core): drop disabled MCP server from health status registry#3916
Conversation
The global `serverStatuses` map only added entries — disabling a server via `/mcp` left its `DISCONNECTED` entry in place, so the Footer's MCP health pill kept counting the server as offline. Add `removeMCPServerStatus` and call it from `disableMcpServer` so the server drops out of the health snapshot. `disconnectServer` (used for transient reconnects like OAuth and health-check retries) keeps the entry as before. Fixes #3895
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
wenshao
left a comment
There was a problem hiding this comment.
Test (macos-latest, 24.x) is failing (pre-existing, unrelated to this PR).
Additional findings (not mappable to specific diff lines)
[Critical] packages/cli/src/ui/hooks/useMCPHealth.ts — 核心 UI 修复路径缺少测试
该 hook 的 undefined → delete 分支是本次 bug 修复的核心 UI 代码路径,但该 hook 完全没有测试文件。同目录下其他所有 hook 都有对应测试。如果未来有人重构 listener 逻辑,undefined 分支可能被意外移除且无测试捕获。
建议:新增 useMCPHealth.test.ts,使用 renderHook 覆盖 listener 收到 undefined 时删除条目、收到正常状态时更新、组件卸载时移除 listener。
[Suggestion] packages/core/src/tools/tool-registry.ts:347-355 — disableMcpServer 与 disconnectServer 存在代码重复
前 8 行(remove tools、remove prompts、disconnect client)与 disconnectServer 完全重复。建议让 disableMcpServer 调用 this.disconnectServer(serverName) 然后追加 removeMCPServerStatus 和排除列表更新。
[Suggestion] packages/core/src/tools/tool-registry.ts:331-339 — disconnectServer() 未调用 removeMCPServerStatus()
只有 disableMcpServer 获得了状态注册表清理,但 OAuth 令牌清除流程使用的 disconnectServer 仍会在全局状态映射中留下 DISCONNECTED 条目。
[Suggestion] packages/core/src/tools/mcp-client.ts:333-335 — getMCPServerStatus 语义模糊
对未追踪的服务器返回 DISCONNECTED,无法区分「被禁用/移除」和「连接失败」。PR 新增 removeMCPServerStatus 后更加容易触发。建议将返回类型改为 MCPServerStatus | undefined,或新增 isMCPServerTracked() 函数。
— deepseek-v4-pro via Qwen Code /review
|
|
||
| /** | ||
| * Remove an MCP server from the status registry and notify listeners. | ||
| * Used when a server is disabled or removed from configuration so it no |
There was a problem hiding this comment.
[Critical] removeMCPServerStatus 能正确移除条目,但 updateMCPServerStatus(第 304 行)缺少对应的守卫逻辑。任何 removeMCPServerStatus 之后的异步回调都会通过 updateMCPServerStatus 将服务器静默「复活」回注册表。
最具体的触发场景:McpClient.connect() 的 catch 块(第 164 行)无条件调用 this.updateStatus(DISCONNECTED),不检查 isDisconnecting 标志。若在 connect() 执行期间调用 disableMcpServer,已移除的服务器会被重新加入。
| * Used when a server is disabled or removed from configuration so it no | |
| export function removeMCPServerStatus(serverName: string): void { | |
| if (!serverStatuses.has(serverName)) { | |
| return; | |
| } | |
| const previousStatus = serverStatuses.get(serverName); | |
| debugLogger.debug( | |
| `MCP status registry: removing '${serverName}' (was ${previousStatus})`, | |
| ); | |
| serverStatuses.delete(serverName); | |
| for (const listener of [...statusChangeListeners]) { | |
| listener(serverName, undefined); | |
| } | |
| } |
同时在 updateMCPServerStatus 中添加守卫:
| * Used when a server is disabled or removed from configuration so it no | |
| export function updateMCPServerStatus( | |
| serverName: string, | |
| status: MCPServerStatus, | |
| ): void { | |
| if (!serverStatuses.has(serverName)) return; | |
| serverStatuses.set(serverName, status); | |
| for (const listener of [...statusChangeListeners]) { | |
| listener(serverName, status); | |
| } | |
| } |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Good catch on the race — fixed in 802cd29.
A small caveat on the proposed if (!serverStatuses.has(serverName)) return; guard inside updateMCPServerStatus: it would also block the initial registration (the very first updateStatus(CONNECTING) runs before any entry exists), so the global function can't gate on has() directly. Instead I gated at the source — McpClient.updateStatus now bails out when isDisconnecting is set, so the in-flight connect() catch block can no longer leak DISCONNECTED into the registry after disableMcpServer has already removed the entry.
Also snapshotted the listener array ([...statusChangeListeners]) in both updateMCPServerStatus and removeMCPServerStatus per your suggestion, so a listener detaching itself during dispatch can't mutate the iteration.
Added a regression test (a stale status update from an in-flight connect cannot resurrect a removed server) that drives exactly this race.
| // Drop the server from the global status registry so the Footer's | ||
| // MCP health pill doesn't keep counting it as "offline" — disabling | ||
| // is intentional, not a connectivity failure. | ||
| removeMCPServerStatus(serverName); |
There was a problem hiding this comment.
[Suggestion] 如果 disconnectServer 在上方第 355 行抛出异常,此处的 removeMCPServerStatus 永远不会被调用,服务器残留于注册表中。
| removeMCPServerStatus(serverName); | |
| try { | |
| await this.mcpClientManager.disconnectServer(serverName); | |
| } finally { | |
| // Always clean up status registry even if disconnect fails | |
| removeMCPServerStatus(serverName); | |
| } |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Done in 802cd29 — wrapped the disconnect call in try { ... } finally { removeMCPServerStatus(serverName); }. Added a test (still removes the registry entry when disconnect throws) that mocks disconnectServer to reject and asserts the global entry is still cleared.
Address review feedback on #3916: - McpClient.updateStatus now bails out when isDisconnecting is set, so an in-flight connect()'s catch block can no longer resurrect a server that disableMcpServer already removed from the global registry. - disableMcpServer wraps the disconnect call in try/finally so the status entry is cleared even if disconnect throws. - McpClientManager.removeServer also calls removeMCPServerStatus for symmetry — a server removed from configuration should drop out of the health snapshot the same way a disabled one does. - updateMCPServerStatus / removeMCPServerStatus iterate over a snapshot of statusChangeListeners so a listener detaching itself during dispatch doesn't mutate the array we're iterating. Adds two regression tests: one drives the connect/disconnect race and asserts the entry stays removed; another verifies disableMcpServer clears the entry even when disconnect rejects.
自审报告(6 轮无方向审计)针对评审反馈修复后,我对设计和代码做了 6 轮独立审计,未发现遗留问题。下面是逐轮记录。 Round 1 — 架构设计
Round 2 — 边界条件
Round 3 — 逐文件审读
Round 4 — 竞态推演重点验证 disable 期间 in-flight
Round 5 — 测试运行69 个相关测试全部通过:
新增的两条回归测试覆盖关键不变量:
Round 6 — 破坏性变更
权衡后接受此 API 变化(语义上必要)。 结论设计与实现稳定,未发现需进一步修改的问题。 |
| // disconnect throws — so the Footer's MCP health pill stops counting | ||
| // it as "offline". Disabling is intentional, not a connectivity | ||
| // failure, and a leftover entry would resurrect the bug from #3895. | ||
| removeMCPServerStatus(serverName); |
There was a problem hiding this comment.
[Suggestion] removeMCPServerStatus 在 setExcludedMcpServers 之前调用,存在短暂不一致窗口。
在这两个同步调用之间,getMCPServerStatus(serverName) 会回退返回 DISCONNECTED,但 config.isMcpServerDisabled(serverName) 返回 false(排除列表尚未更新)。doctorChecks.ts 在此期间会将一个正在被主动禁用的服务器误判为 fail("disconnected"),而非 pass("disabled")。
当前影响极小(同步窗口),但如果未来有人在两个调用之间插入 await,窗口会扩大。
| removeMCPServerStatus(serverName); | |
| // Update config's exclusion list first so the server is already | |
| // marked as disabled before we drop it from the status registry. | |
| const currentExcluded = this.config.getExcludedMcpServers() || []; | |
| if (!currentExcluded.includes(serverName)) { | |
| this.config.setExcludedMcpServers([...currentExcluded, serverName]); | |
| } | |
| removeMCPServerStatus(serverName); |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Good catch on the ordering — fixed in ab4f6d0.
Moved setExcludedMcpServers to run before removeMCPServerStatus, both inside the finally block. So a server is added to the exclusion list (and isMcpServerDisabled flips to true) before its status entry disappears, closing the window where doctorChecks could observe the fallback DISCONNECTED while the exclusion list was still stale. As you noted the window is currently synchronous, but this also makes it robust to future awaits landing between the two calls.
Side effect of moving the exclusion-list update into finally: if disconnectServer throws, the server now still ends up in the exclusion list. That seems correct semantically — once the user has chosen to disable, we shouldn't re-discover the server on next refresh just because the transport teardown errored.
Added a regression test (updates the exclusion list before dropping the status entry) that captures getAllMCPServerStatuses().has(serverName) from inside the setExcludedMcpServers spy, asserting the status entry is still present at that point.
Move setExcludedMcpServers ahead of removeMCPServerStatus inside the finally block, so a disabled server is marked excluded before its status entry is dropped. Closes the (currently synchronous) window where doctorChecks would observe a missing status while the exclusion list still hadn't been updated, mis-reporting an intentional disable as a connectivity failure.
DemoCase 15.8.mp4Case 25.8.1.mp4 |
|
@wenshao 已在评论区补充了视频 demo 复现「禁用 MCP 服务器后健康提示药丸正确清空」,麻烦再 review 一下,谢谢! |
|
|
||
| // Even though disconnect threw, the global status entry must be cleared | ||
| // so the health pill stops counting the server. | ||
| expect(getAllMCPServerStatuses().has('flaky-server')).toBe(false); |
There was a problem hiding this comment.
[Suggestion] 错误路径测试 mock 了 setExcludedMcpServers,但未断言它被调用过。该 PR 将排除列表更新移入了 finally 块——现在即使 disconnectServer 抛出异常它也会执行。测试应验证这一行为变更。
| expect(getAllMCPServerStatuses().has('flaky-server')).toBe(false); | |
| expect(getAllMCPServerStatuses().has('flaky-server')).toBe(false); | |
| expect(setExcludedSpy).toHaveBeenCalledWith(['flaky-server']); |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Done in a384bc3. Test renamed to still removes the registry entry and updates the exclusion list when disconnect throws, with the added expect(setExcludedSpy).toHaveBeenCalledWith(['flaky-server']) assertion you suggested. This locks in the new finally-block behavior (exclusion list is always updated, even on disconnect failure) so we won't silently regress it.
| // isMcpServerDisabled still returns false, mis-reporting an | ||
| // intentional disable as a connectivity failure. | ||
| const currentExcluded = this.config.getExcludedMcpServers() || []; | ||
| if (!currentExcluded.includes(serverName)) { |
There was a problem hiding this comment.
[Suggestion] finally 块中 setExcludedMcpServers 缺乏防御性错误处理。若 getExcludedMcpServers() 或 setExcludedMcpServers() 抛出异常,removeMCPServerStatus 永远不会执行,导致全局状态注册表中残留陈旧条目。虽然当前实现是同步纯内存操作,但注释已承认该窗口「容易扩大」。
| if (!currentExcluded.includes(serverName)) { | |
| } finally { | |
| try { | |
| const currentExcluded = this.config.getExcludedMcpServers() || []; | |
| if (!currentExcluded.includes(serverName)) { | |
| this.config.setExcludedMcpServers([...currentExcluded, serverName]); | |
| } | |
| } finally { | |
| removeMCPServerStatus(serverName); | |
| } | |
| } |
— deepseek-v4-pro via Qwen Code /review
There was a problem hiding this comment.
Good point — fixed in a384bc3.
Wrapped the exclusion-list update in a nested try { ... } finally { removeMCPServerStatus(serverName); } exactly as you suggested. Now even if a future getExcludedMcpServers/setExcludedMcpServers impl throws, the status registry still gets cleaned up so the health pill can't be left with a stale entry.
Added a regression test (still removes the registry entry when the exclusion-list update throws) that mocks setExcludedMcpServers to throw and asserts the global entry is still cleared.
Updated the comment in the inner finally to explicitly call out that we recover from both disconnect failures and exclusion-list update failures.
Wrap setExcludedMcpServers in a nested try/finally inside disableMcpServer so removeMCPServerStatus still runs if the config write throws. Otherwise a future config impl that fails on write would leave a stale entry in the global status registry, resurrecting the #3895 health-pill bug. Also tighten the existing throw-path test to assert the exclusion list is still updated when disconnect throws (now that the update lives in the finally block), and add a new test for the exclusion-update-throws case.
| // after `disableMcpServer` has already removed the entry would otherwise | ||
| // silently resurrect the server and the Footer's MCP health pill would | ||
| // continue to count it as offline. | ||
| if (this.isDisconnecting) { |
There was a problem hiding this comment.
[Suggestion] The isDisconnecting guard here (added to prevent the connect() catch block from resurrecting removed entries) is too broad — it also silences disconnect()'s own DISCONNECTED status propagation (line 195).
disconnect() sets isDisconnecting = true on line 190, then calls this.updateStatus(DISCONNECTED) on line 195. The guard at line 232 returns early, so updateMCPServerStatus() is never called. This means any code path that calls disconnect() without a subsequent removeMCPServerStatus() or connect() will leave a stale entry in the global status registry. Currently, McpClientManager.disconnectServer() (line 269 of mcp-client-manager.ts) is the affected caller — used by MCPManagementDialog to clear OAuth tokens.
| if (this.isDisconnecting) { | |
| if (this.isDisconnecting) { | |
| return; | |
| } | |
| updateMCPServerStatus(this.serverName, status); | |
| } | |
| /** | |
| * Disconnects from the MCP server. | |
| */ | |
| async disconnect(): Promise<void> { | |
| this.isDisconnecting = true; | |
| if (this.transport) { | |
| await this.transport.close(); | |
| } | |
| this.client.close(); | |
| this.status = MCPServerStatus.DISCONNECTED; | |
| // Call the module-level function directly so the global status | |
| // registry always reflects the disconnect, regardless of the | |
| // isDisconnecting guard (which exists solely to block stale | |
| // connect() catch blocks from resurrecting removed entries). | |
| updateMCPServerStatus(this.serverName, MCPServerStatus.DISCONNECTED); |
— DeepSeek/deepseek-v4-pro via Qwen Code /review
Summary
/mcp, the Footer's MCP health pill kept showing "1 MCP offline" becauseserverStatusesinmcp-client.tsonly added entries, never removed them.removeMCPServerStatus(serverName)and calls it fromToolRegistry.disableMcpServerso the server drops out of the health snapshot.disconnectServer(used for transient reconnects — OAuth, health-check retries) keeps the entry as before.useMCPHealthlistener now treats anundefinedstatus as a removal signal and deletes the local entry.Test plan
vitest runforpackages/core/src/tools/mcp-client.test.ts— added 3 cases forremoveMCPServerStatus(removes from map, notifies listeners withundefined, no-op when untracked).vitest runforpackages/core/src/tools/tool-registry.test.ts— added a case verifyingdisableMcpServerclears the registry entry and updates the exclusion list.vitest runforpackages/cli/src/ui/components/mcp/MCPHealthPill.test.tsxandpackages/core/src/tools/mcp-client-manager.test.ts— no regressions./mcpdisable → pill clears.