fix(gateway): pin plugin webhook route registry#47902
Conversation
PR SummaryMedium Risk Overview Adds lifecycle management to release the pinned registry on shutdown (including error paths), and updates Written by Cursor Bugbot for commit c336a20. This will update automatically on new commits. Configure here. |
Greptile SummaryThis PR fixes a plugin HTTP route desynchronization bug where runtime-registered webhook routes (e.g. BlueBubbles) could end up in a different Key changes:
Two minor style observations are noted inline in Confidence Score: 4/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/plugins/runtime.ts
Line: 80-91
Comment:
**Implicit route-count heuristic silently drops fallback routes**
`resolveActivePluginHttpRouteRegistry` returns the pinned registry whenever it has **any** routes (`routeCount > 0`), completely ignoring routes that may exist in `fallback`. In the current production call-sites both arguments point to the same `pluginRegistry` object, so this is harmless. However, if a future caller passes a distinct `fallback` that has routes the pinned registry doesn't own, those routes will be silently unreachable once the pinned registry accumulates even a single entry.
Adding a brief doc comment explaining the intended tie-breaking rule (pinned wins when non-empty, fallback wins only when pinned is empty) would make the invariant explicit and easier to verify:
```suggestion
/**
* Returns the pinned HTTP-route registry when it owns at least one route;
* otherwise falls back to the provided registry. Both the auth-preflight
* and the dispatch path must pass the same `fallback` object so that the
* two code paths always agree on which registry to consult.
*/
export function resolveActivePluginHttpRouteRegistry(fallback: PluginRegistry): PluginRegistry {
const routeRegistry = getActivePluginHttpRouteRegistry();
if (!routeRegistry) {
return fallback;
}
const routeCount = routeRegistry.httpRoutes?.length ?? 0;
const fallbackRouteCount = fallback.httpRoutes?.length ?? 0;
if (routeCount === 0 && fallbackRouteCount > 0) {
return fallback;
}
return routeRegistry;
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/plugins/runtime.ts
Line: 66-68
Comment:
**Dead `!routeRegistry` branch in `getActivePluginHttpRouteRegistry`**
`state.httpRouteRegistry ?? state.registry` can only return `null` when both fields are `null`. `state.registry` is always initialized to `createEmptyPluginRegistry()` at module load and `setActivePluginRegistry` only accepts a non-nullable `PluginRegistry`, so this function will never return `null`. The callers that check `if (!routeRegistry)` (e.g. `resolveActivePluginHttpRouteRegistry` line 82) therefore have unreachable branches. This is minor, but updating the return type to `PluginRegistry` (dropping `| null`) would make the guarantee explicit and let TypeScript remove those dead checks upstream.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: c336a20 |
| export function resolveActivePluginHttpRouteRegistry(fallback: PluginRegistry): PluginRegistry { | ||
| const routeRegistry = getActivePluginHttpRouteRegistry(); | ||
| if (!routeRegistry) { | ||
| return fallback; | ||
| } | ||
| const routeCount = routeRegistry.httpRoutes?.length ?? 0; | ||
| const fallbackRouteCount = fallback.httpRoutes?.length ?? 0; | ||
| if (routeCount === 0 && fallbackRouteCount > 0) { | ||
| return fallback; | ||
| } | ||
| return routeRegistry; | ||
| } |
There was a problem hiding this comment.
Implicit route-count heuristic silently drops fallback routes
resolveActivePluginHttpRouteRegistry returns the pinned registry whenever it has any routes (routeCount > 0), completely ignoring routes that may exist in fallback. In the current production call-sites both arguments point to the same pluginRegistry object, so this is harmless. However, if a future caller passes a distinct fallback that has routes the pinned registry doesn't own, those routes will be silently unreachable once the pinned registry accumulates even a single entry.
Adding a brief doc comment explaining the intended tie-breaking rule (pinned wins when non-empty, fallback wins only when pinned is empty) would make the invariant explicit and easier to verify:
| export function resolveActivePluginHttpRouteRegistry(fallback: PluginRegistry): PluginRegistry { | |
| const routeRegistry = getActivePluginHttpRouteRegistry(); | |
| if (!routeRegistry) { | |
| return fallback; | |
| } | |
| const routeCount = routeRegistry.httpRoutes?.length ?? 0; | |
| const fallbackRouteCount = fallback.httpRoutes?.length ?? 0; | |
| if (routeCount === 0 && fallbackRouteCount > 0) { | |
| return fallback; | |
| } | |
| return routeRegistry; | |
| } | |
| /** | |
| * Returns the pinned HTTP-route registry when it owns at least one route; | |
| * otherwise falls back to the provided registry. Both the auth-preflight | |
| * and the dispatch path must pass the same `fallback` object so that the | |
| * two code paths always agree on which registry to consult. | |
| */ | |
| export function resolveActivePluginHttpRouteRegistry(fallback: PluginRegistry): PluginRegistry { | |
| const routeRegistry = getActivePluginHttpRouteRegistry(); | |
| if (!routeRegistry) { | |
| return fallback; | |
| } | |
| const routeCount = routeRegistry.httpRoutes?.length ?? 0; | |
| const fallbackRouteCount = fallback.httpRoutes?.length ?? 0; | |
| if (routeCount === 0 && fallbackRouteCount > 0) { | |
| return fallback; | |
| } | |
| return routeRegistry; | |
| } |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugins/runtime.ts
Line: 80-91
Comment:
**Implicit route-count heuristic silently drops fallback routes**
`resolveActivePluginHttpRouteRegistry` returns the pinned registry whenever it has **any** routes (`routeCount > 0`), completely ignoring routes that may exist in `fallback`. In the current production call-sites both arguments point to the same `pluginRegistry` object, so this is harmless. However, if a future caller passes a distinct `fallback` that has routes the pinned registry doesn't own, those routes will be silently unreachable once the pinned registry accumulates even a single entry.
Adding a brief doc comment explaining the intended tie-breaking rule (pinned wins when non-empty, fallback wins only when pinned is empty) would make the invariant explicit and easier to verify:
```suggestion
/**
* Returns the pinned HTTP-route registry when it owns at least one route;
* otherwise falls back to the provided registry. Both the auth-preflight
* and the dispatch path must pass the same `fallback` object so that the
* two code paths always agree on which registry to consult.
*/
export function resolveActivePluginHttpRouteRegistry(fallback: PluginRegistry): PluginRegistry {
const routeRegistry = getActivePluginHttpRouteRegistry();
if (!routeRegistry) {
return fallback;
}
const routeCount = routeRegistry.httpRoutes?.length ?? 0;
const fallbackRouteCount = fallback.httpRoutes?.length ?? 0;
if (routeCount === 0 && fallbackRouteCount > 0) {
return fallback;
}
return routeRegistry;
}
```
How can I resolve this? If you propose a fix, please make it concise.| export function getActivePluginHttpRouteRegistry(): PluginRegistry | null { | ||
| return state.httpRouteRegistry ?? state.registry; | ||
| } |
There was a problem hiding this comment.
Dead !routeRegistry branch in getActivePluginHttpRouteRegistry
state.httpRouteRegistry ?? state.registry can only return null when both fields are null. state.registry is always initialized to createEmptyPluginRegistry() at module load and setActivePluginRegistry only accepts a non-nullable PluginRegistry, so this function will never return null. The callers that check if (!routeRegistry) (e.g. resolveActivePluginHttpRouteRegistry line 82) therefore have unreachable branches. This is minor, but updating the return type to PluginRegistry (dropping | null) would make the guarantee explicit and let TypeScript remove those dead checks upstream.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugins/runtime.ts
Line: 66-68
Comment:
**Dead `!routeRegistry` branch in `getActivePluginHttpRouteRegistry`**
`state.httpRouteRegistry ?? state.registry` can only return `null` when both fields are `null`. `state.registry` is always initialized to `createEmptyPluginRegistry()` at module load and `setActivePluginRegistry` only accepts a non-nullable `PluginRegistry`, so this function will never return `null`. The callers that check `if (!routeRegistry)` (e.g. `resolveActivePluginHttpRouteRegistry` line 82) therefore have unreachable branches. This is minor, but updating the return type to `PluginRegistry` (dropping `| null`) would make the guarantee explicit and let TypeScript remove those dead checks upstream.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c336a20564
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| canvasHost, | ||
| releasePluginRouteRegistry, | ||
| httpServer, |
There was a problem hiding this comment.
Release pinned route registry on startup failure
createGatewayRuntimeState now pins the process-global HTTP route registry, but releasePluginRouteRegistry is only wired into the later shutdown path. If any awaited startup step after this point fails (for example sidecar/bootstrap work), startGatewayServer rejects before the close handler exists, so the pin is never released and later registerPluginHttpRoute calls in the same process can write to an orphaned registry object.
Useful? React with 👍 / 👎.
(cherry picked from commit a69f619)
…1959) * refactor: share gateway client auth retry helpers (cherry picked from commit 5f34391) * Gateway: preserve discovered session store paths (cherry picked from commit 60c1577) * refactor: share node pending test client (cherry picked from commit 644fb76) * fix: add null guards to usage sort comparators Prevents crash when totals is undefined in byModel/byProvider/byAgent sort comparators. Fixes 'Cannot read properties of undefined (reading totalTokens)' crash that causes context overflow in active sessions. (cherry picked from commit 6921716) * refactor: share gateway credential secretref assertions (cherry picked from commit 6cc86ad) * fix: force-stop lingering gateway client sockets (cherry picked from commit 727fc79) * Gateway: lazily resolve channel runtime (cherry picked from commit 776e5d8) * test: stabilize gateway alias coverage (cherry picked from commit 7b00a06) * Gateway: preserve trusted-proxy browser scopes (cherry picked from commit 8661c27) * test: simplify control ui http coverage (cherry picked from commit 91d4f5c) * test: tighten server method helper coverage (cherry picked from commit 91f1894) * fix(gateway): remove re-introduced auth.mode=none pairing bypass The revert of openclaw#43478 (commit 39b4185) was silently undone by 3704293 which was based on a branch that included the original change. This removes the auth.mode=none skipPairing condition again. The blanket skip was too broad - it disabled pairing for ALL websocket clients, not just Control UI behind reverse proxies. (cherry picked from commit 92fc806) * fix(gateway): avoid probe false negatives after connect (cherry picked from commit 93df5f6) * fix(gateway): skip device pairing when auth.mode=none Fixes openclaw#42931 When gateway.auth.mode is set to "none", authentication succeeds with method "none" but sharedAuthOk remains false because the auth-context only recognises token/password/trusted-proxy methods. This causes all pairing-skip conditions to fail, so Control UI browser connections get closed with code 1008 "pairing required" despite auth being disabled. Short-circuit the skipPairing check: if the operator explicitly disabled authentication, device pairing (which is itself an auth mechanism) must also be bypassed. Fixes openclaw#42931 (cherry picked from commit 9bffa34) * Gateway: cover lazy channel runtime resolution (cherry picked from commit 9ee0fb5) * fix(gateway): propagate real gateway client into plugin subagent runtime Plugin subagent dispatch used a hardcoded synthetic client carrying operator.admin, operator.approvals, and operator.pairing for all runtime.subagent.* calls. Plugin HTTP routes with auth:"plugin" require no gateway auth by design, so an unauthenticated external request could drive admin-only gateway methods (sessions.delete, agent.run) through the subagent runtime. Propagate the real gateway client into the plugin runtime request scope when one is available. Plugin HTTP routes now run inside a scoped runtime client: auth:"plugin" routes receive a non-admin synthetic operator.write client; gateway-authenticated routes retain admin-capable scopes. The security boundary is enforced at the HTTP handler level. Fixes GHSA-xw77-45gv-p728 (cherry picked from commit a1520d7) * refactor: share control ui hardlink asset setup (cherry picked from commit a3ece09) * test(gateway): avoid hoisted reply mock tdz (cherry picked from commit a60a4b4) * fix(gateway): pin plugin webhook route registry (openclaw#47902) (cherry picked from commit a69f619) * test(gateway): stabilize suite session-store config (openclaw#52193) * test(gateway): stabilize suite session-store config * test(gateway): preserve seeded config semantics * test(gateway): update seeded session store overrides (cherry picked from commit ad24fcc) * test: share plugin http auth helpers (cherry picked from commit b644669) * refactor: deduplicate push test fixtures (cherry picked from commit b6b5e5c) * test: share gateway reload helpers (cherry picked from commit b72ac79) * test: dedupe cron config setup (cherry picked from commit ba34266) * test: simplify talk config and path env coverage (cherry picked from commit bec76be) * refactor: share agent wait dedupe cleanup (cherry picked from commit c889803) * test(gateway): restore agent request route mock (cherry picked from commit ccba943) * fix: add gateway session reset routing coverage (openclaw#44773) (thanks @Lanfei) (cherry picked from commit d40a4e3) * fix(gateway): enforce caller-scope subsetting in device.token.rotate device.token.rotate accepted attacker-controlled scopes and forwarded them to rotateDeviceToken without verifying the caller held those scopes. A pairing-scoped token could rotate up to operator.admin on any already-paired device whose approvedScopes included admin. Add a caller-scope subsetting check before rotateDeviceToken: the requested scopes must be a subset of client.connect.scopes via the existing roleScopesAllow helper. Reject with missing scope: <scope> if not. Also add server.device-token-rotate-authz.test.ts covering both the priv-esc path and the admin-to-node-invoke chain. Fixes GHSA-4jpw-hj22-2xmc (cherry picked from commit dafd61b) * refactor: share readiness test harness (cherry picked from commit db9c755) * test: simplify method scope coverage (cherry picked from commit e1b9250) * refactor: share node wake test apns fixtures (cherry picked from commit e351a86) * refactor: reuse gateway talk provider schema fields (cherry picked from commit e94ac57) * fix(gateway): enforce browser origin check regardless of proxy headers In trusted-proxy mode, enforceOriginCheckForAnyClient was set to false whenever proxy headers were present. This allowed browser-originated WebSocket connections from untrusted origins to bypass origin validation entirely, as the check only ran for control-ui and webchat client types. An attacker serving a page from an untrusted origin could connect through a trusted reverse proxy, inherit proxy-injected identity, and obtain operator.admin access via the sharedAuthOk / roleCanSkipDeviceIdentity path without any origin restriction. Remove the hasProxyHeaders exemption so origin validation runs for all browser-originated connections regardless of how the request arrived. Fixes GHSA-5wcw-8jjv-m286 (cherry picked from commit ebed3bb) * refactor(security): reuse hook agent routing normalization (cherry picked from commit eece586) * Hardening: tighten preauth WebSocket handshake limits (openclaw#44089) * Gateway: tighten preauth handshake limits * Changelog: note WebSocket preauth hardening * Gateway: count preauth frame bytes accurately * Gateway: cap WebSocket payloads before auth (cherry picked from commit eff0d5a) * test: share gateway chat run helpers (cherry picked from commit f8efa30) * refactor: share shared auth scope assertion (cherry picked from commit feba7ea) * fix: adapt cherry-picks for fork TS strictness and naming * fix: resolve remaining TS errors from cherry-picks * fix: revert push test to fork version — no relay transport in fork * fix: remove duplicate function declarations in cron test * fix: remove TalkSpeak schema refs removed by upstream refactoring * fix: remove remaining TalkSpeak refs after upstream schema refactoring * fix: add missing clampProbeTimeoutMs and fix testCase ref in control-ui test * fix: resolve final TS type errors in cherry-picked tests * fix: use InstanceType for GatewayClient type refs in test * fix: resolve CI lint and test failures (no-explicit-any, hook audit finding ID) --------- Co-authored-by: Peter Steinberger <steipete@gmail.com> Co-authored-by: Gustavo Madeira Santana <gumadeiras@gmail.com> Co-authored-by: Stephen Schoettler <stephenschoettler@gmail.com> Co-authored-by: Vincent Koc <vincentkoc@ieee.org> Co-authored-by: Andrew Demczuk <andrew.demczuk@gmail.com> Co-authored-by: Robin Waslander <r.waslander@gmail.com> Co-authored-by: Peter Steinberger <peter@steipete.me> Co-authored-by: Luke <92253590+ImLukeF@users.noreply.github.com> Co-authored-by: Ayaan Zaidi <hi@obviy.us>
(cherry picked from commit a69f619)
* fix(agents): normalize windows workspace path boundary checks (openclaw#30766) Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com> (cherry picked from commit ef89b48) * fix: bypass proxy for CDP localhost connections (openclaw#31219) When HTTP_PROXY / HTTPS_PROXY / ALL_PROXY environment variables are set, CDP connections to localhost/127.0.0.1 can be incorrectly routed through the proxy (e.g. via global-agent or undici proxy dispatcher), causing browser control to fail. Fix: - New cdp-proxy-bypass module with utilities for direct localhost connections - WebSocket (ws) CDP connections: pass explicit http.Agent to bypass any global proxy agent patching - fetch-based CDP probes: wrap in withNoProxyForLocalhost() to temporarily set NO_PROXY for the duration of the call - Playwright connectOverCDP: wrap in withNoProxyForLocalhost() since Playwright reads env vars internally - 13 new tests covering getDirectAgentForCdp, hasProxyEnv, and withNoProxyForLocalhost (env save/restore, error recovery) (cherry picked from commit c96234b) * fix(mattermost): pass mediaLocalRoots through reply delivery (openclaw#44021) Merged via squash. Prepared head SHA: 856f11f Co-authored-by: LyleLiu666 <31182860+LyleLiu666@users.noreply.github.com> Co-authored-by: mukhtharcm <56378562+mukhtharcm@users.noreply.github.com> Reviewed-by: @mukhtharcm (cherry picked from commit c965049) * fix(test): reduce startup-heavy hotspot retention (openclaw#52381) Extract applyMSTeamsWebhookTimeouts + constants from monitor.ts into standalone webhook-timeouts.ts module and update test import. Subset of upstream dbd26e4 — fork takes only the msteams-local changes. (cherry picked from commit dbd26e4) * fix(ci): split redact snapshot schema coverage (cherry picked from commit 5841e3b) * fix(logging): make logger import browser-safe (cherry picked from commit df3a190) * fix: resume orphaned subagent sessions after SIGUSR1 reload Closes openclaw#47711 After a SIGUSR1 gateway reload aborts in-flight subagent LLM calls, the gateway now scans for orphaned sessions and sends a synthetic resume message to restart their work. Also makes the deferral timeout configurable via gateway.reload.deferralTimeoutMs (default: 5 minutes, up from 90s). (cherry picked from commit 304703f) * fix(gateway): pin plugin webhook route registry (openclaw#47902) (cherry picked from commit a69f619) * fix: validate edge tts output file is non-empty before reporting success (openclaw#43385) thanks @Huntterxx Merged after review.\n\nSmall, scoped fix: treat 0-byte Edge TTS output as failure so provider fallback can continue. (cherry picked from commit 946c24d) * fix(config): regenerate base schema + add label after deferralTimeoutMs port Follow-up to cherry-pick 304703f (fix: resume orphaned subagent sessions after SIGUSR1 reload): - Regenerate src/config/schema.base.generated.ts (picks up new gateway.reload.deferralTimeoutMs field + help text) - Add FIELD_LABELS entry for gateway.reload.deferralTimeoutMs (schema.help.quality test enforces label parity) - Update infra-runtime timeout test to advance 5m instead of 90s (default raised by the fix) --------- Co-authored-by: Shawn <118158941+kevinWangSheng@users.noreply.github.com> Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com> Co-authored-by: Marcus Widing <widing.marcus@gmail.com> Co-authored-by: Lyle <31182860+LyleLiu666@users.noreply.github.com> Co-authored-by: Vincent Koc <vincentkoc@ieee.org> Co-authored-by: Altay <altay@uinaf.dev> Co-authored-by: Joey Krug <joeykrug@gmail.com> Co-authored-by: Peter Steinberger <peter@steipete.me> Co-authored-by: Hiago Silva <97215740+Huntterxx@users.noreply.github.com>
Summary
Change Type
Scope
Linked Issue/PR
User-visible / Behavior Changes
Security Impact
Repro + Verification
Steps
Before
After
Evidence
pnpm test -- src/plugins/runtime.test.ts src/plugins/http-registry.test.ts src/gateway/server/plugins-http.test.ts src/gateway/config-reload.test.tspnpm tsgopnpm buildpnpm checkHuman Verification