Skip to content

fix(plugins): add timeout protection to before_agent_start hook#48741

Open
Jerry-Xin wants to merge 11 commits intoopenclaw:mainfrom
Jerry-Xin:fix/48534-plugin-hook-timeout
Open

fix(plugins): add timeout protection to before_agent_start hook#48741
Jerry-Xin wants to merge 11 commits intoopenclaw:mainfrom
Jerry-Xin:fix/48534-plugin-hook-timeout

Conversation

@Jerry-Xin
Copy link
Copy Markdown

Summary

Fixes #48534 — A plugin before_agent_start hook that hangs permanently blocks all agent runs with zero diagnostic output.

Root Cause

Plugin lifecycle hooks (before_agent_start, etc.) were executed with Promise.all but without any timeout protection. If a single plugin hook hangs indefinitely, it blocks all agent runs permanently with no diagnostic output to help identify the culprit.

Changes

1. src/plugins/hooks.ts

  • Added withHookTimeout() wrapper that races hook execution against a configurable timeout
  • When timeout is exceeded, logs a warning identifying the hanging plugin by name
  • Agent continues gracefully instead of blocking forever

2. src/plugins/hook-runner-global.ts

  • Updated global hook runner to use the new timeout-protected execution

3. src/plugins/loader.ts

  • Read hookTimeoutMs from plugin config and pass to hook runner

4. src/config/types.plugins.ts + zod-schema.ts + schema.help.ts + schema.labels.ts

  • Added hookTimeoutMs config option (default: 30000ms)

5. src/plugins/hooks.timeout.test.ts

  • Added regression tests for timeout behavior

Change Type

  • Bug fix (non-breaking change which fixes an issue)

Security Impact

Improves reliability — prevents a single misbehaving plugin from permanently blocking the agent runtime.

Human Verification

  • oxlint passes (0 warnings, 0 errors)
  • Tests added for timeout behavior

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 17, 2026

Greptile Summary

This PR adds per-handler timeout protection to all async plugin hook runners, preventing a single hanging plugin from permanently blocking agent runs. The approach — wrapping each call in a Promise.race via the existing withTimeout utility — is sound, and the implementation correctly threads the new hookTimeoutMs option from config all the way through the loader → global runner → createHookRunner.

Key observations:

  • Timeout logic is correct: the three-way ternary (> 0 → use value, === 0 → disable, undefined → default 10 s) handles all cases and is validated upstream by Zod's int().min(0).
  • Leaked background promises: when a timeout fires the underlying hook-handler promise continues executing; the AbortSignal surfaced by withTimeout is not forwarded to the actual handler (() => fn() ignores the signal). For I/O-heavy hooks this means network/DB work keeps running after the agent has already moved on. This is a known JavaScript limitation but worth documenting.
  • Severity mismatch: timed-out handlers are routed through handleHookError, which logs at error level. Since the agent continues gracefully, warn would better reflect the severity and reduce operational noise.
  • The PR description states the default is 30 000 ms, but the code (DEFAULT_HOOK_TIMEOUT_MS = 10_000) and all inline docs consistently say 10 000 ms — small description inaccuracy that doesn't affect functionality.
  • Test coverage is solid, using vi.useFakeTimers across void, modifying, and claiming hook patterns, plus the catchErrors: false throw path.

Confidence Score: 4/5

  • Safe to merge; addresses a real hang bug with no breaking changes. Two non-blocking style improvements (log level and signal forwarding) are worth follow-up but don't block the fix.
  • The core timeout logic is correct and well-tested. The two issues flagged (leaked promises, error vs. warn log level) are style/observability concerns, not correctness bugs. Config schema, type definitions, labels, and help text are all consistent (except the PR description's "30 000 ms" vs code's 10 000 ms). No breaking interface changes.
  • src/plugins/hooks.ts — the callHandlerWithTimeout signal-forwarding gap and the error-vs-warn log severity are worth revisiting in a follow-up.

Comments Outside Diff (1)

  1. src/plugins/hooks.ts, line 244-257 (link)

    P2 Timeout funneled through error-level log instead of warn

    When a hook handler exceeds the timeout, withTimeout rejects with a "timed out" error. That rejection is caught and routed through handleHookError, which always calls logger?.error(msg). The resulting log line reads:

    [hooks] before_agent_start handler from my-plugin failed: Error: before_agent_start handler from my-plugin timed out
    

    A timeout is an expected, gracefully handled condition — the agent continues running normally. Logging it at error level risks unnecessary alert fatigue for operators and obscures real failures. The PR description mentions "logs a warning", which aligns better with the severity. Consider adding a dedicated timeout check before delegating to handleHookError:

    const handleHookError = (params: {
      hookName: PluginHookName;
      pluginId: string;
      error: unknown;
    }): never | void => {
      const isTimeout =
        params.error instanceof Error && params.error.message.includes("timed out");
      const msg = `[hooks] ${params.hookName} handler from ${params.pluginId} ${
        isTimeout ? "timed out and was skipped" : `failed: ${String(params.error)}`
      }`;
      if (catchErrors) {
        isTimeout ? logger?.warn(msg) : logger?.error(msg);
        return;
      }
      throw new Error(msg, { cause: params.error });
    };
    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/plugins/hooks.ts
    Line: 244-257
    
    Comment:
    **Timeout funneled through error-level log instead of warn**
    
    When a hook handler exceeds the timeout, `withTimeout` rejects with a `"timed out"` error. That rejection is caught and routed through `handleHookError`, which always calls `logger?.error(msg)`. The resulting log line reads:
    
    ```
    [hooks] before_agent_start handler from my-plugin failed: Error: before_agent_start handler from my-plugin timed out
    ```
    
    A timeout is an expected, gracefully handled condition — the agent continues running normally. Logging it at `error` level risks unnecessary alert fatigue for operators and obscures real failures. The PR description mentions "logs a warning", which aligns better with the severity. Consider adding a dedicated timeout check before delegating to `handleHookError`:
    
    ```ts
    const handleHookError = (params: {
      hookName: PluginHookName;
      pluginId: string;
      error: unknown;
    }): never | void => {
      const isTimeout =
        params.error instanceof Error && params.error.message.includes("timed out");
      const msg = `[hooks] ${params.hookName} handler from ${params.pluginId} ${
        isTimeout ? "timed out and was skipped" : `failed: ${String(params.error)}`
      }`;
      if (catchErrors) {
        isTimeout ? logger?.warn(msg) : logger?.error(msg);
        return;
      }
      throw new Error(msg, { cause: params.error });
    };
    ```
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/plugins/hooks.ts
Line: 179-188

Comment:
**Timed-out hook promise is leaked**

When the timeout fires, `withTimeout` races the handler's promise against the abort promise and returns early — but the underlying hook-handler promise (`fn()`) continues executing in the background indefinitely. The `AbortSignal` that `withTimeout` exposes through its `work` callback is never forwarded to the actual hook handler because the wrapper is `() => fn()` (signal ignored).

For hooks doing I/O or significant async work (network calls, file I/O, DB queries) this means that work keeps running and consuming resources long after the agent has already moved on. Consider documenting this behavior, and if the plugin contract ever evolves to accept an `AbortSignal`, wiring it through here would allow cooperative cancellation:

```ts
async function callHandlerWithTimeout<T>(
  fn: (signal?: AbortSignal) => Promise<T>,
  hookName: PluginHookName,
  pluginId: string,
): Promise<T> {
  if (!hookTimeoutMs) {
    return fn();
  }
  return withTimeout((signal) => fn(signal), hookTimeoutMs, `${hookName} handler from ${pluginId}`);
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/plugins/hooks.ts
Line: 244-257

Comment:
**Timeout funneled through error-level log instead of warn**

When a hook handler exceeds the timeout, `withTimeout` rejects with a `"timed out"` error. That rejection is caught and routed through `handleHookError`, which always calls `logger?.error(msg)`. The resulting log line reads:

```
[hooks] before_agent_start handler from my-plugin failed: Error: before_agent_start handler from my-plugin timed out
```

A timeout is an expected, gracefully handled condition — the agent continues running normally. Logging it at `error` level risks unnecessary alert fatigue for operators and obscures real failures. The PR description mentions "logs a warning", which aligns better with the severity. Consider adding a dedicated timeout check before delegating to `handleHookError`:

```ts
const handleHookError = (params: {
  hookName: PluginHookName;
  pluginId: string;
  error: unknown;
}): never | void => {
  const isTimeout =
    params.error instanceof Error && params.error.message.includes("timed out");
  const msg = `[hooks] ${params.hookName} handler from ${params.pluginId} ${
    isTimeout ? "timed out and was skipped" : `failed: ${String(params.error)}`
  }`;
  if (catchErrors) {
    isTimeout ? logger?.warn(msg) : logger?.error(msg);
    return;
  }
  throw new Error(msg, { cause: params.error });
};
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: dc0b997

Comment on lines +179 to +188
async function callHandlerWithTimeout<T>(
fn: () => Promise<T>,
hookName: PluginHookName,
pluginId: string,
): Promise<T> {
if (!hookTimeoutMs) {
return fn();
}
return withTimeout(() => fn(), hookTimeoutMs, `${hookName} handler from ${pluginId}`);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Timed-out hook promise is leaked

When the timeout fires, withTimeout races the handler's promise against the abort promise and returns early — but the underlying hook-handler promise (fn()) continues executing in the background indefinitely. The AbortSignal that withTimeout exposes through its work callback is never forwarded to the actual hook handler because the wrapper is () => fn() (signal ignored).

For hooks doing I/O or significant async work (network calls, file I/O, DB queries) this means that work keeps running and consuming resources long after the agent has already moved on. Consider documenting this behavior, and if the plugin contract ever evolves to accept an AbortSignal, wiring it through here would allow cooperative cancellation:

async function callHandlerWithTimeout<T>(
  fn: (signal?: AbortSignal) => Promise<T>,
  hookName: PluginHookName,
  pluginId: string,
): Promise<T> {
  if (!hookTimeoutMs) {
    return fn();
  }
  return withTimeout((signal) => fn(signal), hookTimeoutMs, `${hookName} handler from ${pluginId}`);
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/plugins/hooks.ts
Line: 179-188

Comment:
**Timed-out hook promise is leaked**

When the timeout fires, `withTimeout` races the handler's promise against the abort promise and returns early — but the underlying hook-handler promise (`fn()`) continues executing in the background indefinitely. The `AbortSignal` that `withTimeout` exposes through its `work` callback is never forwarded to the actual hook handler because the wrapper is `() => fn()` (signal ignored).

For hooks doing I/O or significant async work (network calls, file I/O, DB queries) this means that work keeps running and consuming resources long after the agent has already moved on. Consider documenting this behavior, and if the plugin contract ever evolves to accept an `AbortSignal`, wiring it through here would allow cooperative cancellation:

```ts
async function callHandlerWithTimeout<T>(
  fn: (signal?: AbortSignal) => Promise<T>,
  hookName: PluginHookName,
  pluginId: string,
): Promise<T> {
  if (!hookTimeoutMs) {
    return fn();
  }
  return withTimeout((signal) => fn(signal), hookTimeoutMs, `${hookName} handler from ${pluginId}`);
}
```

How can I resolve this? If you propose a fix, please make it concise.

@Jerry-Xin
Copy link
Copy Markdown
Author

Good point about the leaked promise. Added a documentation comment in commit 6de4b04 explaining the limitation — since the current plugin hook contract (() => Promise<T>) doesn't accept an AbortSignal, cooperative cancellation isn't possible yet. The comment notes the path forward for when the plugin API evolves to support signal-based cancellation.

Changing the handler signature would be a breaking change to the plugin API, so keeping it as documented behavior for now seemed like the right scope for this PR.

@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from 6de4b04 to 3d5b7a4 Compare March 17, 2026 08:08
@Jerry-Xin
Copy link
Copy Markdown
Author

Rebased on latest upstream/main and resolved conflicts in src/plugins/hooks.ts — upstream extracted the claiming-hook loop into runClaimingHooksList, which already uses our callHandlerWithTimeout pattern. Merged cleanly with our timeout infrastructure.

@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from 3d5b7a4 to 8499e58 Compare March 21, 2026 08:13
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8499e58e8b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +190 to +193
if (!hookTimeoutMs) {
return fn();
}
return withTimeout(() => fn(), hookTimeoutMs, `${hookName} handler from ${pluginId}`);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Scope the timeout away from fail-open policy hooks

This helper now wraps every async hook in the same timeout, not just before_agent_start. That changes the semantics of gatekeeper hooks: runMessageSending() and runBeforeToolCall() both flow through runModifyingHook(), while their callers in src/infra/outbound/deliver.ts:424-465 and src/agents/pi-tools.before-tool-call.ts:148-190 allow the send/tool call whenever the hook returns nothing or throws. In any deployment where a plugin does outbound moderation, ownership checks, or tool authorization over the network, a >10s check will now silently fail open instead of enforcing the policy.

Useful? React with 👍 / 👎.

Comment on lines +112 to +113
/** Default timeout for async plugin hook handlers (ms). */
const DEFAULT_HOOK_TIMEOUT_MS = 10_000;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Raise the default timeout above the memory recall budget

The new default is only 10s, but the bundled memory-lancedb plugin runs its before_agent_start hook by making a live embedding request and then searching LanceDB (extensions/memory-lancedb/index.ts:548-555). OpenClaw’s own memory stack already budgets 60s for remote embedding queries and 5 minutes for local ones (src/memory/manager-embedding-ops.ts:39-40), so this default will spuriously disable auto-recall on cold or slower providers. In those environments the regression is user-visible: the agent starts without recalled memories even though the underlying embedding request is still within the system’s normal timeout budget.

Useful? React with 👍 / 👎.

@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from 8499e58 to b430cf6 Compare March 21, 2026 11:09
@Jerry-Xin
Copy link
Copy Markdown
Author

Addressed both review points in commit b430cf6:

P1 — Scope timeout away from gatekeeper hooks: Added a TIMEOUT_EXEMPT_HOOKS set containing message_sending and before_tool_call. These security/policy hooks now bypass the timeout wrapper entirely, preventing silent fail-open when moderation or tool-blocking checks take longer than the budget. All other hooks (before_agent_start, before_model_resolve, etc.) remain timeout-protected.

P2 — Raise default timeout: Bumped DEFAULT_HOOK_TIMEOUT_MS from 10s to 30s. This gives plugins like memory-lancedb enough headroom for cold-start embedding requests while still catching genuinely hung handlers.

Added tests verifying that message_sending and before_tool_call handlers are never timed out even with a very short hookTimeoutMs.

@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from b430cf6 to 42dec4e Compare March 22, 2026 00:12
@Jerry-Xin
Copy link
Copy Markdown
Author

Rebased on latest upstream/main and resolved conflict in src/plugins/loader.ts — upstream refactored activatePluginRegistry to accept cached.registry instead of the full cache entry. Merged cleanly with our hookTimeoutMs option parameter. All timeout infrastructure preserved.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42dec4e4d3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

};

/** Default timeout for async plugin hook handlers (ms). */
const DEFAULT_HOOK_TIMEOUT_MS = 30_000;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Raise default hook timeout above remote memory recall budget

Fresh evidence in this revision: DEFAULT_HOOK_TIMEOUT_MS is now 30_000, but the bundled memory-lancedb plugin still does a live embedding request in extensions/memory-lancedb/index.ts:548-555, while core embedding queries allow up to 60s remotely in src/memory/manager-embedding-ops.ts:39-40. In deployments using slower remote embedders, a recall that is still within the supported memory budget will now time out here, so the agent starts without recalled memories even though the underlying request has not actually failed.

Useful? React with 👍 / 👎.

Comment on lines +300 to +304
await callHandlerWithTimeout(
() => (hook.handler as (event: unknown, ctx: unknown) => Promise<void>)(event, ctx),
hookName,
hook.pluginId,
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Exempt agent_end from the blanket async-hook timeout

runVoidHook() now sends every async void hook through callHandlerWithTimeout(), so agent_end inherits the same 30s cap. That breaks the bundled memory-lancedb auto-capture path in extensions/memory-lancedb/index.ts:576-657: it can perform up to three sequential embeddings.embed() calls plus DB searches/stores, while the core embedding budget is 60s remote / 5m local in src/memory/manager-embedding-ops.ts:39-42. On slower providers this change will drop memory capture after a run even though each embedding request is still within the system’s normal timeout.

Useful? React with 👍 / 👎.

@Jerry-Xin
Copy link
Copy Markdown
Author

Addressed both P2 comments in commit 533235d:

P2 — Raise default hook timeout / memory recall budget (comment #2970594620): Added before_agent_start to TIMEOUT_EXEMPT_HOOKS. Memory recall via memory-lancedb does live embedding requests in this hook, and the core embedding budget allows up to 60s remote / 5m local — well above the 30s hook timeout. Exempting this hook ensures memory recall completes on slower remote embedders without being silently dropped.

P2 — Exempt agent_end from timeout (comment #2970594621): Added agent_end to TIMEOUT_EXEMPT_HOOKS for the same reason — the memory auto-capture path performs up to three sequential embed() calls plus DB operations, which legitimately exceed 30s on slower providers. These hooks have their own upstream timeout budgets and should not be subject to the general hook timeout.

Updated the comment above TIMEOUT_EXEMPT_HOOKS to document both categories of exempt hooks: (a) security/policy gates that must not fail-open, and (b) memory-critical hooks with their own upstream timeout budgets. Migrated existing tests from before_agent_start/agent_end to non-exempt hooks (before_prompt_build/message_received) and added tests verifying both newly exempt hooks are not timed out.

@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from 533235d to b57da38 Compare March 22, 2026 22:07
@Jerry-Xin
Copy link
Copy Markdown
Author

Rebased on latest upstream/main and resolved conflict in src/plugins/hook-runner-global.ts — upstream refactored the global state to use resolveGlobalSingleton at module scope instead of getHookRunnerGlobalState(). Preserved our options?: { hookTimeoutMs?: number } parameter addition. All changes intact.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b57da3803d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

const TIMEOUT_EXEMPT_HOOKS: ReadonlySet<PluginHookName> = new Set([
"message_sending",
"before_tool_call",
"before_agent_start",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore timeout protection for before_agent_start

Fresh evidence in this revision: adding "before_agent_start" to TIMEOUT_EXEMPT_HOOKS means the original deadlock is still possible on the main agent-start path. Both src/agents/pi-embedded-runner/run.ts:348-353 and src/agents/pi-embedded-runner/run/attempt.ts:1436-1444 await runBeforeAgentStart(), so any plugin hook that never resolves here will still stall the run indefinitely with no timeout or recovery. That leaves the bug this change is meant to fix unresolved for the very hook named in the PR.

Useful? React with 👍 / 👎.

plugins: z
.object({
enabled: z.boolean().optional(),
hookTimeoutMs: z.number().int().min(0).optional(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Regenerate the base config schema for hookTimeoutMs

Adding plugins.hookTimeoutMs here is not enough for config-schema consumers, because buildConfigSchema() serves GENERATED_BASE_CONFIG_SCHEMA from src/config/schema.base.generated.ts, and that generated file still has no plugins.hookTimeoutMs entry. In practice the gateway config-schema API and any settings/editor UI built on it will not expose or round-trip this new option even though runtime validation now accepts it.

Useful? React with 👍 / 👎.

@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from b57da38 to 26c775e Compare March 23, 2026 00:13
@Jerry-Xin
Copy link
Copy Markdown
Author

Addressed both review points in commit 26c775e:

P1 — Restore timeout protection for memory hooks: Removed before_agent_start and agent_end from TIMEOUT_EXEMPT_HOOKS. Instead of full exemption, they now get a dedicated MEMORY_HOOK_TIMEOUT_MS of 120s (2 minutes) — enough headroom for remote embed() calls (typically <60s) while still terminating genuinely hung handlers. This restores the deadlock protection that this PR is meant to provide, without silently dropping legitimate memory operations.

P2 — Regenerate config schema: Ran generate-base-config-schema.ts --write and config:docs:genhookTimeoutMs is now present in src/config/schema.base.generated.ts and the config baseline docs.

@openclaw-barnacle openclaw-barnacle bot added the docs Improvements or additions to documentation label Mar 23, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26c775e61c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

if (!hookTimeoutMs || TIMEOUT_EXEMPT_HOOKS.has(hookName)) {
return fn();
}
const effectiveTimeout = MEMORY_HOOK_NAMES.has(hookName) ? MEMORY_HOOK_TIMEOUT_MS : hookTimeoutMs;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Honor configured timeout for memory hooks

callHandlerWithTimeout() hard-codes before_agent_start and agent_end to 120000 ms, so plugins.hookTimeoutMs cannot raise the cap for those hooks. That breaks the new config knob for deployments using memory-lancedb: recall/capture in extensions/memory-lancedb/index.ts:548-657 can legitimately ride on embedding operations budgeted up to 5 minutes for local providers in src/memory/manager-embedding-ops.ts:39-42. In that environment, even an explicit plugins.hookTimeoutMs: 300000 will still cut memory recall/capture off after 2 minutes.

Useful? React with 👍 / 👎.

@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from 26c775e to 29d22f2 Compare March 23, 2026 02:11
@Jerry-Xin
Copy link
Copy Markdown
Author

Fixed in commit 29d22f2 — memory hooks now use Math.max(MEMORY_HOOK_TIMEOUT_MS, hookTimeoutMs) instead of a hard-coded 120s cap. When plugins.hookTimeoutMs is configured above the 120s floor (e.g. hookTimeoutMs: 300000 for slow local embedding providers), the configured value is honored. The 120s floor is preserved as the minimum for memory hooks when the configured timeout is lower.

Added a test verifying that a 4-minute handler completes successfully when hookTimeoutMs is set to 5 minutes.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29d22f2308

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +224 to +226
const effectiveTimeout = MEMORY_HOOK_NAMES.has(hookName)
? Math.max(MEMORY_HOOK_TIMEOUT_MS, hookTimeoutMs)
: hookTimeoutMs;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Exempt Discord subagent binding hooks from the 30s timeout

subagent_spawning now flows through callHandlerWithTimeout(), so the generic 30s cap applies to channel thread-binding hooks too. The built-in Discord hook in extensions/discord/src/subagent-hooks.ts:67-84 awaits autoBindSpawnedDiscordSubagent(), which can make several REST calls in extensions/discord/src/monitor/thread-bindings.lifecycle.ts:173-223 and extensions/discord/src/monitor/thread-bindings.discord-api.ts:240-297 without any tighter timeout. In environments where those Discord API calls take longer than 30s, src/agents/subagent-spawn.ts:257-283 now reports thread=true as unavailable even though the bind may still finish later, so threaded subagent spawns regress on slow or unhealthy Discord links.

Useful? React with 👍 / 👎.

@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from 29d22f2 to 57ec97a Compare March 23, 2026 04:10
@Jerry-Xin
Copy link
Copy Markdown
Author

Good catch — added subagent_spawning to the elevated timeout set in commit 57ec97a. Channel thread-binding hooks (e.g. Discord) make several REST calls that can legitimately exceed the generic 30s cap. They now get at least 120s (or the configured hookTimeoutMs, whichever is higher), consistent with how memory hooks are handled.

@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from 57ec97a to ffeac52 Compare March 23, 2026 14:09
@Jerry-Xin
Copy link
Copy Markdown
Author

Rebased on latest upstream/main and resolved conflict in docs/.generated/config-baseline.jsonl (generated file meta header). All changes preserved.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cde5f14e77

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +483 to +486
"./plugin-sdk/ollama": {
"types": "./dist/plugin-sdk/ollama.d.ts",
"default": "./dist/plugin-sdk/ollama.js"
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restore exported ollama-setup SDK subpath

Adding ./plugin-sdk/ollama here without keeping the previously exported ./plugin-sdk/ollama-setup subpath creates a runtime compatibility break: extensions built against openclaw/plugin-sdk/ollama-setup will now throw ERR_PACKAGE_PATH_NOT_EXPORTED when loaded. Since this commit is otherwise a hook-timeout bugfix, dropping an existing package export is an unintended breaking API regression; keep the old subpath as a compatibility alias while introducing the new one.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified — this is a false positive. The ./plugin-sdk/ollama-setup subpath does not exist in upstream/main and never has:

$ git show upstream/main:package.json | grep ollama-setup
(no matches)

Upstream only has ./plugin-sdk/ollama at line 483, and our branch's package.json is identical to upstream (zero diff). No subpath was dropped by this PR.

@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from cde5f14 to 2a34e3b Compare March 27, 2026 10:13
@openclaw-barnacle openclaw-barnacle bot added size: L and removed channel: matrix Channel integration: matrix channel: mattermost Channel integration: mattermost channel: voice-call Channel integration: voice-call app: web-ui App: web-ui gateway Gateway runtime extensions: memory-core Extension: memory-core cli CLI command changes scripts Repository scripts commands Command implementations agents Agent runtime and tooling channel: feishu Channel integration: feishu size: XL labels Mar 27, 2026
@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from 2a34e3b to 4cb88ca Compare March 27, 2026 14:12
Jerry-Xin and others added 11 commits March 29, 2026 14:17
Add configurable timeout (default 30s) to plugin lifecycle hooks to prevent
a single misbehaving plugin from permanently blocking all agent runs.

- Add hookTimeoutMs config option under plugins section
- Wrap hook execution in Promise.race with timeout
- Log warning with plugin name when hook exceeds timeout
- Agent continues gracefully after timeout instead of hanging

Fixes openclaw#48534
Add a code comment explaining that timed-out handler promises continue
running in the background since the current plugin hook API does not
accept an AbortSignal for cooperative cancellation. Notes the path
forward for when the plugin contract evolves.
…t for memory hooks

Remove before_agent_start and agent_end from TIMEOUT_EXEMPT_HOOKS so
hung handlers can no longer stall runs indefinitely.  Instead, give them
a dedicated MEMORY_HOOK_TIMEOUT_MS of 120 s — enough headroom for remote
embed() calls while still protecting against genuinely stuck handlers.

Regenerate base config schema and config baseline for hookTimeoutMs.
Use Math.max(MEMORY_HOOK_TIMEOUT_MS, hookTimeoutMs) for memory-critical
hooks so that plugins.hookTimeoutMs can raise the cap above the 120s
floor. Previously, memory hooks were hard-capped at 120s regardless of
the configured value, preventing deployments with slow local embedding
providers from increasing the budget via config.

Fixes the case where hookTimeoutMs: 300000 (5 min) would still cut off
memory recall/capture at 2 minutes.
Channel thread-binding hooks (e.g. Discord) make several REST calls
during subagent_spawning that can legitimately exceed the generic 30s
timeout. Add subagent_spawning to the elevated timeout set so it gets
at least 120s (or the configured hookTimeoutMs, whichever is higher),
preventing premature timeout failures for threaded subagent spawns on
slow or congested Discord links.
Values above 2,147,483,647 overflow Node's setTimeout to ~1ms,
silently inverting behavior. Add .max(2_147_483_647) to the Zod
schema so invalid values are caught at config validation time.
hasExplicitPluginConfig did not consider plugins.hookTimeoutMs, so in
VITEST runs applyTestPluginDefaults would classify configs like
{ plugins: { hookTimeoutMs: 5000 } } as non-explicit and force
plugins.enabled = false. Recognize hookTimeoutMs as explicit plugin
configuration to prevent test/prod behavior drift.
@Jerry-Xin Jerry-Xin force-pushed the fix/48534-plugin-hook-timeout branch from 4cb88ca to 36233b7 Compare March 29, 2026 06:17
@Jerry-Xin
Copy link
Copy Markdown
Author

Rebased on latest upstream/main and resolved conflicts in src/plugins/loader.ts (upstream added runtimeSubagentMode parameter — preserved alongside our hookTimeoutMs options). All changes preserved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Improvements or additions to documentation size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Plugin before_agent_start hook hang permanently blocks all agent runs with zero diagnostic output

2 participants