Skip to content

feat(hooks): add context_assembled, loop_iteration_start/end observability hooks [claude, human developer oversight]#33915

Closed
zeroaltitude wants to merge 37 commits intoopenclaw:mainfrom
zeroaltitude:feat/hook-context-assembled-loop-iteration
Closed

feat(hooks): add context_assembled, loop_iteration_start/end observability hooks [claude, human developer oversight]#33915
zeroaltitude wants to merge 37 commits intoopenclaw:mainfrom
zeroaltitude:feat/hook-context-assembled-loop-iteration

Conversation

@zeroaltitude
Copy link
Copy Markdown
Contributor

@zeroaltitude zeroaltitude commented Mar 4, 2026

Summary

Add three observability-only plugin hooks around the embedded agent loop so plugins can measure:

  • what context is about to be sent to the model (context_assembled)
  • per-iteration lifecycle (loop_iteration_start / loop_iteration_end)

These hooks are void + fire-and-forget (parallel execution, errors logged), so they don't affect control flow or the critical path.

Review focus: src/plugins/types.ts (hook names + event types) and src/agents/pi-embedded-runner/run/attempt.ts (emission sites).

New hooks

Hook Fires Primary use
context_assembled Once per run, after system prompt + effective user prompt + messages are assembled, before first LLM call Context census, token budgeting, audit/forensics
loop_iteration_start At each agent loop turn_start Recursion/iteration depth, per-turn monitoring
loop_iteration_end At each turn_end Tool-call volume, loop termination observability

Event payloads (final)

// context_assembled (fires once; iteration is always 1)
{
  systemPrompt: string;
  prompt: string;              // effective prompt (after hook modifications)
  messages: AgentMessage[];    // snapshot to avoid mutation during async handlers
  messageCount: number;
  imageCount: number;
  iteration: 1;
}

// loop_iteration_start
{
  iteration: number;
  messageCount: number;
  pendingToolResults?: number; // tool results produced by the previous turn
}

// loop_iteration_end
{
  iteration: number;
  toolCallsMade: number;
  newMessagesAdded?: number;   // messageCount delta during this turn
  hasToolResults: boolean;     // informative signal only (not definitive loop terminal)
}

Implementation notes

  • src/plugins/types.ts: adds hook names + event types + handler map entries
  • src/plugins/hooks.ts: adds runContextAssembled, runLoopIterationStart, runLoopIterationEnd (void hooks)
  • src/agents/pi-embedded-runner/run/attempt.ts:
    • Subscribes to turn_start / turn_end to emit iteration hooks
    • Emits context_assembled once before the first LLM call
    • Unsubscribes in cleanup

Tests

src/plugins/hooks.context-loop.test.ts: verifies event shape, parallel execution, and no-op behavior when no handlers are registered.

Compatibility / risk

Additive only. No behavior change unless a plugin registers these hooks; hook errors are swallowed/logged.

AI-Assisted Development 🤖

This PR was developed with AI assistance (Claude/OpenClaw agent "Tank").

  • Degree of testing: Well-tested — unit tests covering event shape, parallel execution, no-op. pnpm check clean.
  • Human oversight: All changes reviewed and directed by Eddie Abrams
  • Understanding confirmed: The contributor understands the embedded agent loop lifecycle and void hook execution model

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 4, 2026

Greptile Summary

This PR adds three void, fire-and-forget observability hooks (context_assembled, loop_iteration_start, loop_iteration_end) around the embedded agent loop, enabling plugins to measure context size, token budget, and per-iteration lifecycle without any impact on control flow. The implementation is additive-only and well-structured.\n\nAfter several review cycles, all previously flagged issues have been addressed: the TDZ / forward-reference concern is fixed by moving hookCtx construction early, willContinue is correctly renamed to hasToolResults, the imageCount field and runId are present in type definitions and test fixtures, newMessagesAdded and pendingToolResults are now populated by the implementation, the hookEventUnsub cleanup is wrapped in a try/catch, and the deep-copy JSDoc is consistent with the structuredClone call.\n\nRemaining issue:\n- The PR moves hookCtx construction out of the try block (line 2632) to fix the forward reference, but does not remove the pre-existing inner hookCtx declaration at line ~2815. The two declarations are currently identical, but the shadowing creates a maintenance hazard if one is updated without the other.

Confidence Score: 4/5

Safe to merge after removing the redundant inner hookCtx declaration; all major concerns from prior cycles are resolved.

The PR has converged well through multiple review iterations. All critical and P1 issues are addressed. The one remaining issue — a duplicate hookCtx declaration that shadows the outer one — is a P2 style concern with no runtime impact today. A targeted fix (removing the inner declaration) would bring this to 5/5.

src/agents/pi-embedded-runner/run/attempt.ts — the inner hookCtx declaration at line 2815 should be removed to eliminate shadowing of the outer one added at line 2632.

Important Files Changed

Filename Overview
src/agents/pi-embedded-runner/run/attempt.ts Adds context_assembled emission and turn_start/turn_end subscription for iteration hooks. The new outer hookCtx at line 2632 was not accompanied by removal of the pre-existing inner hookCtx at line 2815, leaving two identical declarations in overlapping scopes.
src/plugins/types.ts Adds PluginHookContextAssembledEvent, PluginHookLoopIterationStartEvent, PluginHookLoopIterationEndEvent types and handler map entries. JSDoc is accurate and consistent with the implementation after prior review iterations.
src/plugins/hooks.ts Adds runContextAssembled, runLoopIterationStart, runLoopIterationEnd void hook runners delegating to runVoidHook. Clean and consistent with existing patterns.
src/plugins/hooks.context-loop.test.ts Unit tests covering event shape, parallel execution, and no-op behavior for all three new hooks. Fixtures include required fields and correct values.
src/agents/pi-embedded-runner/run/types.ts Adds attemptIndex field to EmbeddedRunAttemptParams, threaded through from the outer run loop for use in hook payloads.
src/agents/pi-embedded-runner/run.ts Single-line change passing attemptIndex (runLoopIterations - 1) to the attempt call. Correct zero-based index computation.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 2815-2823

Comment:
**Redundant `hookCtx` declaration shadows the outer one**

This PR adds a new `hookCtx` at line 2632 (outside the `try` block) to fix the forward-reference issue for the subscription callback — but the original inner `hookCtx` here was not removed. There are now two identical `hookCtx` objects in scope: the outer one used by the subscription and this inner one used by all subsequent hook calls inside the `try` block.

Because they're built from the same `params` values, the duplication is harmless today. But if one is updated (e.g. to add a new field) without updating the other, the iteration hooks and the prompt/LLM hooks will silently diverge. ESLint's `no-shadow` rule would flag this.

The fix is to remove this inner declaration and reference the outer `hookCtx` (line 2632) throughout the `try` block.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (22): Last reviewed commit: "Merge origin/main into feat/hook-context..." | Re-trigger Greptile

Comment thread src/agents/pi-embedded-runner/run/attempt.ts Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3a2840ff81

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/attempt.ts Outdated
@zeroaltitude zeroaltitude changed the title feat(hooks): add context_assembled, loop_iteration_start/end observability hooks feat(hooks): add context_assembled, loop_iteration_start/end observability hooks [claude, human developer oversight] Mar 4, 2026
zeroaltitude added a commit to zeroaltitude/openclaw that referenced this pull request Mar 4, 2026
…e as approximate

- context_assembled now includes effectivePrompt so plugins see the
  full model input (system prompt + user prompt + messages)
- willContinue documented as heuristic (doesn't account for abort/timeout)

Addresses greptile + codex-connector review on PR openclaw#33915.
zeroaltitude added a commit to zeroaltitude/openclaw that referenced this pull request Mar 4, 2026
…separate PR)

hook-stream-wrapper.ts incorrectly referenced context_assembled which
is defined in feat/hook-context-assembled-loop-iteration (openclaw#33915), not
this branch. Removed cross-branch dependency so CI passes standalone.

Fixes type errors in CI check job.
@zeroaltitude
Copy link
Copy Markdown
Contributor Author

All review items addressed:

  • willContinue heuristic (greptile) — Intentionally approximate; documented on the type. turn_end event only exposes toolResults; true continuation decision lives in the loop controller. llm_output hook is the reliable terminal signal.
  • prompt in context_assembled (codex P2) — Already implemented; effectivePrompt is passed as prompt in the event payload. Tested in hooks.context-loop.test.ts.
  • label CI — GitHub API rate limit flake, not a code issue.

9 tests passing. No code changes needed for this round.

zeroaltitude added a commit to zeroaltitude/openclaw that referenced this pull request Mar 4, 2026
The previous name implied the loop would continue, but the field only
reflects whether tool results were present — it cannot account for
abort signals, timeouts, or max-iteration limits evaluated downstream.

Rename to hasToolResults for accuracy. Strengthened JSDoc to direct
plugin authors to llm_output for definitive terminal detection.

Addresses greptile review on PR openclaw#33915.
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: M labels Mar 4, 2026
Comment thread src/agents/pi-embedded-runner/run/attempt.ts Outdated
Comment thread src/agents/pi-embedded-runner/run/attempt.ts Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e5ea615c0b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/attempt.ts Outdated
zeroaltitude added a commit to zeroaltitude/openclaw that referenced this pull request Mar 4, 2026
…led single-fire

- Move hookCtx declaration before the subscription callback that
  references it, eliminating the forward-reference fragility flagged
  by greptile (was safe due to async events, now safe unconditionally)
- Document that context_assembled fires once per run (iteration: 1)
  and direct plugin authors to loop_iteration_start for per-turn tracking
- Updated type-level JSDoc on PluginHookContextAssembledEvent

Addresses remaining greptile 4/5 → 5/5 concerns on PR openclaw#33915.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b591dab35

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/agents/pi-embedded-runner/run/attempt.ts Outdated
Comment thread src/agents/pi-embedded-runner/run/attempt.ts Outdated
Comment thread src/plugins/hooks.context-loop.test.ts
Comment thread src/plugins/hooks.context-loop.test.ts Outdated
Comment thread src/agents/pi-embedded-runner/run/attempt.ts Outdated
Comment thread src/agents/pi-embedded-runner/run/attempt.ts Outdated
@zeroaltitude
Copy link
Copy Markdown
Contributor Author

@greptile-apps: Are there other issues that prevent you from giving this a 5/5? I would like to address any shortcomings if possible.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 5, 2026

Looking at the review threads, there are still 4 unresolved issues that would prevent a 5/5:

  1. hookCtx temporal dead zone (attempt.ts:1303) — subscription callback references hookCtx ~90 lines before it's declared. Move the subscription setup after line 1386.

  2. context_assembled fires only once (attempt.ts:1515) — hardcoded iteration: 1 but the hook never fires on subsequent turns. Either document this as first-call-only or move it into the turn loop.

  3. No message snapshot (attempt.ts, P2) — context_assembled passes activeSession.messages by reference, so async handlers can see mutations. Use .slice() when constructing the event.

  4. Variable shadowing (attempt.ts:1505) — inner messagesSnapshot shadows the outer function-scoped variable. Rename to contextMessagesSnapshot.

The other threads show "Attempted fix!" or "Fixed!" responses, but those haven't been marked resolved yet. If you've pushed commits addressing those, let me know and I can verify.

@zeroaltitude
Copy link
Copy Markdown
Contributor Author

All 11 review threads resolved. Here's the status of each issue raised:

Fixed in prior commits:

  1. hookCtx temporal dead zone — Moved hookCtx declaration before subscription setup (83ee19f32). Now at line 1275, subscription at 1290.
  2. context_assembled fires only once — By design. JSDoc documents iteration: 1 as "Always 1 — this hook fires once per run before the first LLM call." Per-turn tracking uses loop_iteration_start/end.
  3. No message snapshotactiveSession.messages.slice() added (b8f54b4cf). Variable named contextMessagesSnapshot.
  4. Variable shadowing — Renamed to contextMessagesSnapshot (26c27a213).
  5. Missing imageCount — Added to test fixture (068b113fc).
  6. iteration: 0 → 1 — Fixed in test fixture and assertion (eba041fec).
  7. willContinue renamed — Now hasToolResults with JSDoc documenting it as approximate.
  8. pendingToolResults/newMessagesAdded — Implemented (1525c49fd, b8f54b4cf).
  9. hookEventUnsub try/catch — Wrapped with CRITICAL-level logging (c83d084fd).
  10. Include current promptprompt field added to context_assembled event.
  11. Add imageCount to type — Added to PluginHookContextAssembledEvent.

All fixes verified against current HEAD (c83d084fd). pnpm check clean.

I tidied up the resolved issues @greptile-apps -- please ensure the fixes are sound and re-score.

zeroaltitude added a commit to zeroaltitude/openclaw that referenced this pull request Mar 5, 2026
…memory changes

- Delete hooks.llm-response.test.ts (328 lines) — complete duplicate of
  hooks.extended.test.ts which has 2 extra after_llm_call tests
- Remove hooks.context-loop.test.ts (160 lines) — belongs to PR openclaw#33915
  (feat/hook-context-assembled-loop-iteration), not this PR
- Revert session-memory handler changes (58 lines) — blockSessionSave
  and sessionSaveRedirectPath are a separate feature; will move to own PR

Net: -512 lines, 49 tests remain (zero coverage loss)
@zeroaltitude
Copy link
Copy Markdown
Contributor Author

CI/secrets failing due to pnpm-audit-prod on transitive deps (@buape/carbon -> @hono/node-server -> hono); PR does not modify package.json/lockfile. Happy to send a separate dependency bump/override PR if that’s the preferred fix

@bmendonca3
Copy link
Copy Markdown
Contributor

I think context_assembled currently has “once per attempt” semantics rather than “once per run” semantics.

Right now it fires inside runEmbeddedAttempt() before the first LLM call. But the outer embedded runner can re-enter runEmbeddedAttempt() for the same user-visible run on things like overflow-compaction retry / retry loops, and this hook will emit again with iteration: 1 each time.

So a plugin doing audit/census based on the current docs can double-count a single run and will see multiple context_assembled events that all claim to be the first one.

I think this needs one of:

  • move the emission to the outer runner so it is truly once per run, or
  • document it as once per attempt and stop claiming iteration is always the single run-level first context event

As written, the implementation and the contract look slightly out of sync.

@zeroaltitude
Copy link
Copy Markdown
Contributor Author

Good catch — fixed in 2cab5b599.

The outer runner (run.ts) can re-enter runEmbeddedAttempt for the same user-visible run on overflow compaction retries, auth retries, etc. Each re-entry was firing context_assembled with iteration: 1, so plugins doing audit/census would double-count.

Fix: Added attemptIndex param (zero-based, from the outer runLoopIterations counter) and guarded context_assembled to only fire when attemptIndex === 0. Retries skip the hook entirely.

This preserves the "once per run" contract documented in the hook — retries are internal implementation detail that plugins shouldn't see.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7e82c35cb3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +2599 to +2603
runId: params.runId,
attemptIndex: params.attemptIndex,
iteration: hookTurnIteration,
messageCount: activeSession.messages.length,
pendingToolResults: hookPendingToolResults,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Seed pendingToolResults when retries resume from toolResult history

pendingToolResults is initialized to 0 here and only updated after a prior turn_end, so the first loop_iteration_start of every new attempt reports 0. That breaks on the built-in tool-result truncation recovery path in src/agents/pi-embedded-runner/run.ts:1227-1264: we rewrite oversized toolResult entries in place and immediately continue, so the retried runEmbeddedAttempt starts with those tool results already at the tail of session history and the first model turn is about to consume them. Plugins using this hook for retry telemetry or token accounting will undercount that first retried turn unless this counter is seeded from the current session state before subscribing.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid edge case — on retries resuming from an existing tool-result tail in session history, hookPendingToolResults starts at 0 and underreports the true pending count until the first new tool_use/toolResult pair. Filed as a seeding follow-up: compute pending count from the session message tail on attempt startup.

@zeroaltitude zeroaltitude force-pushed the feat/hook-context-assembled-loop-iteration branch from 7e82c35 to a16dd96 Compare March 26, 2026 06:42
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 761fd47c54

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

attemptIndex: params.attemptIndex,
iteration: hookTurnIteration,
messageCount: activeSession.messages.length,
pendingToolResults: hookPendingToolResults,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Seed pendingToolResults from session state on retries

pendingToolResults is initialized to 0 for every runEmbeddedAttempt, so the first loop_iteration_start event in a retried attempt always reports zero even when the session already ends with toolResult messages. On the overflow-recovery path in run.ts, truncateOversizedToolResultsInSession(...) mutates existing tool results and immediately continues into a new attempt, so that first turn is actually consuming pending tool outputs. This under-reports retry iterations for telemetry/token-budget plugins and makes cross-attempt loop metrics inconsistent.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid edge case — on retries resuming from an existing tool-result tail in session history, hookPendingToolResults starts at 0 and underreports the true pending count until the first new tool_use/toolResult pair. Filed as a seeding follow-up: compute pending count from the session message tail on attempt startup.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0c0174280

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/plugins/types.ts Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa701f95c7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// Subscribe for loop iteration tracking hooks
let hookTurnIteration = 0;
let hookTurnMessageCount = 0;
let hookPendingToolResults = 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Seed pending tool-results before emitting first loop start

loop_iteration_start always reports pendingToolResults: 0 on the first turn of a new attempt because the counter is initialized to zero and only updated after a prior turn_end. That is incorrect on retry paths that resume from existing toolResult history (for example after truncateOversizedToolResultsInSession(...) in src/agents/pi-embedded-runner/run.ts:874-912, which mutates session state and immediately continues). In those retries, the first model turn is already consuming pending tool results, so plugins using this hook for telemetry/token accounting undercount the first retried iteration.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid edge case — on retries resuming from an existing tool-result tail in session history, hookPendingToolResults starts at 0 and underreports the true pending count until the first new tool_use/toolResult pair. Filed as a seeding follow-up: compute pending count from the session message tail on attempt startup.

@openclaw-barnacle
Copy link
Copy Markdown

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Apr 13, 2026
Upstream moved hook types to hook-types.ts. Added context_assembled,
loop_iteration_start, loop_iteration_end to new location.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ed4a6158c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// Plugins needing run-level deduplication should key on event.runId.
// Use attemptIndex to distinguish initial (0) from retries (1+).
// Use loop_iteration_start for per-turn tracking within an attempt.
if (hookRunner?.hasHooks("context_assembled")) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Emit context_assembled only when prompt will be submitted

context_assembled is emitted before the preflight compaction/truncation guard runs, so attempts that set skipPromptSubmission = true (for example when shouldPreemptivelyCompactBeforePrompt(...) routes to early recovery) still publish a full context snapshot even though no activeSession.prompt(...) call occurs in that attempt. This makes observability plugins over-count model-input attempts and can record context that was never actually sent, which breaks token/accounting and audit correlation on overflow-recovery paths.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid — emitting context_assembled before the preflight/block check means plugins see context for a prompt that never actually submits (e.g. before_llm_call block). Filed as a follow-up to gate the emission on post-preflight success. For now, plugins observing context should tolerate occasional no-op events (the hook is observational, not gating).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1ed4a6158c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

// Subscribe for loop iteration tracking hooks
let hookTurnIteration = 0;
let hookTurnMessageCount = 0;
let hookPendingToolResults = 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Seed pendingToolResults from existing session tail

hookPendingToolResults is initialized to 0 and the first loop_iteration_start event publishes that value before any turn_end can update it. On retry paths that resume with trailing tool-result history (for example after truncateOversizedToolResultsInSession(...) in src/agents/pi-embedded-runner/run.ts and immediate continue), the first turn is already consuming pending tool outputs, so this hook under-reports retry iterations for telemetry/token accounting. Seed the counter from the current session state before subscribing so first-iteration metrics are correct.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid edge case — on retries resuming from an existing tool-result tail in session history, hookPendingToolResults starts at 0 and underreports the true pending count until the first new tool_use/toolResult pair. Filed as a seeding follow-up: compute pending count from the session message tail on attempt startup.

// Plugins needing run-level deduplication should key on event.runId.
// Use attemptIndex to distinguish initial (0) from retries (1+).
// Use loop_iteration_start for per-turn tracking within an attempt.
if (hookRunner?.hasHooks("context_assembled")) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Emit context_assembled only when prompt submission proceeds

context_assembled is fired unconditionally before the preflight overflow checks, but later in the same attempt preemptiveCompaction can set skipPromptSubmission = true, in which case activeSession.prompt(...) is skipped. That means overflow-precheck retries can emit a full context census for requests that are never sent to the model, leading audit/token-budget plugins to double-count across retries. Move this emission to just before actual prompt submission (or include an explicit flag that no provider call occurred).

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid — emitting context_assembled before the preflight/block check means plugins see context for a prompt that never actually submits (e.g. before_llm_call block). Filed as a follow-up to gate the emission on post-preflight success. For now, plugins observing context should tolerate occasional no-op events (the hook is observational, not gating).

@openclaw-barnacle openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Apr 15, 2026
@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Apr 22, 2026

🔒 Aisle Security Analysis

We found 4 potential security issue(s) in this PR:

# Severity Title
1 🟠 High Sensitive data exposure to plugins via new context_assembled hook (full prompts, message history, and raw image data)
2 🟡 Medium Unbounded concurrent plugin hook execution on loop events (resource exhaustion/DoS)
3 🟡 Medium Denial-of-service risk from unbounded structuredClone of session messages/images in hook event
4 🟡 Medium Sensitive data exposure to plugins via new context_assembled / loop hooks (workspace path, session identifiers, full prompt + images)
1. 🟠 Sensitive data exposure to plugins via new `context_assembled` hook (full prompts, message history, and raw image data)
Property Value
Severity High
CWE CWE-201
Location src/agents/pi-embedded-runner/run/attempt.ts:2168-2185

Description

The new context_assembled plugin hook passes the full assembled LLM context to all registered plugin handlers before the first model call.

This includes:

  • systemPrompt and the effective user prompt
  • a deep-cloned snapshot of activeSession.messages (may contain sensitive tool outputs, file contents, credentials, etc.)
  • images objects including data strings (likely base64 content of screenshots / files)

Because plugins are loaded and executed as normal Node.js modules (e.g., via Jiti/module loading in src/plugins/loader.ts), a third-party or user-installed plugin can read this data and exfiltrate it (network/file system), creating a material secret/PII leakage risk.

Vulnerable code:

const contextMessagesSnapshot = structuredClone(activeSession.messages);
hookRunner.runContextAssembled({
  systemPrompt: systemPromptText,
  prompt: effectivePrompt,
  messages: contextMessagesSnapshot,
  images: imageResult.images.length > 0 ? structuredClone(imageResult.images) : [],
  ...
}, hookCtx)

Recommendation

Treat this hook as highly sensitive and add explicit controls before exposing full context to extension code:

  1. Minimize by default: only send safe metadata (counts, model/provider IDs) unless a plugin explicitly requests full content.
  2. Add a permission/allowlist gate per plugin (e.g., requires: ["context_read"]) and require explicit user consent.
  3. Redact/sanitize before invoking handlers:
    • strip/replace known secret patterns
    • remove tool outputs or mark them as [REDACTED]
    • omit images[].data by default (provide hashes/metadata unless explicitly allowed)
  4. Consider running third-party plugins in a sandbox (no network/filesystem) if they can be obtained from untrusted sources.

Example: omit image data and redact messages by default:

const safeImages = imageResult.images.map(({ type, mimeType }) => ({ type, mimeType }));
const safeMessages = redactAgentMessages(activeSession.messages);

await hookRunner.runContextAssembled({
  runId: params.runId,
  systemPrompt: systemPromptText,
  prompt: effectivePrompt,
  messages: safeMessages,
  messageCount: safeMessages.length,
  imageCount: safeImages.length,
  images: safeImages,
  attemptIndex: params.attemptIndex,
}, hookCtx);
2. 🟡 Unbounded concurrent plugin hook execution on loop events (resource exhaustion/DoS)
Property Value
Severity Medium
CWE CWE-400
Location src/plugins/hooks.ts:323-348

Description

runEmbeddedAttempt subscribes to per-turn session events and triggers loop_iteration_start/loop_iteration_end hooks using a fire-and-forget pattern (promise not awaited). The hook runner (runVoidHook) executes all hook handlers in parallel via Promise.all with no concurrency limit, queueing, cancellation, or timeout.

This creates a denial-of-service risk:

  • Input/trigger: activeSession.subscribe emits turn_start/turn_end events potentially many times in an agent loop (e.g., tool-driven loops).
  • Fan-out: Each event spawns a new runVoidHook(...) call, which spawns one promise per registered plugin handler.
  • No backpressure: Because the caller does not await, slow/malicious hooks can accumulate, leading to unbounded concurrent work and memory pressure.
  • Amplifier: Plugin ecosystems allow third-party hooks; a hook can intentionally block (e.g., long network call) and pile up.

Vulnerable code paths:

  • Event subscription triggering hooks without awaiting them.
  • runVoidHook parallelizes handlers and awaits them internally, but callers can ignore the returned promise.

Vulnerable code:

// attempt.ts (subscription callback)
hookRunner
  .runLoopIterationStart({...}, hookCtx)
  .catch((err) => log.warn(...));

hookRunner
  .runLoopIterationEnd({...}, hookCtx)
  .catch((err) => log.warn(...));
// hooks.ts
const promises = hooks.map(async (hook) => {
  await hook.handler(event, ctx);
});
await Promise.all(promises);

Recommendation

Add backpressure and bounded concurrency for void hooks that may be triggered frequently (especially loop hooks). Options:

  1. Per-hook semaphore / queue with a max concurrency (and optionally drop/merge events when overloaded).
  2. Timeouts for hook handlers and cancellation on abort.
  3. In the caller, serialize loop hooks by awaiting, or explicitly coalesce events.

Example: limit concurrent executions per hook name:

// hooks.ts (sketch)
import pLimit from "p-limit";

const voidHookLimiters = new Map<string, ReturnType<typeof pLimit>>();
function limiterFor(name: string) {
  let l = voidHookLimiters.get(name);
  if (!l) {
    l = pLimit(4); // tune per hook
    voidHookLimiters.set(name, l);
  }
  return l;
}

async function runVoidHook<K extends PluginHookName>(hookName: K, event: any, ctx: any) {
  const hooks = getHooksForName(registry, hookName);
  if (hooks.length === 0) return;

  const limit = limiterFor(hookName);
  await Promise.all(
    hooks.map((hook) =>
      limit(async () => {// optional: wrap with timeout + abort signal
        await hook.handler!(event, ctx);
      }),
    ),
  );
}

Additionally, pass an AbortSignal in ctx (or a separate cancellation mechanism) so hooks can stop when an attempt/run aborts.

3. 🟡 Denial-of-service risk from unbounded structuredClone of session messages/images in hook event
Property Value
Severity Medium
CWE CWE-400
Location src/agents/pi-embedded-runner/run/attempt.ts:2168-2185

Description

The new context_assembled hook deep-clones the entire assembled context using structuredClone(...) before the first LLM call.

  • Input size is attacker-influenced: activeSession.messages and prompt-referenced images ultimately derive from user prompts / conversation history.
  • No bounding before cloning: there is no explicit cap on activeSession.messages.length/total message text bytes at the clone site, and detectAndLoadPromptImages() has no maximum reference count; it can load many images (each up to 6MB) into memory.
  • Amplification: structuredClone duplicates the full in-memory representation, potentially doubling memory and adding significant CPU per attempt. Because a single run may trigger multiple attempts (retry/compaction/auth refresh), this can be repeatedly triggered.

Vulnerable code:

const contextMessagesSnapshot = structuredClone(activeSession.messages);
...
images: imageResult.images.length > 0 ? structuredClone(imageResult.images) : [],

This can enable resource exhaustion (high memory / CPU), potentially crashing the process or degrading service availability when large sessions/prompts or many images are processed and the hook is enabled.

Recommendation

Avoid deep-cloning unbounded data for hook observability.

Mitigations (choose one or combine):

  1. Bound the hook payload (recommended): send a summarized/redacted snapshot rather than full objects.
const MAX_HOOK_MESSAGES = 200;
const messagesForHook = activeSession.messages.slice(-MAX_HOOK_MESSAGES);// For images, send metadata only or a capped subset.
const imagesForHook = imageResult.images.map(({ type, mimeType, data }) => ({
  type,
  mimeType,
  bytes: Math.floor((data.length * 3) / 4),
}));

hookRunner.runContextAssembled({
  ...,
  messages: messagesForHook,
  messageCount: activeSession.messages.length,
  images: imagesForHook,
  imageCount: imageResult.images.length,
}, hookCtx);
  1. Make cloning optional/defensive: wrap structuredClone in try/catch and fall back to a shallow copy or omit large fields if cloning fails.

  2. Enforce image count/total bytes limits in detectAndLoadPromptImages() (e.g., max refs, max total bytes) so hook payload and prompt assembly cannot grow without bound.

These changes reduce worst-case memory/CPU overhead and prevent attackers from turning hook invocation into a resource exhaustion vector.

4. 🟡 Sensitive data exposure to plugins via new `context_assembled` / loop hooks (workspace path, session identifiers, full prompt + images)
Property Value
Severity Medium
CWE CWE-200
Location src/agents/pi-embedded-runner/run/attempt.ts:1747-1755

Description

The embedded runner now passes additional sensitive runtime context to all registered plugin hook handlers via hookCtx and the new context_assembled event.

Impacted data newly exposed through hook invocations includes:

  • Local filesystem path: workspaceDir (can reveal host directory layout and allow targeted access if plugins have any filesystem capabilities)
  • Correlation identifiers: sessionKey / sessionId / channelId (can be used for cross-run tracking or joining data sets)
  • Full model input context: systemPrompt, prompt, complete message objects, and raw image payloads (base64 data), enabling silent exfiltration of potentially sensitive user content.

Vulnerable code (exposure point):

const hookCtx = {
  agentId: hookAgentId,
  sessionKey: params.sessionKey,
  sessionId: params.sessionId,
  workspaceDir: params.workspaceDir,
  messageProvider: params.messageProvider ?? undefined,
  trigger: params.trigger,
  channelId: params.messageChannel ?? params.messageProvider ?? undefined,
};
...
hookRunner.runContextAssembled({
  runId: params.runId,
  systemPrompt: systemPromptText,
  prompt: effectivePrompt,
  messages: contextMessagesSnapshot,
  ...,
  images: imageResult.images.length > 0 ? structuredClone(imageResult.images) : [],
}, hookCtx);

If plugins can be installed from third parties or are otherwise not fully trusted, this is an information disclosure channel that can leak user prompts, conversation history, and images, as well as local path/identifier metadata, without explicit user consent or a permission model.

Recommendation

Introduce a permission/allowlist model (or explicit user consent) for hooks that receive sensitive content/metadata, and minimize what is exposed by default.

Concrete options:

  1. Redact/minimize PluginHookAgentContext by default (avoid workspaceDir and sessionKey unless a plugin explicitly requests them).
  2. Do not pass raw image data to hooks; instead pass stable IDs/metadata only (counts, mimeType), or provide an opt-in flag.
  3. Gate context_assembled behind a capability (e.g., capabilities: ['read_full_context']).

Example: only provide non-sensitive ctx by default:

const hookCtx: PluginHookAgentContext = {
  agentId: hookAgentId,
  sessionId: params.sessionId,
  trigger: params.trigger,// omit workspaceDir/sessionKey/channelId unless plugin has capability
};

const event: PluginHookContextAssembledEvent = {
  runId: params.runId,
  systemPrompt: systemPromptText,
  prompt: effectivePrompt,
  messages: structuredClone(activeSession.messages),
  messageCount: activeSession.messages.length,
  imageCount: imageResult.images.length,// omit images[].data by default
  attemptIndex: params.attemptIndex,
};

If the project’s security model assumes plugins are fully trusted, document that explicitly and consider warning users during installation that plugins can access full prompts and local metadata.


Analyzed PR: #33915 at commit 7872656

Last updated on: 2026-04-23T05:31:52Z

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants