Skip to content

fix(agents): validate context engine assemble result shape#87549

Merged
steipete merged 1 commit into
openclaw:mainfrom
Pluviobyte:fix/context-engine-assemble-guard
May 31, 2026
Merged

fix(agents): validate context engine assemble result shape#87549
steipete merged 1 commit into
openclaw:mainfrom
Pluviobyte:fix/context-engine-assemble-guard

Conversation

@Pluviobyte

Copy link
Copy Markdown
Contributor

Summary

  • Guard the harness assembleHarnessContextEngine boundary against context engine plugins that return a malformed assemble result, so a buggy plugin can no longer poison activeSession.messages with undefined and crash the prompt assembly stage with Cannot read properties of undefined (reading 'length').
  • Throw a descriptive error naming the offending engine id; the runner's existing assemble try/catch in attempt.ts already logs the failure and falls back to the unmodified pipeline messages, so behavior on a well-formed plugin is unchanged.
  • Out of scope: the third-party plugin's own bug, native runtime compaction, the broader context-engine RFC (RFC: Expose existing InputProvenance / AgentInternalEvent signals to ContextEngine.assemble() #82137), and any change to provider transports.

Linked context

Closes #75541

Was this requested by a maintainer or owner?

No direct maintainer request; ClawSweeper triage on the issue flagged this as a safe repair candidate at the harness boundary.

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: With a third-party context engine (lossless-claw v0.9.2 in the report) that returns { estimatedTokens: 0 } without a messages array, OpenClaw previously assigned undefined to activeSession.agent.state.messages and then crashed inside attempt.ts when it read activeSession.messages.length.

  • Real environment tested: Local OpenClaw source checkout on Linux with Node v22.22.3, pnpm 11.2.2 via corepack.

  • Exact steps or command run after this patch: Ran two local OpenClaw source smokes with PATH=\"\$HOME/.nvm/versions/node/v22.22.3/bin:\$PATH\" node --import tsx --input-type=module -e '...' against the real assembleHarnessContextEngine, once before the patch (via git stash of the change) and once after, using a lossless-claw-shaped plugin that omits messages.

  • Evidence after fix: Copied console output from those local OpenClaw source smokes.

    Before-fix (with the patch temporarily stashed on the same branch):

    BEFORE-FIX BEHAVIOR: assemble returned truthy: true
    assembled.messages: undefined
    Array.isArray(assembled.messages): false
    DOWNSTREAM CRASH: Cannot read properties of undefined (reading 'length')
    

    After-fix (working tree restored to this patch):

    AFTER-FIX BEHAVIOR: caught descriptive error before downstream crash
    error: context engine \"lossless-claw\" assemble() returned an invalid result: expected an object with a \"messages\" array (got messages of type undefined)
    VALID RESULT PASSTHROUGH: messages.length = 1 estimatedTokens = 1
    
  • Observed result after fix: The malformed plugin payload throws a descriptive error at the harness boundary before any state assignment, naming the engine id. A well-formed AssembleResult still passes through unchanged. The runner's existing try { ... } catch (assembleErr) { log.warn('context engine assemble failed, using pipeline messages: ...') } in src/agents/embedded-agent-runner/run/attempt.ts then logs the failure and falls back to the unmodified pipeline messages.

  • What was not tested: A live OpenClaw session that actually installs and runs lossless-claw v0.9.2 against the embedded runner. The fix targets the runtime boundary, not the third-party plugin itself.

  • Proof limitations or environment constraints: The after-fix proof uses the real OpenClaw harness wrapper on the real AssembleResult consumer path, but it does not install the lossless-claw plugin because this environment is not configured with that plugin's runtime. The malformed shape used in the smoke matches the exact shape ClawSweeper identified as crash-prone in the issue triage.

  • Before evidence (optional but encouraged): Captured directly above via the same script after git stash-ing this patch.

Tests and validation

Which commands did you run?

  • PATH=\"\$HOME/.nvm/versions/node/v22.22.3/bin:\$PATH\" node scripts/test-projects.mjs src/agents/harness/context-engine-lifecycle.test.ts
  • PATH=\"\$HOME/.nvm/versions/node/v22.22.3/bin:\$PATH\" node scripts/test-projects.mjs src/agents/embedded-agent-runner/run/attempt.spawn-workspace.context-engine.test.ts
  • PATH=\"\$HOME/.nvm/versions/node/v22.22.3/bin:\$PATH\" corepack pnpm exec oxfmt --check src/agents/harness/context-engine-lifecycle.ts src/agents/harness/context-engine-lifecycle.test.ts
  • git diff --check

Outputs:

  • context-engine-lifecycle.test.ts: Test Files 1 passed (1) / Tests 9 passed (9) (3 existing + 6 new shape-validation cases).
  • attempt.spawn-workspace.context-engine.test.ts: Test Files 1 passed (1) / Tests 53 passed (53) — no regression in the existing adjacent assemble integration coverage.
  • oxfmt --check: All matched files use the correct format.

What regression coverage was added or updated?

Added a describe(\"assembleHarnessContextEngine result validation\") block covering: well-formed passthrough, undefined result, null result, { estimatedTokens: 0 } missing messages (the exact lossless-claw shape from #75541), messages: \"string\" (wrong type), and messages: null. Each rejection asserts that the engine id appears in the thrown message so the runner's existing assemble-failure log carries actionable context.

What failed before this fix, if known?

The before-fix smoke above reproduces Cannot read properties of undefined (reading 'length'). Without this guard, the new tests' rejection assertions would not be possible because the wrapper previously returned the malformed object unchanged.

If no test was added, why not?

N/A.

Risk checklist

Did user-visible behavior change? (Yes/No)

Yes (recovery semantics). A plugin returning a malformed assemble result now produces a runtime warning naming the engine and falls back to pipeline messages, instead of crashing the turn. Behavior on a well-formed AssembleResult is unchanged.

Did config, environment, or migration behavior change? (Yes/No)

No.

Did security, auth, secrets, network, or tool execution behavior change? (Yes/No)

No.

What is the highest-risk area?

A previously crashing plugin will now silently fall back to pipeline messages, potentially masking a real plugin bug behind a single warn log.

How is that risk mitigated?

The thrown error string is descriptive and names the offending engine id, so the existing log.warn(\"context engine assemble failed, using pipeline messages: ...\") in attempt.ts surfaces both the engine id and the exact shape mismatch. Validation is strict enough to reject the failure mode in #75541 but only enforces what the public AssembleResult type already requires.

Current review state

What is the next action?

Await maintainer / ClawSweeper review on whether the validator scope (only messages shape) is the preferred boundary, or whether estimatedTokens should also be validated.

What is still waiting on author, maintainer, CI, or external proof?

A live install of the third-party lossless-claw plugin is still not provided from this environment.

Which bot or reviewer comments were addressed?

Initial PR; no PR comments yet.

AI-assisted: yes. I reviewed the touched code path and ran the focused validation commands above.

Made with Cursor

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S proof: supplied External PR includes structured after-fix real behavior proof. labels May 28, 2026
@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 2:18 AM ET / 06:18 UTC.

Summary
The PR adds a runtime guard and regression tests around assembleHarnessContextEngine so malformed context-engine assemble results without a messages array throw before runner state is updated.

PR surface: Source +40, Tests +61. Total +101 across 2 files.

Reproducibility: yes. By source inspection, current main returns a raw truthy assemble payload and the runner assigns assembled.messages; the contributor also supplied before/after terminal proof showing the malformed shape that previously led to a .length crash.

Review metrics: 1 noteworthy metric.

  • Context-engine contract checks: 1 runtime validation boundary added. The PR tightens plugin assemble() result handling, so maintainers should notice and accept the compatibility behavior before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Have a maintainer decide whether messages-only validation is the intended runtime boundary or whether the guard should also validate estimatedTokens.

Risk before merge

  • [P2] Stricter validation changes behavior for any context engine that returned a truthy object without a messages array: it now throws into the existing fallback path instead of continuing with that payload.
  • [P1] The PR intentionally validates only the messages array even though the public AssembleResult type also requires estimatedTokens, so maintainers should accept that scope or request full runtime validation before merge.

Maintainer options:

  1. Accept the scoped messages guard (recommended)
    Keep this PR's messages-array validation as the runtime boundary so malformed engines fall back to pipeline messages instead of crashing.
  2. Expand to full contract validation
    If maintainers want strict AssembleResult enforcement, require a finite numeric estimatedTokens and add matching tests before merge.
  3. Pause for the broader context-engine RFC
    If maintainers want this handled only as part of the broader context-engine contract work, pause this PR and leave the linked crash issue open.

Next step before merge

  • [P2] A maintainer should decide the plugin contract scope; there is no narrow automated repair unless they request full AssembleResult validation.

Security
Cleared: The diff only changes TypeScript runtime validation and tests; it does not touch dependencies, CI, secrets, auth, networking, package resolution, or generated artifacts.

Review details

Best possible solution:

Land the focused harness boundary guard after maintainer acceptance of the scoped messages-array validation, or expand the validator to the full AssembleResult contract before merge.

Do we have a high-confidence way to reproduce the issue?

Yes. By source inspection, current main returns a raw truthy assemble payload and the runner assigns assembled.messages; the contributor also supplied before/after terminal proof showing the malformed shape that previously led to a .length crash.

Is this the best way to solve the issue?

Yes, if maintainers accept the compatibility scope. The shared harness boundary covers embedded and Codex callers and uses existing try/catch fallback behavior, while broader estimatedTokens validation is a maintainer contract choice.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 8d63d466b858.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from the real harness wrapper showing the malformed payload is rejected before downstream crash, with a clear note that the third-party plugin itself was not installed.
  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • remove rating: 🦐 gold shrimp: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal-priority agent-runtime crash fix with limited surface area around context-engine assembly.
  • merge-risk: 🚨 compatibility: Merging changes how malformed context-engine plugin results behave at runtime by throwing into fallback instead of flowing through.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes after-fix terminal output from the real harness wrapper showing the malformed payload is rejected before downstream crash, with a clear note that the third-party plugin itself was not installed.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes after-fix terminal output from the real harness wrapper showing the malformed payload is rejected before downstream crash, with a clear note that the third-party plugin itself was not installed.
Evidence reviewed

PR surface:

Source +40, Tests +61. Total +101 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 42 2 +40
Tests 1 61 0 +61
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 103 2 +101

What I checked:

Likely related people:

  • jalehman: Authored and merged the context-engine plugin system pull request that introduced AssembleResult and the pluggable context-engine contract. (role: feature owner; confidence: high; commits: fee91fefceb4; files: src/context-engine/types.ts)
  • Peter Steinberger: Current blame and history for the shared harness wrapper and embedded runner assemble path point to recent work by Peter Steinberger in this checkout. (role: recent area contributor; confidence: medium; commits: 27ae826f6525, 15772c527aca; files: src/agents/harness/context-engine-lifecycle.ts, src/agents/embedded-agent-runner/run/attempt.ts, extensions/codex/src/app-server/run-attempt.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. labels May 28, 2026
@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg: ✨ hatched 🥚 common Gilded Review Wisp. Rarity: 🥚 common. Trait: collects tiny proofs.

Details

Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Gilded Review Wisp in ClawSweeper.
Hatchability:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

About:

  • Eggs appear after real-behavior proof passes. They are collectible flavor only.
  • Review momentum changes the shell state: follow-up work warms it, re-review makes it wobble, and a clean final review lets it hatch.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 29, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 29, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 29, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 29, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 29, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 29, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. labels May 29, 2026
@jalehman jalehman self-assigned this May 31, 2026
Context engine plugins that returned an object without a `messages` array (for example a malformed lossless-claw assemble payload) previously poisoned activeSession.messages with `undefined`. Downstream prompt assembly then crashed with `Cannot read properties of undefined (reading 'length')`, aborting the turn before any model call.

Validate the assemble() return shape at the harness boundary so an invalid result throws a descriptive error naming the offending engine. The runner's existing assemble try/catch already logs the failure and falls back to the unmodified pipeline messages, so a buggy plugin can no longer corrupt session state.

Fixes openclaw#75541

Co-authored-by: Cursor <cursoragent@cursor.com>
@steipete steipete force-pushed the fix/context-engine-assemble-guard branch from 0938413 to 5b6b7b1 Compare May 31, 2026 17:17
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 31, 2026
@steipete

Copy link
Copy Markdown
Contributor

Maintainer update before landing:

  • Rebased the PR branch onto current origin/main and removed stale merge commits; new head is 5b6b7b1bf69b8f30329fdf749161a192d3d016fe.
  • Kept the contributor commit author intact (Pluviobyte <Pluviobyte@users.noreply.github.com>).
  • Reviewed the changed harness boundary against the current embedded runner and Codex app-server callers. The fix keeps validation scoped to the required messages array because current runners consume messages, promptAuthority, systemPromptAddition, and projection data but do not consume estimatedTokens on this path.

Local proof on the rebased head:

  • node scripts/run-vitest.mjs src/agents/harness/context-engine-lifecycle.test.ts -> passed, 1 file / 9 tests.
  • node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.spawn-workspace.context-engine.test.ts -> passed, 1 file / 56 tests.
  • pnpm exec oxfmt --check src/agents/harness/context-engine-lifecycle.ts src/agents/harness/context-engine-lifecycle.test.ts -> passed.
  • git diff --check origin/main...HEAD -> passed.

Fresh GitHub checks started for the rebased head:

Known proof gap: I did not install the third-party lossless-claw plugin live. The regression is covered at the shared harness boundary with the malformed shape from #75541, plus adjacent embedded-runner context-engine coverage for the fallback path.

@steipete steipete merged commit 301f17f into openclaw:main May 31, 2026
151 of 152 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: S status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] TypeError at prompt assembly stage when lossless-claw is enabled (reading 'length' on undefined)

4 participants