Skip to content

fix #76888: [Bug]: Queued/orphaned user-message merge can produce stale reply#90759

Open
zhangguiping-xydt wants to merge 6 commits into
openclaw:mainfrom
zhangguiping-xydt:feat/issue-76888
Open

fix #76888: [Bug]: Queued/orphaned user-message merge can produce stale reply#90759
zhangguiping-xydt wants to merge 6 commits into
openclaw:mainfrom
zhangguiping-xydt:feat/issue-76888

Conversation

@zhangguiping-xydt

@zhangguiping-xydt zhangguiping-xydt commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Summary

What problem does this PR solve?

Fixes #76888, where a channel-backed embedded-agent session can leave an older queued user message on the active branch and then process a newer user prompt. If the older turn is still treated as an active user instruction, the model can answer the stale turn and the user sees a stale visible reply.

Root cause: orphan repair only checked the current session leaf. Real runs can append non-message metadata entries such as thinking/model snapshots after the orphaned user message before prompt submission, so the actual orphaned user message was hidden behind metadata and the repair did not run.

Why does this matter now?

The issue is P1 session-state/message-loss behavior: the prompt shown to the model and the reply delivered to the user can disagree about which user turn is active. This is especially confusing in channel-backed sessions because the user only sees the final visible reply, not the queued/orphan merge state.

What is the intended outcome?

Fix classification: root-cause fix. The embedded runner now finds the trailing message entry for orphan repair by looking past non-message session metadata that does not participate in LLM context, records those skipped state-changing entries, merges the older user text as context-only prompt material, branches away from that orphan, replays the skipped state entries, and keeps the already-sanitized active session messages for the provider call.

What is intentionally out of scope?

Out of scope: queue semantics, Discord transport behavior, provider routing, model selection, public config/schema/protocol changes, API keys/auth, storage format migrations, and any broad rewrite of session persistence. This PR also does not change how live Discord messages are sent; it fixes the embedded-agent/session/provider boundary that decides which prompt the model should answer.

Related PR scan / canonical scope: this is an update to the existing PR for #76888, and the current daily-fix PR diagnosis resolved the linked issue to #76888 without unresolved issue references. The canonical owner boundary stays in the embedded runner/session repair path rather than downstream reply text filtering.

What does success look like?

A provider-backed embedded run with an orphaned old user leaf plus a newer active channel prompt records prompt:before with the queued-context marker, the old turn, and then the latest turn; the visible reply contains the latest target and excludes the old target.

What should reviewers focus on?

Review whether the metadata-skipping rule only skips non-message entries that do not contribute LLM content, whether branching away from the orphan preserves and replays the intended trailing state changes, and whether the prompt marker wording is acceptable as the runtime contract.

Linked context

Closes #76888

Related: ClawSweeper review on this PR requested stronger proof that the change reaches the embedded-agent/model-input boundary, not only the prompt helper.

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: In a channel-backed embedded-agent run, an older orphaned queued user leaf can sit behind later session metadata and then be followed by a newer active user prompt. The fixed behavior must merge the older text as prior-turn context, remove it from the active session leaf path, and make the provider-visible answer target the later active prompt.

  • Real environment tested: Production runEmbeddedAgent with agent_harness=openclaw, messageChannel:discord, messageProvider:discord, trigger:user, a real JSONL session seeded with a completed setup turn plus an orphaned queued user leaf, tools disabled, cache trace enabled, and a configured OpenAI-compatible provider. Provider/base URL details are redacted; no API keys or private channel identifiers are included here.

  • Exact steps or command run after this patch: daily_fix run-validation --name live-orphaned-user-proof-local-provider -- node --import tsx <redacted proof runner>

  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output):

    Validation `live-orphaned-user-proof-local-provider` (pass, exit_code=0, 232638ms):
    
    [live-proof] issue 76888 orphaned queued user live proof: still running (213s)
    [agent/embedded] [trace:embedded-run] startup stages: runId=issue-76888-live-proof sessionId=issue-76888-live-proof phase=attempt-dispatch ...
    [agent/embedded] [trace:embedded-run] prep stages: runId=issue-76888-live-proof sessionId=issue-76888-live-proof phase=stream-ready ...
    [agent/embedded] Merged and removed orphaned user message to prevent consecutive user turns. runId=issue-76888-live-proof sessionId=issue-76888-live-proof trigger=user
    [live-proof] issue 76888 orphaned queued user live proof: completed (221s)
    runtime=production runEmbeddedAgent
    agent_harness=openclaw
    channel_metadata=messageChannel:discord messageProvider:discord trigger:user
    provider_model=[redacted configured OpenAI-compatible provider]
    seeded_orphan_leaf=[redacted] role=user contains=OLD_TURN_TARGET_76888
    active_prompt_contains=LATEST_TURN_TARGET_76888
    model_prompt_trace=prompt:before
    prompt_trace_count=1
    prompt_trace_match_index=0
    model_prompt_order=queued-context-marker -> OLD_TURN_76888 -> LATEST_TURN_76888
    visible_reply=LATEST_TURN_TARGET_76888
    assert_latest_target_present=true
    assert_old_target_absent=true
    post_run_leaf_role=assistant
    payload_count=1
    
  • Observed result after fix: The production embedded runner emitted the orphan repair log, the provider-before prompt trace contained the queued-context marker followed by the older orphaned turn and then the newer active turn, and the provider-backed visible reply was exactly LATEST_TURN_TARGET_76888 without OLD_TURN_TARGET_76888.

  • What was not tested: I did not send an actual Discord message through the live Discord transport, and I did not reproduce the original stochastic production stale reply end-to-end. The proof uses the production embedded runner with Discord channel metadata and a real configured provider, but the session fixture and proof runner are local and deterministic.

  • Proof limitations or environment constraints: Provider identity and local configuration are redacted. This proof covers the embedded-agent/session/provider boundary that decides which user turn is answered; external Discord delivery remains outside this local validation.

  • Before evidence (optional but encouraged): The linked issue includes sanitized production evidence where prompt.submitted was the latest user instruction but the visible assistant output answered an earlier user message after an orphaned user-message merge.

Tests and validation

Which commands did you run?

daily_fix run-validation --name live-orphaned-user-proof-local-provider -- node --import tsx <redacted proof runner>
node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.spawn-workspace.context-engine.test.ts --project agents-embedded-agent -t "orphan repair" --hookTimeout=300000
node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.ts --project agents-embedded-agent -t "mergeOrphanedTrailingUserPrompt" --hookTimeout=300000
pnpm build:ci-artifacts
pnpm tsgo:prod
pnpm check:test-types
pnpm lint
bash -lc 'pnpm deadcode:dependencies && pnpm deadcode:unused-files && pnpm deadcode:report:ci:ts-unused'

Evidence:

Validation `live-orphaned-user-proof-local-provider` (pass, exit_code=0, 232638ms):

[agent/embedded] Merged and removed orphaned user message to prevent consecutive user turns. runId=issue-76888-live-proof sessionId=issue-76888-live-proof trigger=user
runtime=production runEmbeddedAgent
agent_harness=openclaw
channel_metadata=messageChannel:discord messageProvider:discord trigger:user
model_prompt_trace=prompt:before
prompt_trace_count=1
prompt_trace_match_index=0
model_prompt_order=queued-context-marker -> OLD_TURN_76888 -> LATEST_TURN_76888
visible_reply=LATEST_TURN_TARGET_76888
assert_latest_target_present=true
assert_old_target_absent=true
post_run_leaf_role=assistant
payload_count=1

Validation `reviewer-orphan-repair-test-after-state-preserve-2` (pass, exit_code=0):

 RUN  v4.1.7 [local path redacted]

 Test Files  1 passed (1)
      Tests  4 passed | 55 skipped (59)
   Duration  16.13s

Validation `reviewer-merge-helper-test-after-state-preserve-2` (pass, exit_code=0):

 RUN  v4.1.7 [local path redacted]

 Test Files  1 passed (1)
      Tests  8 passed | 113 skipped (121)
   Duration  9.70s

Validation `ci-prod-types-after-state-preserve-2` (pass, exit_code=0): `pnpm tsgo:prod` completed.
Validation `ci-test-types-after-state-preserve-2` (pass, exit_code=0): `pnpm check:test-types` completed.
Validation `ci-lint-after-state-preserve-2` (pass, exit_code=0): `pnpm lint` completed.
Validation `ci-dependencies-after-state-preserve` (pass, exit_code=0): dependencies, unused files, and CI unused-export report completed.
Validation `ci-build-artifacts-after-state-preserve` (pass, exit_code=0): production build artifacts completed, including plugin SDK declarations and control UI build.

What regression coverage was added or updated?

  • Added embedded-runner/context-engine coverage for the real failure shape where a non-message session metadata leaf sits after the orphaned user message, so direct getLeafEntry() no longer hides the orphan from repair.
  • Extended that coverage to prove state-changing entries after the orphaned user leaf are replayed in order: thinking level, model change, and a model-snapshot custom entry survive the branch/reset repair.
  • Kept the provider-seam coverage proving that the repaired prompt reaches the session/provider boundary and that the answer target remains the latest active prompt.
  • Existing helper, merge strategy, and plugin SDK runtime-contract coverage continue to pin the shared queued-user marker and merge behavior.

What failed before this fix, if known?

  • Before the final repair, the production live proof loaded the orphaned user message in session history but the provider-before prompt trace still contained only the latest active prompt; the orphan was hidden behind session metadata and was not merged into the repaired prompt.
  • Before the state-preservation follow-up, the parent-walk repair could branch before trailing state metadata and lose persisted thinking/model/custom state. The focused regression failed until the repair collected those skipped entries and replayed them after branch/reset.
  • After the first submit of the metadata repair, remote CI failed on the current PR diff because the new active-session cleanup compared content on the broader AgentMessage union and the new test mock used a narrower (id: string) implementation signature. The follow-up narrows both sides to user messages before reading content and makes the mock accept the harness' unknown argument shape.

If no test was added, why not?

  • Tests were added.

Risk checklist

Did user-visible behavior change? (Yes/No)

Yes. When an active channel prompt follows an orphaned queued user turn, the older turn is now explicitly preserved as context only and the latest active prompt remains the answer target, even if session metadata was appended after the orphan.

Did config, environment, or migration behavior change? (Yes/No)

No.

Did security, auth, secrets, network, or tool execution behavior change? (Yes/No)

No.

What is the highest-risk area?

Embedded-runner session-state/prompt-shape compatibility for orphaned queued user repair.

How is that risk mitigated?

The change does not drop user text, alter channel/provider routing, or change storage format. It only looks past non-message metadata entries that do not participate in LLM context, branches away from the orphaned user entry, replays the skipped state-changing entries, and keeps the already-sanitized active session messages instead of rebuilding raw transcript messages.

Patch-quality notes: the return undefined path is a defensive stop for malformed/cyclic session ancestry while walking parent entries, so the runner refuses to repair rather than guessing. The return false path in content comparison is also conservative: if message content cannot be serialized for an exact match, the runner does not remove any active-session message. The SDK runtime-contract helper touched in this PR mirrors the shared marker text; it is not a public config/schema/protocol change, and the model-consumed source-of-truth path is the production embedded runner repair.

Current review state

What is the next action?

Ready for maintainer review.

What is still waiting on author, maintainer, CI, or external proof?

  • RF-001: The prior status: waiting on author signal should be cleared by this update; CI still needs to rerun on the updated head after submit.
  • RF-004: Maintainers still need to accept the exact prompt marker and SDK test-helper marker wording.
  • A live Discord transport replay was not run; local validation covers the production embedded-runner/session/provider boundary with Discord channel metadata.

Which bot or reviewer comments were addressed?

  • RF-002: Addressed with focused context-engine regression coverage for an orphaned user leaf followed by thinking/model/custom state entries.
  • RF-003: Addressed by preserving state-changing entries that trail an orphaned user message instead of dropping them when the repair branches away from the orphan.
  • RF-005: Addressed with the narrow automated repair suggested by review: collect skipped state entries during orphan repair and replay them after branch/reset, with focused regression coverage.
  • RF-006: Addressed in the embedded runner by replaying skipped thinking_level_change, model_change, custom, session_info, and label entries after the orphan leaf is removed from the active branch.
  • RF-007: Addressed by running node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.spawn-workspace.context-engine.test.ts --project agents-embedded-agent -t "orphan repair" --hookTimeout=300000; result: 1 file passed, 4 tests passed, 55 skipped.
  • RF-008: Addressed by running node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.ts --project agents-embedded-agent -t "mergeOrphanedTrailingUserPrompt" --hookTimeout=300000; result: 1 file passed, 8 tests passed, 113 skipped.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XS proof: supplied External PR includes structured after-fix real behavior proof. labels Jun 5, 2026
@clawsweeper

clawsweeper Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed June 9, 2026, 2:06 AM ET / 06:06 UTC.

Summary
The PR updates embedded-agent orphaned-user repair to walk past non-message session metadata, replay skipped state entries, change the queued-user prompt marker, and add regression coverage.

PR surface: Source +137, Tests +159. Total +296 across 7 files.

Reproducibility: yes. for the source-level failure shape: current main repairs only a direct user leaf, and state-only entries can advance the leaf while hiding the orphaned user message. I did not rerun the stochastic live channel failure, but the PR body supplies after-fix embedded-runner/provider proof.

Review metrics: 2 noteworthy metrics.

  • SDK marker contract: 1 exported helper constant changed. The plugin SDK test-helper marker mirrors provider prompt text, so exact wording needs maintainer acceptance before merge.
  • State replay surface: 5 non-message entry types replayed. The repair branches around an orphan user entry, so the replay list determines which session metadata survives on the active path.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Get explicit maintainer acceptance of the queued-user marker wording and session-state replay semantics before merge.

Risk before merge

  • [P1] The exact queued-user marker changes provider-visible prompt text and a plugin SDK test-helper contract constant, so maintainers should explicitly accept or revise that wording before merge.
  • [P1] The repair rewrites active session ancestry and replays state entries for existing channel-backed sessions with orphaned queued user turns; the code shape looks targeted, but session-state compatibility remains a maintainer-owned merge decision.

Maintainer options:

  1. Accept the prompt and session contract (recommended)
    Maintainers can approve the new context-only marker wording and replay behavior, then land once required checks are green on the latest head.
  2. Keep the old marker wording
    If prompt text stability matters more than the clearer marker, revise the PR to keep the old marker while preserving the parent-walk and state replay repair.
  3. Pause for transport proof
    If embedded-runner/provider proof is not enough for this workflow, pause landing until a real channel transcript demonstrates the latest prompt is answered.

Next step before merge

  • [P2] The remaining action is maintainer acceptance of compatibility-sensitive prompt and session-state behavior, not a narrow automated repair.

Security
Cleared: The diff touches TypeScript runtime, tests, and a test-helper marker only; I found no dependency, CI, secret, permission, or supply-chain change.

Review details

Best possible solution:

Land the focused embedded-runner repair after maintainers accept the queued-user marker wording and session-state replay semantics, then close the linked issue once the PR merges.

Do we have a high-confidence way to reproduce the issue?

Yes for the source-level failure shape: current main repairs only a direct user leaf, and state-only entries can advance the leaf while hiding the orphaned user message. I did not rerun the stochastic live channel failure, but the PR body supplies after-fix embedded-runner/provider proof.

Is this the best way to solve the issue?

Yes, with maintainer acceptance of the marker wording: the embedded-runner/session boundary is the right layer because it decides which prompt reaches the provider. Downstream reply filtering would be a weaker fix because it would not repair the model-input/session-state mismatch.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 4c98a547d09a.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes copied redacted live output from production runEmbeddedAgent with a configured provider, showing repaired prompt order and a latest-target visible reply after the fix.

Label justifications:

  • P1: The PR targets a user-visible agent/channel workflow where the assistant can answer an older instruction instead of the latest user prompt.
  • merge-risk: 🚨 compatibility: The diff changes exact prompt marker text and an exported SDK test-helper constant that downstream tests or prompt expectations may depend on.
  • merge-risk: 🚨 session-state: The diff changes how embedded runs branch and replay session metadata around orphaned user turns.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body includes copied redacted live output from production runEmbeddedAgent with a configured provider, showing repaired prompt order and a latest-target visible reply after the fix.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes copied redacted live output from production runEmbeddedAgent with a configured provider, showing repaired prompt order and a latest-target visible reply after the fix.
Evidence reviewed

PR surface:

Source +137, Tests +159. Total +296 across 7 files.

View PR surface stats
Area Files Added Removed Net
Source 4 145 8 +137
Tests 3 171 12 +159
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 7 316 20 +296

What I checked:

Likely related people:

  • vincentkoc: Prior issue review traces the active-turn queued-user preservation behavior to commit fcae3bf, and recent path history shows adjacent agent-runtime work by the same author. (role: introduced behavior and recent area contributor; confidence: high; commits: fcae3bf9433b, 8b03fd1f5f83; files: src/agents/embedded-agent-runner/run/attempt.ts, src/agents/embedded-agent-runner/run/attempt.prompt-helpers.ts)
  • steipete: GitHub path history shows recent session-service and plugin-SDK documentation/contract work touching adjacent session and SDK helper surfaces that this PR crosses. (role: recent adjacent area contributor; confidence: medium; commits: 20577f0b3b21, 048f307695f7; files: src/agents/sessions/session-manager.ts, src/agents/embedded-agent-runner/run/attempt.prompt-helpers.ts, src/plugin-sdk/test-helpers/agents/transcript-repair-runtime-contract.ts)
  • jalehman: Recent current-main history lists this person as reviewer/co-author on adjacent user-turn serialization and prompt-cache stabilization work in the embedded-runner boundary. (role: recent reviewer on adjacent session/prompt behavior; confidence: medium; commits: 1af55bc6654f; files: src/agents/embedded-agent-runner/run/attempt.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels Jun 5, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 6, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 6, 2026
@openclaw-barnacle openclaw-barnacle Bot added size: M and removed size: XS proof: sufficient ClawSweeper judged the real behavior proof convincing. labels Jun 8, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 8, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 8, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 8, 2026
@clawsweeper clawsweeper Bot added the status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. label Jun 8, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 9, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label Jun 9, 2026
@zhangguiping-xydt

Copy link
Copy Markdown
Contributor Author

@tyler6204 @joshavant Could you take a maintainer look? This is a focused embedded-agent message merge/session-state fix; proof is sufficient and ClawSweeper marked it ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: M status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Queued/orphaned user-message merge can produce stale reply

1 participant