fix #76888: [Bug]: Queued/orphaned user-message merge can produce stale reply#90759
fix #76888: [Bug]: Queued/orphaned user-message merge can produce stale reply#90759zhangguiping-xydt wants to merge 6 commits into
Conversation
|
Codex review: needs maintainer review before merge. Reviewed June 9, 2026, 2:06 AM ET / 06:06 UTC. Summary PR surface: Source +137, Tests +159. Total +296 across 7 files. Reproducibility: yes. for the source-level failure shape: current main repairs only a direct user leaf, and state-only entries can advance the leaf while hiding the orphaned user message. I did not rerun the stochastic live channel failure, but the PR body supplies after-fix embedded-runner/provider proof. Review metrics: 2 noteworthy metrics.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the focused embedded-runner repair after maintainers accept the queued-user marker wording and session-state replay semantics, then close the linked issue once the PR merges. Do we have a high-confidence way to reproduce the issue? Yes for the source-level failure shape: current main repairs only a direct user leaf, and state-only entries can advance the leaf while hiding the orphaned user message. I did not rerun the stochastic live channel failure, but the PR body supplies after-fix embedded-runner/provider proof. Is this the best way to solve the issue? Yes, with maintainer acceptance of the marker wording: the embedded-runner/session boundary is the right layer because it decides which prompt reaches the provider. Downstream reply filtering would be a weaker fix because it would not repair the model-input/session-state mismatch. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 4c98a547d09a. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +137, Tests +159. Total +296 across 7 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
@tyler6204 @joshavant Could you take a maintainer look? This is a focused embedded-agent message merge/session-state fix; proof is sufficient and ClawSweeper marked it ready. |
Summary
What problem does this PR solve?
Fixes #76888, where a channel-backed embedded-agent session can leave an older queued user message on the active branch and then process a newer user prompt. If the older turn is still treated as an active user instruction, the model can answer the stale turn and the user sees a stale visible reply.
Root cause: orphan repair only checked the current session leaf. Real runs can append non-message metadata entries such as thinking/model snapshots after the orphaned user message before prompt submission, so the actual orphaned user message was hidden behind metadata and the repair did not run.
Why does this matter now?
The issue is P1 session-state/message-loss behavior: the prompt shown to the model and the reply delivered to the user can disagree about which user turn is active. This is especially confusing in channel-backed sessions because the user only sees the final visible reply, not the queued/orphan merge state.
What is the intended outcome?
Fix classification: root-cause fix. The embedded runner now finds the trailing message entry for orphan repair by looking past non-message session metadata that does not participate in LLM context, records those skipped state-changing entries, merges the older user text as context-only prompt material, branches away from that orphan, replays the skipped state entries, and keeps the already-sanitized active session messages for the provider call.
What is intentionally out of scope?
Out of scope: queue semantics, Discord transport behavior, provider routing, model selection, public config/schema/protocol changes, API keys/auth, storage format migrations, and any broad rewrite of session persistence. This PR also does not change how live Discord messages are sent; it fixes the embedded-agent/session/provider boundary that decides which prompt the model should answer.
Related PR scan / canonical scope: this is an update to the existing PR for #76888, and the current daily-fix PR diagnosis resolved the linked issue to #76888 without unresolved issue references. The canonical owner boundary stays in the embedded runner/session repair path rather than downstream reply text filtering.
What does success look like?
A provider-backed embedded run with an orphaned old user leaf plus a newer active channel prompt records
prompt:beforewith the queued-context marker, the old turn, and then the latest turn; the visible reply contains the latest target and excludes the old target.What should reviewers focus on?
Review whether the metadata-skipping rule only skips non-message entries that do not contribute LLM content, whether branching away from the orphan preserves and replays the intended trailing state changes, and whether the prompt marker wording is acceptable as the runtime contract.
Linked context
Closes #76888
Related: ClawSweeper review on this PR requested stronger proof that the change reaches the embedded-agent/model-input boundary, not only the prompt helper.
Real behavior proof (required for external PRs)
Behavior or issue addressed: In a channel-backed embedded-agent run, an older orphaned queued user leaf can sit behind later session metadata and then be followed by a newer active user prompt. The fixed behavior must merge the older text as prior-turn context, remove it from the active session leaf path, and make the provider-visible answer target the later active prompt.
Real environment tested: Production
runEmbeddedAgentwithagent_harness=openclaw,messageChannel:discord,messageProvider:discord,trigger:user, a real JSONL session seeded with a completed setup turn plus an orphaned queued user leaf, tools disabled, cache trace enabled, and a configured OpenAI-compatible provider. Provider/base URL details are redacted; no API keys or private channel identifiers are included here.Exact steps or command run after this patch:
daily_fix run-validation --name live-orphaned-user-proof-local-provider -- node --import tsx <redacted proof runner>Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output):
Observed result after fix: The production embedded runner emitted the orphan repair log, the provider-before prompt trace contained the queued-context marker followed by the older orphaned turn and then the newer active turn, and the provider-backed visible reply was exactly
LATEST_TURN_TARGET_76888withoutOLD_TURN_TARGET_76888.What was not tested: I did not send an actual Discord message through the live Discord transport, and I did not reproduce the original stochastic production stale reply end-to-end. The proof uses the production embedded runner with Discord channel metadata and a real configured provider, but the session fixture and proof runner are local and deterministic.
Proof limitations or environment constraints: Provider identity and local configuration are redacted. This proof covers the embedded-agent/session/provider boundary that decides which user turn is answered; external Discord delivery remains outside this local validation.
Before evidence (optional but encouraged): The linked issue includes sanitized production evidence where
prompt.submittedwas the latest user instruction but the visible assistant output answered an earlier user message after an orphaned user-message merge.Tests and validation
Which commands did you run?
Evidence:
What regression coverage was added or updated?
getLeafEntry()no longer hides the orphan from repair.What failed before this fix, if known?
contenton the broaderAgentMessageunion and the new test mock used a narrower(id: string)implementation signature. The follow-up narrows both sides to user messages before readingcontentand makes the mock accept the harness'unknownargument shape.If no test was added, why not?
Risk checklist
Did user-visible behavior change? (
Yes/No)Yes. When an active channel prompt follows an orphaned queued user turn, the older turn is now explicitly preserved as context only and the latest active prompt remains the answer target, even if session metadata was appended after the orphan.
Did config, environment, or migration behavior change? (
Yes/No)No.
Did security, auth, secrets, network, or tool execution behavior change? (
Yes/No)No.
What is the highest-risk area?
Embedded-runner session-state/prompt-shape compatibility for orphaned queued user repair.
How is that risk mitigated?
The change does not drop user text, alter channel/provider routing, or change storage format. It only looks past non-message metadata entries that do not participate in LLM context, branches away from the orphaned user entry, replays the skipped state-changing entries, and keeps the already-sanitized active session messages instead of rebuilding raw transcript messages.
Patch-quality notes: the
return undefinedpath is a defensive stop for malformed/cyclic session ancestry while walking parent entries, so the runner refuses to repair rather than guessing. Thereturn falsepath in content comparison is also conservative: if message content cannot be serialized for an exact match, the runner does not remove any active-session message. The SDK runtime-contract helper touched in this PR mirrors the shared marker text; it is not a public config/schema/protocol change, and the model-consumed source-of-truth path is the production embedded runner repair.Current review state
What is the next action?
Ready for maintainer review.
What is still waiting on author, maintainer, CI, or external proof?
status: waiting on authorsignal should be cleared by this update; CI still needs to rerun on the updated head after submit.Which bot or reviewer comments were addressed?
thinking_level_change,model_change,custom,session_info, andlabelentries after the orphan leaf is removed from the active branch.node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.spawn-workspace.context-engine.test.ts --project agents-embedded-agent -t "orphan repair" --hookTimeout=300000; result: 1 file passed, 4 tests passed, 55 skipped.node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.test.ts --project agents-embedded-agent -t "mergeOrphanedTrailingUserPrompt" --hookTimeout=300000; result: 1 file passed, 8 tests passed, 113 skipped.