Skip to content

fix(reply-run-registry + compaction): accept queued messages during preflight; preserve assistant messages during transcript rotation#76485

Closed
njuboy11 wants to merge 3 commits into
openclaw:mainfrom
njuboy11:fix/76467-preflight-compaction-queue
Closed

fix(reply-run-registry + compaction): accept queued messages during preflight; preserve assistant messages during transcript rotation#76485
njuboy11 wants to merge 3 commits into
openclaw:mainfrom
njuboy11:fix/76467-preflight-compaction-queue

Conversation

@njuboy11

@njuboy11 njuboy11 commented May 3, 2026

Copy link
Copy Markdown
Contributor

Description

Fixes #76467 and #76729

Two compaction-related fixes:

1. Accept queued messages during preflight compaction (fix #76467)

queueEmbeddedPiMessage in reply-run-registry.ts was returning false when the
embedded PI session is in preflight_compacting or memory_flushing state,
silently dropping messages sent while compaction runs.

2. Preserve assistant messages during transcript rotation (fix #76729)

When shouldRotateCompactionTranscript is enabled (truncateAfterCompaction: true),
buildSuccessorEntries in compaction-successor-transcript.ts marks all messages
before firstKeptEntryId for removal — including assistant replies. This causes:

  • Feishu/other channel replies to appear briefly in webchat/Control UI (written by
    appendSessionTranscriptMessage before compaction), then disappear when
    compaction rotation creates a new session file
  • The rotated transcript contains consecutive user messages with no intermediate
    reply (e.g., user_1 → user_2 instead of user_1 → assistant → user_2)
  • The agent treats multiple turns as a single combined input

Added preserveLastAssistantBeforeSurvivingUser helper that scans the transcript
after computing removedIds and unmarks the last assistant message directly
preceding each surviving user message, maintaining conversational turn structure.

Testing

Change Log

  • reply-run-registry.ts: Allow queueEmbeddedPiMessage during
    preflight_compacting and memory_flushing phases
  • compaction-successor-transcript.ts: New preserveLastAssistantBeforeSurvivingUser
    function to preserve conversational turn structure in rotated transcripts

…action

Fixes #76467

queueReplyRunMessage() only accepted messages when phase === 'running',
silently dropping webchat messages sent during 'preflight_compacting' or
'memory_flushing'. This caused sessions to become permanently stuck:
messages queue up behind active work, compaction completes, but queued
messages are never drained.

Expand accepting phases to include 'preflight_compacting' and
'memory_flushing', mirroring the logic already used in isReplyRunCompacting().
Add regression tests covering all four queueing scenarios.
@clawsweeper

clawsweeper Bot commented May 3, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Summary
The PR expands reply-run queue acceptance during compaction phases, adds registry regression tests, and preserves the last assistant message before surviving user turns during compaction transcript rotation.

Reproducibility: yes. at source level. Current main can enter preflight_compacting before the embedded backend attaches while streaming reports false, and current successor transcript construction removes summarized assistant messages without role checks.

Real behavior proof
Needs stronger real behavior proof before merge: The PR provides tests and an unviewable live-data claim, but no after-fix terminal output, logs, screenshot, recording, or linked artifact showing the real behavior; contributor should add redacted proof and then the PR can be re-reviewed. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
Human follow-up is needed because the external PR has a production-path gap and lacks real behavior proof; automation should not repair or merge it until the contributor or maintainer supplies a concrete after-fix proof path.

Security
Cleared: The diff only changes TypeScript reply/compaction logic and tests, with no dependency, workflow, secret, installer, package-resolution, generated, or vendor changes.

Review findings

  • [P2] Cover the production preflight queue path — src/auto-reply/reply/reply-run-registry.ts:499
Review details

Best possible solution:

Revise the PR so production preflight follow-up messages either drain after compaction/session rotation or return a clear busy response, keep the generic transcript assistant-preservation fix with regression coverage, and add real after-fix proof from a redacted real setup.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main can enter preflight_compacting before the embedded backend attaches while streaming reports false, and current successor transcript construction removes summarized assistant messages without role checks.

Is this the best way to solve the issue?

No. The transcript direction is reasonable, but expanding accepted phases in queueReplyRunMessage is incomplete because the production preflight path does not reach that helper with a streaming attached backend.

Full review comments:

  • [P2] Cover the production preflight queue path — src/auto-reply/reply/reply-run-registry.ts:499
    The new accepted phases only help after queueReplyRunMessage has an attached streaming backend. During byte-threshold preflight compaction, current main sets preflight_compacting before the embedded backend attaches and runReplyAgent skips steering while isStreaming is false, so messages sent in the reported window still do not exercise this branch.
    Confidence: 0.91

Overall correctness: patch is incorrect
Overall confidence: 0.9

Acceptance criteria:

  • pnpm test src/auto-reply/reply/reply-run-registry.test.ts src/auto-reply/reply/agent-runner-memory.test.ts src/auto-reply/reply/agent-runner.misc.runreplyagent.test.ts -- --run
  • pnpm test src/auto-reply/reply/followup-runner.test.ts src/auto-reply/reply/get-reply-run-queue.test.ts -- --run
  • pnpm test src/agents/pi-embedded-runner/compaction-successor-transcript.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.loop.test.ts -- --run
  • pnpm exec oxfmt --check --threads=1
  • pnpm check:changed in Testbox before handoff

What I checked:

Likely related people:

  • steipete: Recent GitHub file history shows work on reply follow-up drain lifecycle, stale session-lane recovery, memory compaction error handling, and successor transcript rotation in the central files touched by this PR. (role: recent area contributor; confidence: high; commits: 0909df1a4f3d, c500e8704f4e, 0a9f7afb66db; files: src/auto-reply/reply/reply-run-registry.ts, src/auto-reply/reply/agent-runner-memory.ts, src/auto-reply/reply/agent-runner.ts)
  • pashpashpash: GitHub file history shows successor transcript ordering, stale state dedupe, and compaction rotation follow-up work near the assistant-preservation half of the PR. (role: successor transcript follow-up contributor; confidence: medium; commits: 90de4bd85566, b99540964c05; files: src/agents/pi-embedded-runner/compaction-successor-transcript.ts, src/agents/pi-embedded-runner/compaction-successor-transcript.test.ts)
  • fuller-stack-dev: Recent main history changed active-run queueing and steering defaults, which is adjacent to the preflight queue semantics this PR attempts to repair. (role: adjacent queue behavior contributor; confidence: medium; commits: 70df2b8fe28d; files: src/auto-reply/reply/agent-runner.ts, src/agents/pi-embedded-runner/runs.ts)
  • dutifulbob: History shows related reply lifecycle work across stop, rotation, and restart behavior, which is adjacent to the stuck active-run cleanup path. (role: reply lifecycle contributor; confidence: low; commits: 3f6840230b86; files: src/auto-reply/reply/reply-run-registry.ts)

Remaining risk / open question:

  • The PR lacks inspectable after-fix real behavior proof for the Gateway/WebChat or channel path it claims to fix.
  • The transcript-rotation change is not covered by an added regression test, so branch, label, and tool-result edge cases remain unproven.
  • No tests were run because this review was constrained to read-only inspection.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 983064f5f819.

@njuboy11 njuboy11 force-pushed the fix/76467-preflight-compaction-queue branch from 2eb1c00 to 5ca6787 Compare May 3, 2026 06:06
@njuboy11 njuboy11 force-pushed the fix/76467-preflight-compaction-queue branch from 5ca6787 to d3d28e3 Compare May 3, 2026 06:15
@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime agents Agent runtime and tooling extensions: openai size: L and removed size: S labels May 3, 2026
@njuboy11

njuboy11 commented May 3, 2026

Copy link
Copy Markdown
Contributor Author

All checks passed (CI ✅, CodeQL ✅, OpenGrep ✅, Workflow Sanity ✅). This PR fixes issue #76467 — messages dropped during preflight compaction. Could someone please review and merge? The fix is minimal and well-tested (5 new regression tests + 10 existing tests all pass). Thank you!

@njuboy11 njuboy11 left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could @steipete or a gateway/maintainer review this PR? CI passes (clean mergeable state). Fixes the compaction queue issue with 5 new regression tests. Happy to make any changes.

@openclaw-barnacle openclaw-barnacle Bot added size: S and removed gateway Gateway runtime agents Agent runtime and tooling extensions: openai size: L labels May 3, 2026
@njuboy11 njuboy11 force-pushed the fix/76467-preflight-compaction-queue branch from af18798 to 57eddf2 Compare May 3, 2026 07:08
@njuboy11

njuboy11 commented May 3, 2026

Copy link
Copy Markdown
Contributor Author

@steipete Hi Peter — this PR fixes the preflight compaction queue issue (messages dropped during maxActiveTranscriptBytes-triggered compaction). All CI checks pass. Could you review and merge when available? Thanks!

…n rotated transcript

When compaction rotation creates a new session file via buildSuccessorEntries,
all messages before firstKeptEntryId are marked for removal. This causes
assistant replies to be dropped while user messages survive, resulting in
consecutive user messages with no intermediate reply in the rotated transcript.

When the agent reads the rotated transcript, it sees unanswered user messages
and treats multiple turns as a single combined input, causing replies to
'disappear' from the Control UI/webchat and subsequent messages to be
bundled together in a single LLM call.

The fix preserves the last assistant message directly preceding each
surviving user message, maintaining conversational turn structure
(user → assistant → user → assistant) in the rotated transcript.

Fixes #76729
@njuboy11 njuboy11 force-pushed the fix/76467-preflight-compaction-queue branch from 57eddf2 to 02e9031 Compare May 3, 2026 14:33
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: M and removed size: S labels May 3, 2026
@njuboy11 njuboy11 changed the title fix(reply-run-registry): accept queued messages during preflight compaction (closes #76467) fix(reply-run-registry + compaction): accept queued messages during preflight; preserve assistant messages during transcript rotation May 3, 2026
@njuboy11 njuboy11 closed this by deleting the head repository May 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: M

Projects

None yet

1 participant