fix(pi-runner): flush block replies after compaction retry #85288
Codex review: needs maintainer review before merge.
Latest ClawSweeper review: 2026-05-22 10:44 UTC / May 22, 2026, 6:44 AM ET.
Workflow note: Future ClawSweeper reviews update this same comment in place.
Summary
Reproducibility: yes, at source level. Current main resolves the compaction retry before the async channel flush necessarily settles, and the PR body includes log-backed after-fix output demonstrating that the second flush is the delivery barrier.
Review details
Best possible solution: Land the narrow post-wait delivery barrier with its regression test after required checks pass, then let #47335 close through the merged fix.
Do we have a high-confidence way to reproduce the issue? Yes, at source level.
Is this the best way to solve the issue? Yes. A second idempotent flush after a successful retry wait is the narrow maintainable fix, and the PR preserves aggregate-timeout behavior by skipping the extra drain on timeout.
Codex review notes: model gpt-5.5, reasoning high; reviewed against ff79299d68e3.
21a1f39 to 63a5e4d
@clawsweeper re-review
🦞🧹 I asked ClawSweeper to review this item again.
Local follow-up on the remaining CI blocker:
The PR head is still showing the single failed GitHub Actions job
63a5e4d to 70f45ff
@clawsweeper re-review
🦞🧹 I asked ClawSweeper to review this item again.
…(thanks @spacegeologist)
Behavior addressed: Embedded PI compaction retry now drains block replies again after the retry wait resolves, so retry-generated replies are not left behind while preserving aggregate-timeout fallback behavior.
Real environment tested: local OpenClaw focused Pi runner test shard plus contributor local live-output proof in the PR body.
Exact steps or command run after this patch:
pnpm test src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts src/agents/pi-embedded-runner/run/compaction-retry-aggregate-timeout.test.ts
.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main
Evidence after fix: 2 test files passed, 55 tests passed; final autoreview clean with no accepted/actionable findings.
Observed result after fix: the runner flushes before the compaction wait, waits for compaction retry, then performs a second idempotent flush when the wait resolves without timing out.
What was not tested: fresh external-channel live retry by this agent; PR retains contributor live-output proof for the delayed channel adapter path.
Thanks @spacegeologist.
Co-authored-by: zhengzuo0-ai <zheng.zuo0@gmail.com>
* fix(doctor): prune stale bundled plugin paths (openclaw#85038)
* fix: honor OPENCLAW_HOME defaults (openclaw#85802)
* fix: honor OPENCLAW_HOME defaults
* fix(install): preserve openclaw home upgrade defaults
* fix(install): satisfy shellcheck tilde patterns
* fix(config): do not suppress recovery retry after failed backup restore (openclaw#85787)
  maybeRecoverSuspiciousConfigRead unconditionally recorded lastObservedSuspiciousSignature in health state even when restoredFromBackup was false (copyFile failed). The guard at resolveConfigReadRecoveryContext then prevented the same signature from ever being retried, permanently accepting the suspicious config on every subsequent launch. Only record the dedup signature when the backup restore actually succeeded.
* fix(docker): restore config parent ownership
* fix(pi-runner): flush blocks after compaction retry (openclaw#85288) (thanks @spacegeologist)
  Behavior addressed: Embedded PI compaction retry now drains block replies again after the retry wait resolves, so retry-generated replies are not left behind while preserving aggregate-timeout fallback behavior.
  Real environment tested: local OpenClaw focused Pi runner test shard plus contributor local live-output proof in the PR body.
  Exact steps or command run after this patch:
  pnpm test src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts src/agents/pi-embedded-runner/run/compaction-retry-aggregate-timeout.test.ts
  .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main
  Evidence after fix: 2 test files passed, 55 tests passed; final autoreview clean with no accepted/actionable findings.
  Observed result after fix: the runner flushes before the compaction wait, waits for compaction retry, then performs a second idempotent flush when the wait resolves without timing out.
  What was not tested: fresh external-channel live retry by this agent; PR retains contributor live-output proof for the delayed channel adapter path.
  Thanks @spacegeologist.
  Co-authored-by: zhengzuo0-ai <zheng.zuo0@gmail.com>
* feat(gateway): forward OpenAI sampling params (openclaw#84094)
  Forward OpenAI-compatible frequency_penalty, presence_penalty, and seed params through the gateway/chat-completions path while keeping Responses untouched.
  Verification:
  - pnpm test src/gateway/openai-http.test.ts src/agents/pi-embedded-runner/extra-params.sampling.test.ts src/agents/openai-transport-stream.test.ts
  - CI passed on head 9abb946 after rerunning cancelled jobs: preflight, critical quality network-runtime-boundary, security high, checks, docs, Real behavior proof.
  Co-authored-by: lellansin <lellansin@gmail.com>
* fix: correct build errors from cherry-pick merges
  - Add missing topP field to AgentStreamParams in shared-types.ts
  - Replace missing resolveAliasedParamValueFromKeys with existing resolveAliasedParamValue in extra-params.ts
  - Fix onBlockReplyFlush bare variable reference to params.onBlockReplyFlush in attempt.ts
* fix: add missing topP to BaseStreamOptions
* fix: add missing openai-http helpers (resolveResponseFormat, resolveErrorMessage, validateOpenAiSamplingParams)
* fix(tests): revert context-engine test to origin/main state (ported tests need upstream infrastructure)
* fix(tests): restore local import paths broken by upstream cherry-picks
* fix(tests): fix remaining cherry-pick compat issues for gemmaclaw
  - openai-http.ts: add FailoverError import, pass streamParams through buildAgentCommandInput, add integer check for seed param, handle FailoverError format reason as 400 in catch block
  - openai-http.test.ts: add missing FailoverError import
  - dockerfile.test.ts: use gemmaclaw stage name base-\${OPENCLAW_VARIANT} instead of upstream base-runtime
  - test/scripts/install-sh.test.ts: remove test file added by cherry-pick 762ae06; it tests apt_get wrapper, npm freshness, macOS Homebrew, and duplicate-install detection from unported companion commits; file was never on gemmaclaw main
---------
Co-authored-by: Gio Della-Libera <giodl73@gmail.com>
Co-authored-by: Sebastien Tardif <SebTardif@ncf.ca>
Co-authored-by: sallyom <somalley@redhat.com>
Co-authored-by: Zee Zheng <zheng.zuo0@gmail.com>
Co-authored-by: Lellansin Huang <Lellansin@gmail.com>
Summary
Verification
corepack pnpm exec tsx /private/tmp/openclaw-compaction-retry-delivery-proof.ts
node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts -t "flushes block replies again after compaction retry wait resolves"
pnpm check:architecture
node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts src/agents/pi-embedded-runner/run/compaction-retry-aggregate-timeout.test.ts src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts
node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-payloads.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.splits-long-single-line-fenced-blocks-reopen.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.waits-multiple-compaction-retries-before-resolving.test.ts
corepack pnpm exec oxfmt --check --threads=1 src/agents/pi-embedded-runner/run/attempt.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts
git diff --check
Real behavior proof
Behavior addressed: Compaction retry can resolve the runner wait before retry-generated block replies finish draining, causing transcript-visible replies to be dropped before channel delivery.
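The ordering problem described above can be illustrated with a small, self-contained sketch. It is not the OpenClaw code: createSketchPipeline, sleep, and demo are hypothetical names, and the stand-in retry wait simply enqueues one reply and resolves early. The sketch only shows why a wait can resolve while a delayed send is still in flight, and why awaiting a second flush acts as the delivery barrier.

```typescript
// Hypothetical names throughout; this is an illustration, not the OpenClaw implementation.
type Send = (text: string) => Promise<void>;

// A tiny block reply pipeline: enqueue() chains sends, flush() awaits the chain.
function createSketchPipeline(send: Send) {
  let chain: Promise<void> = Promise.resolve();
  return {
    enqueue(text: string): void {
      chain = chain.then(() => send(text));
    },
    // Safe to call repeatedly: with nothing pending it resolves right away.
    flush(): Promise<void> {
      return chain;
    },
  };
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function demo(): Promise<string[]> {
  const log: string[] = [];
  const delivered: string[] = [];
  // Local adapter that deliberately delays send completion.
  const pipeline = createSketchPipeline(async (text) => {
    log.push(`channel-send-start:${text}`);
    await sleep(20);
    delivered.push(text);
    log.push(`channel-send-complete:${text}`);
  });

  await pipeline.flush(); // pre-wait flush: nothing is queued yet
  log.push("pre-wait-flush-complete");

  // Stand-in for the retry wait: the retry enqueues a reply, then the wait resolves.
  const retryWait = (async () => {
    pipeline.enqueue("retry reply");
    await sleep(1);
  })();
  await retryWait;
  log.push(`delivered-before-post-wait-flush:${delivered.join(",") || "none"}`);

  await pipeline.flush(); // post-wait flush: waits for the in-flight send
  log.push(`post-wait-flush-complete:${delivered.join(",")}`);
  return log;
}

demo().then((log) => console.log(log.join("\n")));
```

Without the second flush, the caller would continue at the point where the delivered list is still empty, which matches the dropped-reply symptom.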
Real environment tested: Local OpenClaw source checkout at commit 70f45ff870e43fbb939c680b74821015ecdccce6, rebased onto origin/main at e2f82d4d30bbebbf89bdbb94b3e58c7aa0185151 on macOS, with dependencies installed from pnpm-lock.yaml.
The log-backed proof uses the real createBlockReplyPipeline().flush() delivery barrier and the real waitForCompactionRetryWithAggregateTimeout() helper. The local channel adapter deliberately delays send completion so the retry wait / delivery ordering is visible without external provider credentials.
Exact steps or command run after this patch:
corepack pnpm exec tsx /private/tmp/openclaw-compaction-retry-delivery-proof.ts
node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts -t "flushes block replies again after compaction retry wait resolves"
pnpm check:architecture
node scripts/run-vitest.mjs src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts src/agents/pi-embedded-runner/run/compaction-retry-aggregate-timeout.test.ts src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts
node scripts/run-vitest.mjs src/auto-reply/reply/agent-runner-payloads.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.splits-long-single-line-fenced-blocks-reopen.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.waits-multiple-compaction-retries-before-resolving.test.ts
corepack pnpm exec oxfmt --check --threads=1 src/agents/pi-embedded-runner/run/attempt.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.test-support.ts src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts
git diff --check
Evidence after fix:
pre-wait-flush-start
pre-wait-flush-complete
retry-wait-start
retry-generated-block-enqueued
channel-send-start:retry reply delivered after compaction wait
retry-wait-resolved
aggregate-wait-result:{"timedOut":false}
delivered-before-post-wait-flush:none
delivery-in-flight-before-post-wait-flush-completes:yes
channel-send-complete:retry reply delivered after compaction wait
post-wait-flush-complete:retry reply delivered after compaction wait
- 70f45ff870e43fbb939c680b74821015ecdccce6 on origin/main e2f82d4d30bbebbf89bdbb94b3e58c7aa0185151.
- Test Files 2 passed (2), Tests 2 passed | 94 skipped (96).
- Import cycle check: 0 runtime value cycle(s)., Madge import cycle check: 0 cycle(s)., deprecated API usage guard passed, deprecated JSDoc guard passed.
- Test Files 6 passed (6), Tests 166 passed (166).
- Test Files 5 passed (5), Tests 62 passed (62).
- All matched files use the correct format.
- git diff --check passed with no output.
Observed result after fix: The retry wait returns with the retry reply still undelivered (delivered-before-post-wait-flush:none), then the second idempotent flush waits for the real block reply pipeline send chain to finish (post-wait-flush-complete:retry reply delivered after compaction wait). Aggregate-timeout fallback still avoids an extra drain.
What was not tested: A live high-context Discord/Telegram/other external-network compaction retry with real provider credentials was not run. The behavior proof is local and log-backed, using the real block reply pipeline flush and retry wait helper with a delayed local channel adapter.
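The timeout branch mentioned in the observed result (a second flush only when the wait finishes in time) can be sketched as follows. This is a minimal illustration under assumptions: waitWithAggregateTimeout and drainAroundRetry are hypothetical names, and the real waitForCompactionRetryWithAggregateTimeout() signature is not shown in this PR text. Only the { timedOut } result shape is taken from the log output above.

```typescript
// Hypothetical helpers; the real helper's signature is an assumption here.
type WaitResult = { timedOut: boolean };

// Race a wait against an overall (aggregate) timeout and report which one won.
function waitWithAggregateTimeout(wait: Promise<void>, timeoutMs: number): Promise<WaitResult> {
  return new Promise<WaitResult>((resolve, reject) => {
    const timer = setTimeout(() => resolve({ timedOut: true }), timeoutMs);
    wait.then(
      () => {
        clearTimeout(timer);
        resolve({ timedOut: false });
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// Flush, wait for the retry, then flush again only if the wait finished in time.
async function drainAroundRetry(
  flush: () => Promise<void>,
  retryWait: Promise<void>,
  timeoutMs: number,
): Promise<{ timedOut: boolean; flushCount: number }> {
  let flushCount = 0;
  const countedFlush = async () => {
    flushCount += 1;
    await flush();
  };
  await countedFlush(); // pre-wait flush
  const { timedOut } = await waitWithAggregateTimeout(retryWait, timeoutMs);
  if (!timedOut) {
    await countedFlush(); // post-wait delivery barrier
  }
  return { timedOut, flushCount };
}

const sleepMs = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Wait resolves in time: two flushes. Wait exceeds the timeout: one flush, no extra drain.
drainAroundRetry(async () => {}, sleepMs(5), 100).then((r) => console.log(r));
drainAroundRetry(async () => {}, sleepMs(100), 5).then((r) => console.log(r));
```

Skipping the second flush on timeout keeps the timeout path from blocking on a drain that may never settle, which is how the sketch reads the "aggregate-timeout fallback" wording.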
AI Assistance Disclosure
This PR was prepared with AI assistance. I reviewed the changed code path and verified the targeted tests above.