#990 continuation design-pass: bucket-1 orphan-reap + locus-3 durable-mark + tunables (Fork-A) #995
… durable-mark + tunables Fork-A coherent shape (one PR). Preserves a prince's evacuate→rehydrate lifecycle meta-cognition across the seam:
- deliver-until-survives for live flows
- reap only confident-terminal orphans
- uncertain→quiesce
- no deliver-then-mark double-delivery

Ternary + bucket-1 reap-verdict (work-dispatch busy-skip branch):
- classifySubagentRunLiveness (subagent-run-liveness.ts): a 3-state alive|confident-terminal|uncertain verdict over the latest child-session run.
  - No record or within-stale-window → quiesce.
  - Explicit endedAt or past-cutoff → confident-terminal.
  - Tunable staleCutoffMs floor; the per-run timeout is always respected.
- classifyChildSessionRunLivenessFromRuns: read-time JOIN (never persisted).
- bucket1ReapVerdict: the delegate-flow gate runs FIRST (parentRunId==null → rate-cap). After that, only confident-terminal reaps. Asymmetric cost (#952): never wrongful-reap.
- dispatch reads liveness live in the busy-skip branch and reaps via markPendingWorkReaped. Otherwise it applies rate-cap-forever (Pillar-0 exp-backoff).

locus-3 durable delivered-mark (restart-gap dup cure):
- succeeded {optimal,durable} lives on the row. markPendingWorkDelivered writes it durably the instant a wake is confirmed delivered, before the persist-gap.
- The consume read-guard skips a succeeded row even if it is still running, so a crash between deliver and finishFlow → no re-delivery. peek excludes it too (no tight recovery loop). Coupling: both the location and the durable persist are required.

Config tunables (openclaw.json agents.defaults.continuation):
- busySkipBackoff {baseMs,ceilingMs,factor}: rate-cap, default 1s x2 capped at maxDelayMs.
- orphanReapStaleCutoffMs: confidence-gate floor.

Safety invariants stay fixed. Schema + types + resolver + clamps + config-docs baseline.

Refactor: shared finishContinuationWorkFlow helper (turn-granted/superseded/reaped). The recover/dispatch return values and the gateway log carry reaped.
Tests-first (RED proven by neutralizing impl): bucket-1 matrix + locus-3 in work-dispatch.test.ts; classifier units; config + zod schema coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
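The ternary classifier and bucket-1 verdict described in this commit can be sketched as below. This is a minimal illustration and not the code in subagent-run-liveness.ts or work-dispatch. Several parts are assumptions:
- the record shape (ChildRunRecord with startedAt/endedAt/timeoutMs);
- the hypothetical names classifyRunLivenessSketch and bucket1ReapVerdictSketch;
- measuring staleness from startedAt.

The sketch shows the two properties the commit stresses. Missing data never reaps, and the delegate-flow gate runs before liveness is consulted.

```typescript
// Hypothetical shapes; the real work-dispatch types are not shown in this PR page.
type Liveness = "alive" | "confident-terminal" | "uncertain";
type ReapVerdict = "reap" | "rate-cap";

interface ChildRunRecord {
  startedAt: number; // epoch ms when the latest child-session run started
  endedAt?: number; // set once the run has explicitly finished
  timeoutMs?: number; // per-run timeout, if the run declares one
}

// Only evidence of termination yields "confident-terminal"; missing data stays "uncertain".
function classifyRunLivenessSketch(
  run: ChildRunRecord | undefined,
  now: number,
  staleCutoffMs: number,
): Liveness {
  if (run === undefined) return "uncertain"; // no record: quiesce
  if (run.endedAt !== undefined) return "confident-terminal";
  // staleCutoffMs acts as a floor; a longer per-run timeout always wins.
  const cutoffMs = Math.max(staleCutoffMs, run.timeoutMs ?? 0);
  if (now - run.startedAt > cutoffMs) return "confident-terminal";
  return "alive"; // still inside the stale window: quiesce
}

interface PendingFlow {
  parentRunId: string | null;
}

function bucket1ReapVerdictSketch(flow: PendingFlow, liveness: Liveness): ReapVerdict {
  // Delegate-flow gate first: a flow with no parent run is never reaped.
  if (flow.parentRunId === null) return "rate-cap";
  // Asymmetric cost: only a confident-terminal child is reaped; alive/uncertain rate-cap.
  return liveness === "confident-terminal" ? "reap" : "rate-cap";
}
```

Both alive and uncertain map to rate-cap. As a result, a wrong guess in the classifier can only delay a reap and can never cause a wrongful one, which is the #952 asymmetric-cost rule.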
…toffMs tunables; add worker output.md
- continue-work-signal-v2.md §5.1: new tunables in the config surface + operational notes (rate-cap semantics, confidence-gate floor, fixed safety invariants).
- output.md: worker handoff (what changed, full-suite tally + base classification, proof-gaps, exact commands).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
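Below is a hypothetical resolver and delay function for busySkipBackoff. It assumes the following:
- the delay grows as baseMs × factor^attempt;
- the delay is capped at ceilingMs;
- ceilingMs is itself capped at maxDelayMs.

The defaults (1s base, factor 2) follow the "1s x2" note. The clamp bounds and the names resolveBusySkipBackoff and busySkipDelayMs are illustrative and are not the PR's actual resolver. maxDelayMs is passed in because its value is not stated.

```typescript
// Hypothetical config shapes; field names follow the PR text.
interface BusySkipBackoffConfig {
  baseMs?: number;
  ceilingMs?: number;
  factor?: number;
}

interface ResolvedBackoff {
  baseMs: number;
  ceilingMs: number;
  factor: number;
}

// Fill defaults and clamp into a sane range (the clamp bounds are assumptions).
function resolveBusySkipBackoff(
  raw: BusySkipBackoffConfig | undefined,
  maxDelayMs: number,
): ResolvedBackoff {
  const baseMs = Math.max(1, raw?.baseMs ?? 1_000);
  const factor = Math.max(1, raw?.factor ?? 2);
  // The ceiling never exceeds maxDelayMs and, within that bound, never drops below baseMs.
  const ceilingMs = Math.min(maxDelayMs, Math.max(baseMs, raw?.ceilingMs ?? maxDelayMs));
  return { baseMs, ceilingMs, factor };
}

// Delay before the next retry after `attempt` consecutive busy-skips (0-based).
// The work is never dropped: once the ceiling is reached, retries continue at that rate.
function busySkipDelayMs(cfg: ResolvedBackoff, attempt: number): number {
  const delay = cfg.baseMs * Math.pow(cfg.factor, attempt); // may overflow to Infinity
  return Math.min(cfg.ceilingMs, delay);
}
```

Capping instead of giving up matches the "rate-cap-forever, never dropped" semantics. After enough busy-skips, the retry interval stays at the ceiling rather than the work being discarded.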
Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cced2ef724
```typescript
if (state.succeeded) {
  continue;
}
```
Finalize delivered rows instead of keeping cleanup blocked
Suppose the gateway crashes after markPendingWorkDelivered writes state.succeeded but before finishFlow. Recovery now skips the row here, but the TaskFlow status remains running. The cleanup guards still treat any queued/running continuation work as live (src/auto-reply/continuation/work-store.ts:518). As a result, deleteSubagentSessionForCleanup keeps scheduling retries and the registry sweep skips that child session forever. Either exclude delivered-marked rows from the live-work check or finalize them during recovery.
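The guard logic under discussion can be sketched as follows, assuming a simplified row shape:
- ContinuationWorkRow, with the durable mark at state.succeeded;
- a hypothetical cancelRequestedAt field.

The sketch applies the reviewer's first option. The cleanup live-work check uses the same delivered-mark test as the consume/peek guard. A row stranded at running by a crash between markPendingWorkDelivered and finishFlow therefore no longer blocks session cleanup. All names here are hypothetical; per the review, the real cleanup guard is at work-store.ts:518.

```typescript
// Hypothetical row shape; the real TaskFlow/work-store types are not shown here.
type FlowStatus = "queued" | "running" | "succeeded" | "cancelled" | "failed";

interface ContinuationWorkRow {
  id: string;
  status: FlowStatus;
  state: {
    succeeded?: { optimal: boolean; durable: boolean }; // durable delivered-mark
    cancelRequestedAt?: number; // hypothetical name for cancel_requested_at
  };
}

// A delivered-marked row is done from a delivery standpoint, even if a crash
// left its flow status at "running" before finishFlow ran.
function isDeliveredMarked(row: ContinuationWorkRow): boolean {
  return row.state.succeeded !== undefined;
}

// Consume/peek guard: never hand out a delivered or cancel-requested row again.
// Cancelled rows fall through to the status check and are rejected there.
function isConsumable(row: ContinuationWorkRow): boolean {
  if (isDeliveredMarked(row) || row.state.cancelRequestedAt !== undefined) return false;
  return row.status === "queued" || row.status === "running";
}

// Cleanup live-work check with the review's first fix applied: delivered-marked
// rows no longer count as live, so session cleanup is not blocked forever.
function hasLiveContinuationWork(rows: readonly ContinuationWorkRow[]): boolean {
  return rows.some(
    (row) => (row.status === "queued" || row.status === "running") && !isDeliveredMarked(row),
  );
}
```

Sharing one isDeliveredMarked predicate between the two guards keeps them from drifting apart. The alternative fix, finalizing such rows during recovery, would instead remove the stranded running status itself.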
a82a09b into frond-scribe/20260609/assembly-token-wiring
#990 continuation-storm design-pass (Fork-A: one coherent pass)
Integration PR into the assembly branch (frond-scribe/20260609/assembly-token-wiring), NOT upstream. It is a clean descendant of the assembly tip 6168d1f3b5 (no conflicts). It implements the cohort-converged #990 design (row/detection/reap spec 4681966368 + locus-3 anchor 4678004791), built tests-first by a PRINCE_CODE_AGENTS copilot lane.

What landed (cced2ef724)
- classifySubagentRunLiveness (subagent-run-liveness.ts): alive/confident-terminal/uncertain. This is a conservative gate: every racy or uncertain state quiesces.
- bucket1ReapVerdict: the delegate-flow gate runs first (parentRunId == null → rate-cap-forever, the #952 same-session guard). After that, only confident-terminal reaps; everything else gets rate-cap-forever. The verdict uses a read-time JOIN (never persisted). Asymmetric-cost #952 invariant: wrongly culling a busy seat is unrecoverable, while parking a zombie is harmless.
- locus-3 durable delivered-mark: succeeded{optimal,durable} is written durably before the persist/restart gap. It is paired with the :259 consume read-guard, which skips succeeded/cancelled/cancel_requested_at.
- Config tunables:
  - busySkipBackoff{baseMs,ceilingMs,factor}: give-up rate-cap, never dropped.
  - orphanReapStaleCutoffMs: reap confidence-gate.
  - Safety invariants are fixed, not tunable.

Verification
- run-vitest.mjs @ cced2ef724 (don't-trust-terminal): 127 tests / 3 shards / EXIT=0 (zod-schema.continuation 41, config 18, work-dispatch 52, subagent-run-liveness 16).
- Full suite (scripts/test-projects.mjs): all touched shards GREEN. The 6 failing shards were ALL classified PRE-EXISTING/flaky vs base 6168d1f3b5. tsgo + oxfmt + oxlint clean; config-docs baseline OK. Verdict PASS (no failure attributable to this change).

PROOF-GAP (accepted, deferred)
pnpm build + the [INEFFECTIVE_DYNAMIC_IMPORT] and import-cycles checks cannot run in a git worktree. pnpm wants a full node_modules reinstall, which was declined per RELIABLE-TESTING. These checks are deferred to Gate-3 prepush-ci on cael/ronan-DGX, the sanctioned heavy-verify env.

Scope-discipline (figs directive)
100% continuation-scoped, with ZERO compaction (#946) files: git show --name-only | grep -iE compaction = EMPTY. The #945/#946 compaction-reliability work ships on a separate branch/PR lifecycle and never goes into this assembly.

Merge gated on
- :259 skip / restart-gap repro (deliver_count==1)

Author note: PR opened via gh CLI auth'd as karmafeast; content-author + driver is frond-scribe.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>