#990 continuation design-pass: bucket-1 orphan-reap + locus-3 durable-mark + tunables (Fork-A) #995

Merged
karmafeast merged 2 commits into frond-scribe/20260609/assembly-token-wiring from codeagent/990-design-pass
Jun 11, 2026
@karmafeast

#990 continuation-storm design-pass (Fork-A: one coherent pass)

Integration PR into the assembly branch (frond-scribe/20260609/assembly-token-wiring), NOT upstream. Clean descendant of the assembly tip 6168d1f3b5 (no conflicts).

Implements the cohort-converged #990 design (🌊 row/detection/reap spec 4681966368 + 🕯 locus-3 anchor 4678004791), built tests-first by a PRINCE_CODE_AGENTS copilot lane.

What landed (cced2ef724)

  • Ternary liveness classifier (subagent-run-liveness.ts): alive / confident-terminal / uncertain. Conservative gate: every racy/uncertain state quiesces.
  • Bucket-1 reap-verdict: delegate-flow-gate FIRST (parentRunId == null → rate-cap-forever, the #952 same-session guard), then only confident-terminal reaps, else rate-cap-forever; read-time JOIN (never persisted). Asymmetric-cost #952 invariant: wrongly culling a busy seat is unrecoverable, parking a zombie is harmless. (#952: "Regression: continue_work nested in a continuation-delegate subagent does not chain past hop 1 (a179 drift-absorb)".)
  • Locus-3 durable delivered-mark: writes succeeded{optimal,durable} durably before the persist/restart gap + :259 consume read-guard (skips succeeded/cancelled/cancel_requested_at).
  • Tunables → openclaw.json: busySkipBackoff{baseMs,ceilingMs,factor} (give-up rate-cap, never-dropped) + orphanReapStaleCutoffMs (reap confidence-gate). Safety invariants are fixed, not tunable.
  • Cycle-safe lazy dynamic import to keep the agents-registry off the continuation static-import graph (read stays a synchronous in-process Map lookup).
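
The gate ordering in the bucket-1 bullet can be sketched as a pure function. This is a minimal illustration and not the code in this PR: the `Bucket1Input` shape, the `ReapVerdict` union, and the name `bucket1ReapVerdictSketch` are assumptions. Only the ordering (delegate-flow gate first, then reap on confident-terminal only) is taken from the description above.

```typescript
// Hypothetical types; the real shapes in subagent-run-liveness.ts are not shown in the PR text.
type Liveness = "alive" | "confident-terminal" | "uncertain";
type ReapVerdict = "reap" | "rate-cap-forever";

interface Bucket1Input {
  // Assumed field shape: null/undefined triggers the delegate-flow gate.
  parentRunId: string | null | undefined;
  // Assumed to be computed at read time and never stored on the row.
  liveness: Liveness;
}

function bucket1ReapVerdictSketch(input: Bucket1Input): ReapVerdict {
  // Gate 1 (checked FIRST): no parent run means rate-cap, whatever liveness says.
  if (input.parentRunId == null) return "rate-cap-forever";
  // Gate 2: only a confident-terminal classification may reap.
  // "alive" and "uncertain" both fall through to the harmless choice.
  return input.liveness === "confident-terminal" ? "reap" : "rate-cap-forever";
}
```

Checking the gate before liveness means a confident-terminal classification can never override it, which matches the asymmetric-cost framing: parking a zombie is the cheap mistake.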

Verification

  • TDD: RED proven by neutralizing impl (6 fail) → GREEN.
  • frond-scribe independent verify (sanctioned run-vitest.mjs @ cced2ef724, don't-trust-terminal): 127 tests / 3 shards / EXIT=0: zod-schema.continuation 41, config 18, work-dispatch 52, subagent-run-liveness 16.
  • Worker full-suite (scripts/test-projects.mjs): all touched shards GREEN; 6 failing shards ALL classified PRE-EXISTING/flaky vs base 6168d1f3b5. tsgo + oxfmt + oxlint clean; config-docs baseline OK. Verdict PASS (no failure attributable to this change).
  • 🌊 seam-owner review = APPROVE (row-structure / detection / reap; test-fired at SHA).

PROOF-GAP (accepted, deferred)

pnpm build + [INEFFECTIVE_DYNAMIC_IMPORT] + import-cycles cannot run in a git worktree (pnpm wants a full node_modules reinstall, declined per RELIABLE-TESTING). Defers to Gate-3 prepush-ci on cael/ronan-DGX, the sanctioned heavy-verify env.

Scope-discipline (figs directive)

100% continuation-scoped, ZERO compaction (#946) files: git show --name-only | grep -iE compaction = EMPTY. The #945/#946 compaction-reliability work ships on a separate branch/PR-lifecycle, never into this assembly.

Merge gated on

  • 🕯 locus-3 byte-check (durable-mark-before-persist-gap / :259 skip / restart-gap repro deliver_count==1)
  • 🪨 discriminator/reap cross-walk vs axis-decomposition

Author note: PR opened via gh CLI auth'd as karmafeast; content-author + driver is frond-scribe.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

scribe-dandelion-cult and others added 2 commits June 11, 2026 08:56
… durable-mark + tunables

Fork-A coherent shape (one PR). Preserves a prince's evacuate→rehydrate
lifecycle meta-cognition across the seam: deliver-until-survives for live
flows, reap only confident-terminal orphans, uncertain→quiesce, no
deliver-then-mark double-delivery.

Ternary + bucket-1 reap-verdict (work-dispatch busy-skip branch):
- classifySubagentRunLiveness (subagent-run-liveness.ts): 3-state
  alive|confident-terminal|uncertain over the latest child-session run.
  No record / within-stale-window → quiesce; explicit endedAt or
  past-cutoff → confident-terminal. Tunable staleCutoffMs floor; per-run
  timeout always respected.
- classifyChildSessionRunLivenessFromRuns: read-time JOIN (never persisted).
- bucket1ReapVerdict: delegate-flow-gate FIRST (parentRunId==null → rate-cap),
  only confident-terminal reaps. Asymmetric cost (#952): never wrongful-reap.
- dispatch reads liveness live in the busy-skip branch; reap via
  markPendingWorkReaped; rate-cap-forever otherwise (Pillar-0 exp-backoff).
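
A rough sketch of the three-state classification described in the list above, under stated assumptions. The record fields (`startedAt`, `endedAt`, `runTimeoutMs`), the function name `classifyLivenessSketch`, and the use of `startedAt` as the staleness reference are hypothetical. The commit message does not say where the boundary between alive and uncertain lies; this sketch assumes a missing record or missing timestamp is uncertain and an un-ended run inside the window is alive. Both are non-reaping outcomes.

```typescript
type Liveness = "alive" | "confident-terminal" | "uncertain";

// Hypothetical record shape for the latest child-session run.
interface RunRecordSketch {
  startedAt?: number;    // epoch ms (assumed)
  endedAt?: number;      // epoch ms; an explicit end is a terminal signal
  runTimeoutMs?: number; // per-run timeout (assumed field name)
}

function classifyLivenessSketch(
  run: RunRecordSketch | undefined,
  nowMs: number,
  staleCutoffMs: number,
): Liveness {
  // No record: nothing to be confident about, so quiesce.
  if (run === undefined) return "uncertain";
  // Explicit endedAt is a confident terminal signal.
  if (run.endedAt != null) return "confident-terminal";
  // Without a start timestamp the age cannot be judged (assumption).
  if (run.startedAt == null) return "uncertain";
  // staleCutoffMs acts as a floor; a longer per-run timeout is respected.
  const cutoffMs = Math.max(staleCutoffMs, run.runTimeoutMs ?? 0);
  return nowMs - run.startedAt > cutoffMs ? "confident-terminal" : "alive";
}
```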

locus-3 durable delivered-mark (restart-gap dup cure):
- succeeded {optimal,durable} on the row; markPendingWorkDelivered writes it
  durably the instant a wake is confirmed delivered, before the persist-gap.
- consume read-guard skips a succeeded row even if still running (crash
  between deliver and finishFlow β†’ no re-delivery); peek excludes it too
  (no tight recovery loop). Coupling: location + durable persist both required.
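
The mark-before-gap ordering and the consume read-guard can be illustrated with an in-memory model. Everything here is a sketch under assumptions: the row shape, the boolean types of `optimal` and `durable`, and the names `shouldConsume`, `markDeliveredSketch`, and `deliverPass` are hypothetical and do not come from the PR's code. The point is only that a second pass over a row that was delivered but never finished does not deliver again.

```typescript
// Hypothetical row shape; field types are assumptions.
interface PendingRowSketch {
  id: string;
  status: "queued" | "running" | "succeeded" | "cancelled";
  state: { succeeded?: { optimal: boolean; durable: boolean } };
  cancelRequestedAt?: number;
}

// Read-guard: skip rows that are delivered-marked, finished, or being cancelled.
function shouldConsume(row: PendingRowSketch): boolean {
  if (row.state.succeeded !== undefined) return false;
  if (row.status === "succeeded" || row.status === "cancelled") return false;
  if (row.cancelRequestedAt != null) return false;
  return true;
}

// Write the mark and persist it immediately, before any later finish step.
function markDeliveredSketch(
  row: PendingRowSketch,
  persist: (row: PendingRowSketch) => void,
): void {
  row.state.succeeded = { optimal: true, durable: true };
  persist(row);
}

// One recovery/dispatch pass. The finish step is deliberately omitted to
// model a crash between deliver and finishFlow: status stays "running".
function deliverPass(
  rows: PendingRowSketch[],
  deliver: (row: PendingRowSketch) => void,
  persist: (row: PendingRowSketch) => void,
): void {
  for (const row of rows) {
    if (!shouldConsume(row)) continue;
    deliver(row);
    markDeliveredSketch(row, persist);
  }
}
```

Running `deliverPass` twice over the same still-running row calls `deliver` once, which is the deliver_count==1 property the restart-gap repro checks for.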

Config tunables (openclaw.json agents.defaults.continuation):
- busySkipBackoff {baseMs,ceilingMs,factor} (rate-cap, default 1s x2 capped at
  maxDelayMs); orphanReapStaleCutoffMs confidence-gate floor. Safety invariants
  stay fixed. Schema + types + resolver + clamps + config-docs baseline.
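
For intuition, the rate-cap delay implied by busySkipBackoff can be written as a capped exponential. This is a sketch assuming the conventional formula min(ceilingMs, baseMs * factor^attempt); the commit message states the 1s base and x2 factor defaults but not the exact formula or the value of maxDelayMs. The function name `busySkipDelayMs` is hypothetical and the 60000 ms ceiling in the usage lines is purely illustrative, not a documented default.

```typescript
interface BusySkipBackoff {
  baseMs: number;
  ceilingMs: number;
  factor: number;
}

// Delay before the next busy-skip retry. The work is never dropped;
// the delay only grows until it reaches the ceiling.
function busySkipDelayMs(cfg: BusySkipBackoff, attempt: number): number {
  const n = Math.max(0, Math.floor(attempt));
  const raw = cfg.baseMs * Math.pow(cfg.factor, n);
  // Math.min also handles overflow to Infinity for very large attempts.
  return Math.min(cfg.ceilingMs, raw);
}

// Illustrative values only: 1s base and x2 factor as in the commit message,
// with an assumed 60s ceiling.
const exampleCfg: BusySkipBackoff = { baseMs: 1000, ceilingMs: 60000, factor: 2 };
const exampleDelays = [0, 1, 2, 3, 10].map((n) => busySkipDelayMs(exampleCfg, n));
// exampleDelays is [1000, 2000, 4000, 8000, 60000]
```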

Refactor: shared finishContinuationWorkFlow helper (turn-granted/superseded/
reaped). recover/dispatch return + gateway log carry reaped.

Tests-first (RED proven by neutralizing impl): bucket-1 matrix + locus-3 in
work-dispatch.test.ts; classifier units; config + zod schema coverage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…toffMs tunables; add worker output.md

- continue-work-signal-v2.md §5.1: new tunables in the config surface + operational
  notes (rate-cap semantics, confidence-gate floor, fixed safety invariants).
- output.md: worker handoff (what changed, full-suite tally + base classification,
  proof-gaps, exact commands).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cced2ef724

Comment on lines +221 to +222
if (state.succeeded) {
continue;

P2: Finalize delivered rows instead of keeping cleanup blocked

When the gateway crashes after markPendingWorkDelivered writes state.succeeded but before finishFlow, recovery now skips the row here, but the TaskFlow status remains running. The cleanup guards still treat any queued/running continuation work as live (src/auto-reply/continuation/work-store.ts:518), so deleteSubagentSessionForCleanup keeps scheduling retries and the registry sweep keeps skipping that child session forever; exclude delivered-marked rows from the live-work check or finalize them during recovery.
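
The first of the two remedies named in this comment (excluding delivered-marked rows from the live-work check) might look roughly like the following. The row shape and the function name `hasLiveContinuationWork` are hypothetical; the actual guard at src/auto-reply/continuation/work-store.ts:518 is not shown in this thread, so this illustrates only the predicate change and not the real code.

```typescript
// Hypothetical row shape mirroring the fields mentioned in the review comment.
interface FlowRowSketch {
  status: "queued" | "running" | "succeeded" | "cancelled";
  state: { succeeded?: unknown };
}

// A row counts as live work only if it is queued/running AND has not
// already been marked as delivered.
function hasLiveContinuationWork(rows: FlowRowSketch[]): boolean {
  return rows.some(
    (row) =>
      (row.status === "queued" || row.status === "running") &&
      row.state.succeeded == null,
  );
}
```

With this predicate, a row left in running with a delivered mark after a crash no longer counts as live work. The alternative the reviewer mentions, finalizing such rows during recovery, would repair the stale status itself instead of working around it.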
