Skip to content

feat(auto-reply): run-generation fence for stronger interruptibility (refs #70319)#70363

Closed
darconadalabarga wants to merge 3 commits into
openclaw:mainfrom
darconadalabarga:feature/run-interruptibility
Closed

feat(auto-reply): run-generation fence for stronger interruptibility (refs #70319)#70363
darconadalabarga wants to merge 3 commits into
openclaw:mainfrom
darconadalabarga:feature/run-interruptibility

Conversation

@darconadalabarga

@darconadalabarga darconadalabarga commented Apr 22, 2026

Copy link
Copy Markdown

Closes (partially) #70319.

Summary

Introduces a per-session run-generation counter that layers on top of the existing abort primitives (replyRunRegistry, abortEmbeddedPiRun, chat.abort, cancelScope) to provide a unified invalidation signal downstream code can consult before producing side effects.

Goal from the issue: after /stop or a new user message, the superseded run must not produce more output — tool calls, deltas, typing, final replies.

This PR ships the foundation (Pieces A + partial C from the issue's implementation sketch) and wires one real emission site so the fence is live.

What's in this PR

Piece A — Run generation registry (new)

src/auto-reply/reply/run-generation.ts: getCurrentGeneration, incrementGeneration, isCurrentGeneration, forgetGeneration. Global-singleton backed.

Piece A wiring — ReplyOperation carries the generation

  • ReplyOperation.runGeneration captured at createReplyOperation.
  • ReplyOperation.isCurrent() convenience wrapper over isCurrentGeneration(key, runGeneration).
  • abortWithReason in the registry bumps the generation, so any abortByUser / abortForRestart path invalidates the captured value.

Fast-abort integration

tryFastAbortFromMessage in abort.ts also bumps generation up-front so /stop fences late output even when the registry lookup misses (race between end-of-run and the stop message).

Piece C — Stale-output fence (partial, opt-in)

createBlockReplyDeliveryHandler accepts an optional isRunCurrent?: () => boolean parameter. When provided and returns false, the block reply is silently dropped before hitting the channel. Callers pass () => replyOperation.isCurrent().

Live wiring

agent-runner-execution.ts passes the fence callback so post-abort block replies are actually dropped in the real runtime.

Piece D — SIGTERM→SIGKILL escalation

Already covered by src/process/kill-tree.test.ts. No change needed.

What's explicitly deferred

  • Piece B (pre-tool gate): wiring into pi-embedded-runner's tool dispatch. Pattern demonstrated in tests; wiring-only follow-up.
  • Piece E (new-message takeover): dispatch.ts / queue.ts integration. The primitive (incrementGeneration) is in place.
  • More emission sites: typing.ts, reply-delivery for non-block path, followup-delivery.ts. Same opt-in shape.

Tests

src/auto-reply/reply/run-generation.test.ts — 17 tests covering:

  • Registry primitives.
  • ReplyOperation generation wiring (capture at begin, flip on abort, next run captures new gen).
  • Stale-output fence pattern (Tests 2, 3, 5 from the issue).
  • Integration with createBlockReplyDeliveryHandler.
pnpm check:changed  → exit 0
  typecheck core    → ok
  typecheck tests   → ok
  lint core         → ok
  import cycles     → ok
  guards            → ok
  auto-reply suite  → 102 files / 1174 tests passed

pnpm build also passes clean.

Constraints honored

  • No change to gateway protocol.
  • No modification to AGENTS.md / CONTRIBUTING.md / CLAUDE.md.
  • Existing abort behavior unchanged — the new system layers on top.
  • TypeScript ESM, strict types, no any.
  • American English in code/comments.

Happy to break into smaller commits or reshape if the approach lands differently than maintainers prefer.

Introduce a per-session generation counter that complements the existing
AbortController-based cancellation in replyRunRegistry. Each ReplyOperation
now captures the session's current generation at run begin and exposes
isCurrent() so downstream emission and tool-dispatch sites can fence
stale side effects after an abort or new-message takeover.

- src/auto-reply/reply/run-generation.ts: new registry (get/increment/
  isCurrent/forget), global-singleton backed so it survives split chunks.
- src/auto-reply/reply/reply-run-registry.ts: capture generation on
  createReplyOperation; bump generation in abortWithReason so any
  abortByUser / abortForRestart path invalidates the captured value.
- src/auto-reply/reply/abort.ts: bump generation in tryFastAbortFromMessage
  so /stop fences late output even if the registry lookup misses.
- src/auto-reply/reply/run-generation.test.ts: 16 tests covering the
  registry, registry wiring, and the stale-output / pre-tool / typing
  fence patterns described in CLAUDE_CODE_PROMPT.md tests 1, 2, 3, 5.
- src/auto-reply/reply/agent-runner-execution.test.ts: update the
  ReplyOperation mock to include the two new fields.

Piece D (SIGTERM->SIGKILL escalation) is already covered by the existing
src/process/kill-tree.test.ts suite; no change needed.

Pieces B (pre-tool gate), C (emission-site fences) and E (new-message
takeover) are deferred to follow-up commits: each invasive emission
site needs dedicated wiring that deserves its own focused review.

Inspired by patterns from Hermes (NousResearch/hermes-agent).

AI-assisted: yes (Claude Code / Opus 4.7)
Adds an opt-in stale-output fence parameter to createBlockReplyDeliveryHandler
so callers can bail on late block replies from superseded runs. Pass
`() => replyOperation.isCurrent()` to wire the fence to the run-generation
registry introduced in the previous commit.

Pure opt-in — existing callers that don't pass isRunCurrent keep current
behavior. New test covers that the fence drops a post-abort delivery.
Pass `() => replyOperation.isCurrent()` into createBlockReplyDeliveryHandler
so the stale-output fence is now active in the real runtime, not just in
tests. A block reply from a superseded run (after /stop or a new user
message) is silently dropped before hitting the channel.

This is the first site where the generation registry actually changes
runtime behavior. All 1174 auto-reply tests continue to pass.
@clawsweeper

clawsweeper Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Keep this PR open: it contains meaningful interruptibility work, but it is still draft/dirty, lacks real behavior proof, and now needs to be reconciled with current main's adjacent foreground-delivery fence before merge.

Canonical path: Close this stale PR. The latest review rated it F, the branch still lacks merge-ready proof, and there has been no human follow-up after the durable review.

So I’m closing this here because the remaining work is already tracked in the canonical issue.

Review details

Best possible solution:

Close this stale PR. The latest review rated it F, the branch still lacks merge-ready proof, and there has been no human follow-up after the durable review.

Do we have a high-confidence way to reproduce the issue?

No. Source inspection shows current main lacks this PR's run-scoped block-reply fence, but it also has a newer foreground delivery fence, so the remaining stale-output gap still needs live reproduction or after-fix proof.

Is this the best way to solve the issue?

Unclear. The run-generation foundation is plausible, but current main already has dispatch-level foreground delivery fencing and this PR only wires block replies, so maintainers should align the contracts or require proof that a separate run-level counter is necessary.

Security review:

Security review cleared: The diff adds in-memory cancellation bookkeeping and tests without changing dependencies, CI, secrets, auth, downloaded artifacts, or package execution paths.

AGENTS.md: found and applied where relevant.

What I checked:

  • stale F-rated PR: PR was opened 2026-04-22T22:14:12Z, is older than 30 days, and the latest review rated it F.
  • proof blocker: real behavior proof is missing and proof tier is F, so this branch is not merge-ready without contributor follow-up.
  • no human follow-up: live comments and timeline hydrated by apply contain no non-automation activity after the ClawSweeper review.

Likely related people:

  • steipete: Recent commit history for reply-run-registry, reply-delivery, agent-runner-execution, and abort includes multiple auto-reply delivery and runner changes by this account. (role: recent area contributor; confidence: high; commits: e71e585969d6, 25c0699fe909, 82e5dd4da74f; files: src/auto-reply/reply/reply-run-registry.ts, src/auto-reply/reply/reply-delivery.ts, src/auto-reply/reply/agent-runner-execution.ts)
  • jalehman: Recent reply/Telegram dispatch work included active abort ownership, queue, and delivery changes in the same interruptibility family. (role: recent adjacent contributor; confidence: medium; commits: 62b51a6295ee; files: src/auto-reply/reply/reply-run-registry.ts, src/auto-reply/reply/abort.ts, src/auto-reply/dispatch.ts)
  • guanbear: Recent Slack DM steering work touched active ReplyOperation routing/thread state, adjacent to the run registry surface extended by this PR. (role: recent adjacent contributor; confidence: medium; commits: f1cb9f2f6a75; files: src/auto-reply/reply/reply-run-registry.ts)

Codex review notes: model gpt-5.5, reasoning high; reviewed against fa614d0907e8.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. labels May 22, 2026
@clawsweeper

clawsweeper Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 22, 2026
@openclaw-barnacle

Copy link
Copy Markdown

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Jun 7, 2026
@clawsweeper

clawsweeper Bot commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper applied the proposed close for this PR.

@clawsweeper clawsweeper Bot closed this Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. P2 Normal backlog priority with limited blast radius. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: M stale Marked as stale due to inactivity status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant