
Stronger run interruptibility: unified generation invalidation and stale-output fencing #70319

@darconadalabarga


Problem

When a user sends /stop or a new message while the agent is mid-run (executing tools, streaming, running subprocesses), the current run may continue producing side effects: launching more tools, emitting stale progress updates, and finishing subprocess chains.

This is both a UX problem and a safety problem: users lose confidence that they can regain control of the agent.

Observed behavior

  • /stop halts some subsystems but not others
  • Child processes may keep running after the chat is aborted
  • A tool batch may continue with subsequent tools after partial cancellation
  • Stale progress updates, typing indicators, and messages keep arriving after /stop
  • New user messages may not immediately supersede the active run

Expected behavior

If the user sends a new message or /stop, the previous run should stop producing effects immediately. The new message should become the dominant instruction.

Existing primitives (they're good!)

OpenClaw already has solid abort primitives scattered across subsystems:

  • chat.abort RPC handler
  • abortEmbeddedPiRun() for embedded agent runs
  • clearSessionQueues() for queue cleanup
  • managedRun.cancel("manual-cancel") for exec processes
  • cancel(runId) / cancelScope(scopeKey) in the process supervisor
  • replyRunRegistry.abort() for reply run tracking
  • abortedLastRun flag in session store
  • handlerGeneration invalidation pattern in heartbeat-wake

These are all good building blocks. The issue is not missing cancellation, but missing coordination between them.

What's missing: unified run invalidation

The gap is a single coherent guarantee that:

  1. A new user message or /stop invalidates the active run
  2. The invalidated run cannot produce new side effects (messages, tool calls, progress, typing)
  3. Subprocesses owned by the invalidated run are cancelled
  4. Pending tool calls in the invalidated run are skipped
  5. The new user message becomes the dominant instruction immediately

Proposed approach

Introduce stronger run-scoped interruption semantics, inspired by patterns from Hermes (which implements a well-tested version of this):

1. Run generation counter per session

A simple, monotonically increasing counter. When an abort or a new message arrives, increment the generation. Every downstream check verifies that the generation it captured when the run started is still the current one.
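A minimal sketch of the per-session counter, in TypeScript since OpenClaw's sources (e.g. heartbeat-wake.ts) are TypeScript. `SessionGenerations`, `RunToken`, `capture`, `bump`, and `isCurrent` are hypothetical names for illustration, not existing OpenClaw APIs; a real counter would presumably live alongside the session store.

```typescript
// Hypothetical sketch: SessionGenerations and RunToken are illustrative names,
// not existing OpenClaw APIs.

/** Captured by a run when it starts; compared against the live counter later. */
interface RunToken {
  readonly sessionKey: string;
  readonly generation: number;
}

class SessionGenerations {
  private readonly current = new Map<string, number>();

  /** Current generation for a session (0 if it has never been bumped). */
  get(sessionKey: string): number {
    return this.current.get(sessionKey) ?? 0;
  }

  /** Called on /stop or a new user message: invalidates all outstanding tokens. */
  bump(sessionKey: string): number {
    const next = this.get(sessionKey) + 1;
    this.current.set(sessionKey, next);
    return next;
  }

  /** A run captures a token once, at start, and passes it downstream. */
  capture(sessionKey: string): RunToken {
    return { sessionKey, generation: this.get(sessionKey) };
  }

  /** True only if no bump has happened for this session since capture. */
  isCurrent(token: RunToken): boolean {
    return this.get(token.sessionKey) === token.generation;
  }
}
```

A run would call `capture()` once and hand the token to everything it spawns; /stop and new-message handling call `bump()`. Because each check is a single integer comparison, it is cheap enough to repeat before every side effect.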

2. Pre-tool gate

Before each tool execution, check whether the run's generation is still current. If it is not, skip the tool and return a cancelled result. (Hermes tests this explicitly in test_all_tools_skipped_when_interrupted.)
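The gate could look roughly like this, assuming tools in a batch run sequentially and the run exposes an `isCurrent()` check derived from its captured generation. `ToolCall`, `ToolResult`, and `runToolBatch` are hypothetical; OpenClaw's actual tool-execution types will differ.

```typescript
// Hypothetical sketch: ToolCall, ToolResult and runToolBatch are illustrative,
// not OpenClaw's actual tool-execution types.

interface ToolCall {
  id: string;
  name: string;
  run: () => Promise<string>;
}

type ToolResult =
  | { id: string; status: "ok"; output: string }
  | { id: string; status: "cancelled"; reason: string };

/**
 * Executes tools one at a time, re-checking the generation before each.
 * Once the run is stale, every remaining tool is skipped and reported as
 * cancelled instead of being executed.
 */
async function runToolBatch(
  calls: ToolCall[],
  isCurrent: () => boolean,
): Promise<ToolResult[]> {
  const results: ToolResult[] = [];
  for (const call of calls) {
    if (!isCurrent()) {
      results.push({ id: call.id, status: "cancelled", reason: "run interrupted" });
      continue;
    }
    results.push({ id: call.id, status: "ok", output: await call.run() });
  }
  return results;
}
```

Returning a cancelled result rather than throwing keeps one result per tool call, which may matter if the transcript format expects every tool call to be answered.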

3. Stale output fence

Prevent stale runs from emitting visible effects. Before emitting a streaming delta, typing indicator, progress update, or final message, check the generation. The pattern already exists in heartbeat-wake.ts (handlerGeneration); apply it to the reply pipeline.
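One way to apply the fence is to wrap the pipeline's outbound emitter once per run, so individual call sites cannot forget the check. This is a sketch only: `OutboundEvent` and `fenceEmitter` are hypothetical names, and the code does not reproduce the actual heartbeat-wake.ts implementation.

```typescript
// Hypothetical sketch: OutboundEvent and fenceEmitter are illustrative names;
// the real reply pipeline's event types will differ.

type OutboundEvent =
  | { kind: "delta"; text: string }
  | { kind: "typing" }
  | { kind: "progress"; message: string }
  | { kind: "final"; text: string };

/**
 * Wraps a raw emitter so every visible effect is gated on the run's
 * generation still being current. Stale events are dropped, not queued.
 * Also exposes a count of dropped events for diagnostics.
 */
function fenceEmitter(
  emit: (event: OutboundEvent) => void,
  isCurrent: () => boolean,
): { emit: (event: OutboundEvent) => boolean; dropped: () => number } {
  let droppedCount = 0;
  return {
    emit(event) {
      if (!isCurrent()) {
        droppedCount++;
        return false; // stale run: swallow the effect
      }
      emit(event);
      return true;
    },
    dropped: () => droppedCount,
  };
}
```

Dropping rather than buffering is deliberate: a stale delta or typing indicator has no value once the user has moved on.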

4. Stronger subprocess cancellation

Bind exec/supervisor processes to the session's run scope. When the generation changes, cancel every process launched under the older generation.
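A sketch of run-scoped cancellation that also includes the SIGTERM→SIGKILL escalation listed under prior art. `ManagedProcess`, `RunScopedProcesses`, and the 2000 ms default grace period are assumptions, not the existing supervisor API; signal delivery and the timer are injected so the policy can be exercised without spawning real processes.

```typescript
// Hypothetical sketch: ManagedProcess and RunScopedProcesses are illustrative,
// not the actual process-supervisor API. The 2000 ms grace period is an
// assumed default.

type Signal = "SIGTERM" | "SIGKILL";

interface ManagedProcess {
  readonly pid: number;
  kill(signal: Signal): void;
  hasExited(): boolean;
}

interface ScopedProcess {
  generation: number;
  proc: ManagedProcess;
}

type Scheduler = (fn: () => void, delayMs: number) => void;

class RunScopedProcesses {
  private readonly byScope = new Map<string, ScopedProcess[]>();
  private readonly schedule: Scheduler;
  private readonly killGraceMs: number;

  constructor(schedule: Scheduler, killGraceMs = 2000) {
    this.schedule = schedule;
    this.killGraceMs = killGraceMs;
  }

  /** Records which generation of which session launched the process. */
  register(sessionKey: string, generation: number, proc: ManagedProcess): void {
    const list = this.byScope.get(sessionKey) ?? [];
    list.push({ generation, proc });
    this.byScope.set(sessionKey, list);
  }

  /**
   * Sends SIGTERM to every process from an older generation, then SIGKILL
   * to any that is still alive once the grace period elapses.
   * Returns the number of processes that were considered stale.
   */
  cancelStale(sessionKey: string, currentGeneration: number): number {
    const list = this.byScope.get(sessionKey) ?? [];
    const stale = list.filter((e) => e.generation < currentGeneration);
    this.byScope.set(sessionKey, list.filter((e) => e.generation >= currentGeneration));
    for (const { proc } of stale) {
      if (proc.hasExited()) continue;
      proc.kill("SIGTERM");
      this.schedule(() => {
        if (!proc.hasExited()) proc.kill("SIGKILL");
      }, this.killGraceMs);
    }
    return stale.length;
  }
}
```

In Node, `schedule` could simply be `(fn, ms) => { setTimeout(fn, ms); }`, `kill` could wrap `ChildProcess.kill(signal)`, and `hasExited` could test whether `exitCode` or `signalCode` is non-null.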

5. New message takeover

When a new user message arrives during an active run: increment generation → cancel active processes → abort embedded run → clear queues → new message becomes next input.
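The takeover sequence can be written as a single function so the ordering is explicit and testable. `TakeoverHooks` and `takeOver` are hypothetical; each hook is meant to be wired to an existing primitive (supervisor cancellation, `abortEmbeddedPiRun()`, `clearSessionQueues()`), but their real signatures are not assumed here. Making the message optional lets /stop share the same path.

```typescript
// Hypothetical sketch: TakeoverHooks and takeOver are illustrative. Each hook
// would wrap an existing OpenClaw primitive; real signatures are not assumed.

interface TakeoverHooks {
  bumpGeneration(sessionKey: string): number;
  cancelProcesses(sessionKey: string, generation: number): void;
  abortEmbeddedRun(sessionKey: string): void;
  clearQueues(sessionKey: string): void;
  enqueueInput(sessionKey: string, message: string): void;
}

/**
 * Invalidates the active run and, for a new message, makes it the next input.
 * The generation is bumped first so that anything the old run does while
 * teardown is still in flight is already fenced off as stale.
 */
function takeOver(hooks: TakeoverHooks, sessionKey: string, message?: string): number {
  const generation = hooks.bumpGeneration(sessionKey);
  hooks.cancelProcesses(sessionKey, generation);
  hooks.abortEmbeddedRun(sessionKey);
  hooks.clearQueues(sessionKey);
  if (message !== undefined) {
    hooks.enqueueInput(sessionKey, message); // omitted for a bare /stop
  }
  return generation;
}
```

Bumping before any teardown step means a late stream delta or an already-awaited tool from the old run fails its generation check, even if the abort itself takes a moment to land.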

Prior art

Hermes agent demonstrates these patterns with good test coverage:

  • Thread-scoped interrupt signaling (tools/interrupt.py)
  • Pre-tool interrupt checks with test coverage
  • Gateway run generation invalidation for stale outputs
  • SIGTERM→SIGKILL escalation for resistant processes
  • Pending message queue drain and combination

The goal is not a line-by-line port, but adapting these concepts to OpenClaw's async architecture.
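As an example of that adaptation, the pending-message drain could be a per-session buffer that the next run empties in one step. `PendingInbox` is a hypothetical name, and joining buffered messages with a blank line is an assumed combination policy; Hermes' actual behavior may differ.

```typescript
// Hypothetical sketch: PendingInbox is illustrative, and the blank-line join
// is an assumed combination policy, not Hermes' or OpenClaw's.

class PendingInbox {
  private readonly pending = new Map<string, string[]>();

  /** Buffers a message that arrived while a takeover was in progress. */
  push(sessionKey: string, message: string): void {
    const list = this.pending.get(sessionKey) ?? [];
    list.push(message);
    this.pending.set(sessionKey, list);
  }

  /**
   * Removes every buffered message for the session and combines them,
   * oldest first, into a single input, so a burst of follow-ups becomes one
   * turn instead of several competing runs. Returns undefined if empty.
   */
  drain(sessionKey: string): string | undefined {
    const list = this.pending.get(sessionKey);
    this.pending.delete(sessionKey);
    if (!list || list.length === 0) return undefined;
    return list.join("\n\n");
  }
}
```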

Benefits

  • Safer production behavior (tool chains stop reliably)
  • Stronger user control and trust
  • More predictable /stop semantics
  • Fewer stale messages after abort
  • Foundation for safer autonomous operation

I'm willing to contribute a PR

I have a prototype implementation plan and would be happy to contribute a PR if the maintainers are interested. The implementation is designed to layer on top of existing primitives without breaking current behavior.

The implementation would be AI-assisted (Claude Code) and covered by tests.
