
[Bug]: abortable(activeSession.prompt()) creates zombie Agent loop when signal is pre-aborted #74859

@zhumengzhu

Description

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

When params.abortSignal is already aborted before activeSession.prompt() is called (e.g. rapid consecutive messages with messages.queue.mode: "interrupt"), abortable() immediately rejects but the prompt() async chain has already started. The floating Promise creates a new Agent._runLoop() with a fresh abortController that nobody ever aborts, causing the Agent to loop indefinitely calling the LLM after the attempt has exited. Observed: 2617 LLM calls over 103 minutes from a single zombie run.

Steps to reproduce

  1. Configure messages.queue.mode: "interrupt" in openclaw.json
  2. Send a message to the agent
  3. Within < 1 second, send a second message (interrupt mode aborts the first run)
  4. Observe that the first run's Agent continues calling the LLM in the background after the attempt has exited

Alternatively, run the reproduction test below which uses a pre-aborted AbortSignal to simulate the same condition deterministically.
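For step 1, the openclaw.json fragment would look roughly like this (a sketch; only the `messages.queue.mode` key is taken from this report, the surrounding structure is assumed):

```json
{
  "messages": {
    "queue": {
      "mode": "interrupt"
    }
  }
}
```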

agent-zombie-loop.test.ts
import { Agent, type AgentMessage } from "@mariozechner/pi-agent-core";
import type { Api, Message, Model } from "@mariozechner/pi-ai";
import { afterEach, beforeEach, describe, expect, it } from "vitest";
import {
  createDefaultEmbeddedSession,
  getHoisted,
  resetEmbeddedAttemptHarness,
  testModel,
} from "./attempt.spawn-workspace.test-support.js";

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));
const mockModel = testModel as unknown as Model<Api>;

const mockTool = {
  name: "mock_tool",
  label: "Mock Tool",
  description: "mock",
  parameters: { type: "object" as const, properties: {} },
  execute: async () => ({ content: [{ type: "text" as const, text: "Aborted" }], details: {} }),
};

function createToolUseStreamFn(tracker: { count: number }) {
  return async (_model: unknown, _context: unknown, options?: { signal?: AbortSignal }) => {
    tracker.count += 1;
    await sleep(5);
    if (options?.signal?.aborted) {
      const err = new Error("Request was aborted.");
      err.name = "AbortError";
      throw err;
    }
    const message = {
      role: "assistant" as const,
      content: [
        { type: "toolCall" as const, id: `call_${tracker.count}`, name: "mock_tool", arguments: {} },
      ],
      usage: { input: 70, output: 51, cacheRead: 0, cacheWrite: 0, totalTokens: 121, cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 } },
      stopReason: "toolUse" as const,
      timestamp: Date.now(),
    };
    return {
      [Symbol.asyncIterator]() {
        let done = false;
        return {
          async next() {
            if (!done) {
              done = true;
              return { done: false, value: { type: "done", message } };
            }
            return { done: true, value: undefined };
          },
        };
      },
      async result() { return message; },
    } as never;
  };
}

const hoisted = getHoisted();

describe("Agent zombie loop (upstream bug)", () => {
  beforeEach(() => { resetEmbeddedAttemptHarness(); });

  it("bug: abort-before-prompt produces floating Promise, Agent loops after attempt exits", { timeout: 10_000 }, async () => {
    const tracker = { count: 0 };
    const agent = new Agent({
      initialState: { systemPrompt: "test", model: mockModel, tools: [mockTool] },
      streamFn: createToolUseStreamFn(tracker),
      convertToLlm: (msgs: AgentMessage[]): Message[] =>
        msgs.filter((m) => ["user", "assistant", "toolResult"].includes(m.role)) as Message[],
    });

    hoisted.createAgentSessionMock.mockResolvedValue({
      session: createDefaultEmbeddedSession({
        prompt: async (_session, prompt) => {
          agent.prompt(prompt).catch(() => {});
          await sleep(50);
        },
      }),
    });

    const abortSignal = AbortSignal.abort(new Error("second message arrived"));
    const { runEmbeddedAttempt } = await import("./attempt.js");

    await runEmbeddedAttempt({
      sessionId: "zombie-test", sessionKey: "agent:main:main",
      sessionFile: "/tmp/zombie-test.jsonl", workspaceDir: "/tmp", agentDir: "/tmp",
      config: {}, prompt: "first message", timeoutMs: 5_000, runId: "zombie-run",
      provider: "openai", modelId: "gpt-test", model: mockModel,
      authStorage: { getApiKey: async () => undefined } as never,
      modelRegistry: {} as never, thinkLevel: "off",
      senderIsOwner: true, disableMessageTool: true, abortSignal,
    });

    const countAtExit = tracker.count;
    await sleep(500);
    const countAfterWait = tracker.count;

    console.log(`LLM calls at exit=${countAtExit}, after 500ms=${countAfterWait}, delta=${countAfterWait - countAtExit}`);
    expect(countAfterWait).toBeGreaterThan(countAtExit);

    agent.abort();
    agent.clearAllQueues?.();
    await agent.waitForIdle();
  });
});

Expected behavior

When a run is aborted (via interrupt mode, timeout, or RPC), the Agent should stop all LLM calls promptly. No floating Promises should outlive the attempt lifecycle.

Actual behavior

The Agent continues calling the LLM indefinitely after the attempt has returned. Each iteration consumes ~90k input tokens and ~35 output tokens; stopReason is always toolUse, every tool throws AbortError (caught as an error result), and the model retries the same tool call. The loop never terminates unless the process restarts.

OpenClaw version

All releases since v2026.1.20 (bug introduced in commit 016693a1f on 2026-01-18)

Operating system

Linux (also reproducible on macOS)

Install method

pnpm dev / npm global

Model

Any model (bug is model-agnostic; the loop is in the Agent runtime, not the LLM)

Provider / routing chain

Any provider (bug is provider-agnostic)

Additional provider/model setup details

NOT_ENOUGH_INFO

Logs, screenshots, and evidence

Production observations across 3 independent cases:

  Case  Trigger                            Duration  LLM calls
  1     timeout-compaction retry           76 min    ~2130
  2     timeout-compaction retry           2+ hours  ~952 (log truncated)
  3     user rapid messages (652ms apart)  103 min   2617

Log signature of a zombie run:

  • embedded run prompt end durationMs=<very small, e.g. 22-26ms> (abortable() rejected immediately)
  • Continued model.usage stopReason=toolUse lines after run cleanup for the same runId
  • All tool results are "Aborted" (error result)
  • embedded run done never appears

Impact and severity

Affected: Any user with messages.queue.mode: "interrupt" who sends rapid consecutive messages
Severity: High — silent resource drain, potential large API cost
Frequency: Near-deterministic with interrupt mode + rapid messages; lower probability via timeout-compaction
Consequence: Unbounded LLM API cost, server resource exhaustion, no user-visible indication of the problem

Additional information

Root cause: await abortable(activeSession.prompt(effectivePrompt)) in attempt.ts (introduced in 016693a1f). JavaScript evaluates activeSession.prompt() first (starting the async chain), then abortable() races it. When the signal is pre-aborted, abortable() rejects immediately but the floating Promise from prompt() creates a new Agent._runLoop() with a fresh abortController that nobody ever aborts.

Why the inner loop never stops: Agent._runLoop() in pi-agent-core only exits on stopReason === "error" | "aborted". The zombie run's fresh abortController signal is never aborted. Tools do throw AbortError (from the outer runAbortController.signal), but that is caught as an error tool result — so stopReason stays toolUse and the model retries indefinitely.

Why the circuit breaker doesn't fire: Tool wrapper order is abort-check (outer) → loop-detection (inner). The abort throw short-circuits before the loop detector ever runs.

Proposed 3-layer fix:

  1. Pre-prompt guard: check aborted state before calling activeSession.prompt() — eliminates the floating Promise at source
  2. finally block: call agent.abort() + agent.clearAllQueues() during attempt cleanup — terminates any escaped Agent
  3. Per-run LLM call hard cap: shared counter across attempts, configurable via agents.defaults.maxLlmCallsPerRun — ultimate safety net independent of abort signal propagation
