Commit 02c7b5b

openperf and steipete authored
fix(tasks): reclaim ACP zombie runs blocking gateway restart (#88281)
* fix(tasks): reclaim ACP zombie runs blocking gateway restart (#88205)

hasBackingSession treated an ACP task as backed whenever its persisted session-store entry existed, so a crashed mid-turn ACP run left a status=running record that survived the crash and wedged gateway restart/update forever.

Gate ACP backing on in-process live-turn liveness instead of entry existence, behind the existing authoritative-process flag (generalized from cron-only) so a standalone maintenance CLI with an empty live-turn map stays conservative and never reclaims.

The liveness signal lives in a core-internal active-turns registry (mirroring cron active-jobs) so it stays off the SDK-exported AcpSessionManager surface. It is marked once before the backend loop and cleared when the task is marked terminal, so a slow init or backend failover cleanup cannot let the sweep reclaim a still-live turn.

* fix(tasks): preserve cron operator JSON diagnostic reason

Split the merged runtime_not_authoritative reason back into the existing cron_runtime_not_authoritative (shipped, consumed by openclaw tasks maintenance --json operator scripts) and a new acp_runtime_not_authoritative for the ACP branch. Strengthen the cron non-authoritative test to lock the reason string contract.

* fix(tasks): clear ACP turn liveness on retry failures

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
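The gating rule described in the commit message can be sketched roughly as follows. This is an illustrative reading of the message, not the code from the commit: `decideBacking`, `BackingCheckInput`, `BackingDecision`, and `processIsAuthoritative` are hypothetical names, and only the two reason strings are taken from the message itself.

```typescript
// Hypothetical sketch: all names are illustrative except the two reason strings,
// which are quoted from the commit message.
type TaskRuntime = "acp" | "cron";

interface BackingCheckInput {
  runtime: TaskRuntime;
  sessionKey: string;
  // Assumed flag: true only in the process that actually runs turns and jobs.
  processIsAuthoritative: boolean;
  // Process-local liveness lookup, e.g. an active-turns or active-jobs registry.
  isLive: (sessionKey: string) => boolean;
}

interface BackingDecision {
  backed: boolean;
  reason?: string;
}

function decideBacking(input: BackingCheckInput): BackingDecision {
  if (!input.processIsAuthoritative) {
    // A standalone maintenance CLI has an empty liveness map, so "not live"
    // proves nothing there. Stay conservative and never reclaim.
    const reason =
      input.runtime === "cron"
        ? "cron_runtime_not_authoritative"
        : "acp_runtime_not_authoritative";
    return { backed: true, reason };
  }
  // Authoritative process: a task is backed only while its turn or job is live,
  // regardless of whether a persisted session entry still exists.
  return { backed: input.isLive(input.sessionKey) };
}
```

Under these assumptions, a crashed run whose session entry survived is reported as unbacked by the authoritative process (its key is no longer in the live map), while a non-authoritative caller always gets `backed: true` plus a per-runtime diagnostic reason.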
1 parent 100dd79 commit 02c7b5b

8 files changed: 614 additions & 251 deletions

File shown below (new file): 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+import { resolveGlobalSingleton } from "../../shared/global-singleton.js";
+import { normalizeActorKey } from "./manager.utils.js";
+
+// Process-local liveness signal for in-flight ACP prompt turns, kept off the
+// SDK-exported AcpSessionManager so plugins cannot read this maintenance-only
+// state. Mirrors cron's active-jobs registry: task maintenance asks "is a turn
+// still running for this session?" to avoid reclaiming a live run whose persisted
+// session entry survived a crash. The AcpSessionManager marks/clears it in lockstep
+// with its in-memory turn map.
+
+type AcpActiveTurnState = {
+  activeTurnKeys: Set<string>;
+};
+
+const ACP_ACTIVE_TURN_STATE_KEY = Symbol.for("openclaw.acp.activeTurns");
+
+function getAcpActiveTurnState(): AcpActiveTurnState {
+  return resolveGlobalSingleton<AcpActiveTurnState>(ACP_ACTIVE_TURN_STATE_KEY, () => ({
+    activeTurnKeys: new Set<string>(),
+  }));
+}
+
+export function markAcpTurnActive(sessionKey: string) {
+  if (!sessionKey) {
+    return;
+  }
+  getAcpActiveTurnState().activeTurnKeys.add(normalizeActorKey(sessionKey));
+}
+
+export function clearAcpTurnActive(sessionKey: string) {
+  if (!sessionKey) {
+    return;
+  }
+  getAcpActiveTurnState().activeTurnKeys.delete(normalizeActorKey(sessionKey));
+}
+
+export function isAcpTurnActive(sessionKey: string): boolean {
+  if (!sessionKey) {
+    return false;
+  }
+  return getAcpActiveTurnState().activeTurnKeys.has(normalizeActorKey(sessionKey));
+}
+
+export function resetAcpActiveTurnsForTests() {
+  getAcpActiveTurnState().activeTurnKeys.clear();
+}
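
For readers without the rest of the repository, here is a minimal self-contained sketch of how a registry like this might be used around a turn. The two imported helpers are not shown in this diff, so `resolveGlobalSingleton` and `normalizeActorKey` below are assumed stand-ins (a `globalThis`-backed lazy singleton and a trim-plus-lowercase normalizer), and `runTurn` is a hypothetical, synchronous simplification of the "mark once before the backend loop, clear when terminal" rule from the commit message.

```typescript
// Assumed stand-in: lazily create one value per symbol key on globalThis.
function resolveGlobalSingleton<T>(key: symbol, create: () => T): T {
  const store = globalThis as unknown as Record<symbol, unknown>;
  if (!(key in store)) {
    store[key] = create();
  }
  return store[key] as T;
}

// Assumed stand-in: the real normalization rules are not shown in the diff.
const normalizeActorKey = (key: string): string => key.trim().toLowerCase();

// Example key, deliberately different from the one used in the commit.
const STATE_KEY = Symbol.for("example.acp.activeTurns");
const activeTurns = (): Set<string> =>
  resolveGlobalSingleton(STATE_KEY, () => new Set<string>());

function markTurnActive(sessionKey: string): void {
  if (sessionKey) activeTurns().add(normalizeActorKey(sessionKey));
}

function clearTurnActive(sessionKey: string): void {
  if (sessionKey) activeTurns().delete(normalizeActorKey(sessionKey));
}

function isTurnActive(sessionKey: string): boolean {
  return sessionKey ? activeTurns().has(normalizeActorKey(sessionKey)) : false;
}

// Hypothetical turn runner: liveness is marked once before any backend is
// tried and cleared only when the turn reaches a terminal outcome, so a
// failover between backends never opens a window where the turn looks dead.
function runTurn<T>(sessionKey: string, backends: Array<() => T>): T {
  markTurnActive(sessionKey);
  try {
    let lastError: unknown;
    for (const backend of backends) {
      try {
        return backend();
      } catch (error) {
        lastError = error; // fail over to the next backend; turn is still live
      }
    }
    throw lastError ?? new Error("no backend available");
  } finally {
    clearTurnActive(sessionKey); // terminal: success or every backend failed
  }
}
```

Because `Symbol.for` returns the same symbol for the same string anywhere in the process, keying the singleton this way lets duplicated copies of a module share one set, which is presumably why the commit uses it. The `finally` block in the sketch corresponds to the third bullet of the commit message: liveness is cleared on failure paths as well as on success.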
