Summary
We hit a production incident on OpenClaw 2026.5.12 where multiple agent sessions appeared stuck in Telegram and the OpenClaw frontend displayed transcripts that did not match the real Telegram topic/DM.
Two separate but compounding problems showed up:
- Session reset rotates sessionId but can keep the old sessionFile path. The session store ends up with a fresh sessionId pointing at an unrelated/stale transcript file. The frontend then renders the wrong chat history for that session key.
- Failed Codex ACP/acpx launches can leave orphan codex-acp processes parented to PID 1. OpenClaw had no active ACP tasks, but the OS process listing still showed orphan codex-acp processes. These correlated with sluggish/stuck agent behavior and had to be cleaned up manually.
This looks related to #82274 / #82343 for Codex delivery stalls and #82318 for recovery state split-brain, but the sessionFile reuse below is a more direct reset-path bug.
Environment
- OpenClaw: 2026.5.12 (f066dd2)
- Codex CLI/app-server: 0.130.0
- Node: v22.22.0
- OS: Ubuntu Linux 5.15.0-173-generic x86_64
- Gateway: local systemd runtime, openclaw-gateway active
- Channels affected: Telegram direct and Telegram topic sessions
- Agents affected: multiple configured agents sharing the same gateway
- Install path inspected: npm/global OpenClaw package under /usr/lib/node_modules/openclaw
Private chat IDs/session keys are intentionally redacted below.
Bug A: reset creates a new sessionId but preserves stale sessionFile
During incident recovery, two agent session entries had this bad shape:
sessionKey=<redacted telegram topic session>
sessionId=<new UUID>
sessionFile=<old UUID>.jsonl
In one concrete case, the sessionFile pointed at a completely unrelated older transcript from a cron/review task. The current frontend session URL for the Telegram topic therefore showed an unrelated transcript instead of the messages in that Telegram topic.
After manually backing up and rewriting those entries so sessionFile matched the new sessionId, the frontend history stopped showing the wrong transcript.
Source-level evidence
In the installed 2026.5.12 bundle, performGatewaySessionReset generates nextSessionId, but then passes the current entry's existing sessionFile back into resolveSessionFilePath:
oldSessionId = currentEntry?.sessionId;
oldSessionFile = currentEntry?.sessionFile;
const now = Date.now();
const nextSessionId = randomUUID();
const nextEntry = {
sessionId: nextSessionId,
sessionFile: resolveSessionFilePath(nextSessionId, currentEntry?.sessionFile ? { sessionFile: currentEntry.sessionFile } : void 0, resolveSessionFilePathOptions({
storePath,
agentId: sessionAgentId
})),
updatedAt: now,
systemSent: false,
abortedLastRun: false,
resolveSessionFilePath trusts any provided entry.sessionFile candidate before deriving a path from sessionId:
function resolveSessionFilePath(sessionId, entry, opts) {
const sessionsDir = resolveSessionsDir(opts);
const candidate = entry?.sessionFile?.trim();
if (candidate) try {
return resolvePathWithinSessionsDir(sessionsDir, candidate, { agentId: opts?.agentId });
} catch {}
return resolveSessionTranscriptPathInDir(sessionId, sessionsDir);
}
So on reset, nextSessionId changes but sessionFile can remain the previous transcript path.
There is already a helper elsewhere that seems intended to solve exactly this class of problem:
function rewriteSessionFileForNewSessionId(params) {
const trimmed = normalizeOptionalString(params.sessionFile);
if (!trimmed) return;
const base = path.basename(trimmed);
if (!base.endsWith(".jsonl")) return;
const withoutExt = base.slice(0, -6);
if (withoutExt === params.previousSessionId) return path.join(path.dirname(trimmed), `${params.nextSessionId}.jsonl`);
if (withoutExt.startsWith(`${params.previousSessionId}-topic-`)) return path.join(path.dirname(trimmed), `${params.nextSessionId}${base.slice(params.previousSessionId.length)}`);
}
But the reset service path above does not appear to use it.
Expected behavior
When a session reset rotates from oldSessionId to nextSessionId, the persisted sessionFile should also be rewritten to the matching transcript path, preserving topic suffixes such as -topic-<id> when present.
The store should not persist:
sessionId=<new UUID>
sessionFile=<old UUID>.jsonl
unless this is an intentional fork/checkpoint reference and is marked as such.
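A minimal sketch of the expected rotation, assuming the reset path can call the helper quoted above. The wrapper nextSessionFileOnReset and its sessionsDir parameter are hypothetical names used only for illustration, and normalizeOptionalString is approximated with a plain trim. In the real bundle the fallback path would presumably come from resolveSessionTranscriptPathInDir rather than a bare path.join.

```javascript
const path = require("node:path");

// Same logic as the bundled helper quoted above. normalizeOptionalString is
// approximated with a typeof check plus trim (assumption, not the real helper).
function rewriteSessionFileForNewSessionId(params) {
  const trimmed = typeof params.sessionFile === "string" ? params.sessionFile.trim() : "";
  if (!trimmed) return;
  const base = path.basename(trimmed);
  if (!base.endsWith(".jsonl")) return;
  const withoutExt = base.slice(0, -6);
  if (withoutExt === params.previousSessionId) {
    return path.join(path.dirname(trimmed), `${params.nextSessionId}.jsonl`);
  }
  if (withoutExt.startsWith(`${params.previousSessionId}-topic-`)) {
    // Keep the "-topic-<id>.jsonl" tail and swap only the UUID prefix.
    return path.join(
      path.dirname(trimmed),
      `${params.nextSessionId}${base.slice(params.previousSessionId.length)}`
    );
  }
}

// Hypothetical wrapper: never carry an old sessionFile forward unchanged.
// If the old path cannot be rewritten (for example because it already pointed
// at an unrelated transcript), derive a fresh path from the new sessionId.
function nextSessionFileOnReset({ previousSessionId, nextSessionId, sessionFile, sessionsDir }) {
  const rewritten = rewriteSessionFileForNewSessionId({
    sessionFile,
    previousSessionId,
    nextSessionId,
  });
  return rewritten ?? path.join(sessionsDir, `${nextSessionId}.jsonl`);
}
```

The design choice here is that the stale candidate is only ever used as a template for the new file name and never returned as-is, which is the opposite of how resolveSessionFilePath treats entry.sessionFile today.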
Bug B: codex-acp orphan processes after failed initialize
The same incident also involved Codex ACP/acpx failures before initialize. Task output showed ACP failed before initialize and direct acpx@0.6.1 fallback failed the same way.
After that, OS process listing showed codex-acp processes with PPID=1, while OpenClaw reported no active ACP tasks. They were invisible to normal task accounting and had to be killed manually.
Sanitized shape of the evidence:
openclaw tasks --runtime acp --json -> active=[]
ps -eo pid,ppid,etime,cmd | grep codex-acp -> codex-acp rows with PPID=1
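The manual check above can be automated. The following is a rough sketch, not OpenClaw code: findOrphanAcpProcesses is a hypothetical function that parses text in the pid,ppid,etime,cmd column layout and returns rows whose parent is PID 1 and whose command matches codex-acp.

```javascript
// Parse `ps -eo pid,ppid,etime,cmd` output and return the rows that were
// reparented to PID 1 and whose command line matches the given pattern.
// The function name and the returned shape are assumptions for illustration.
function findOrphanAcpProcesses(psOutput, pattern = /codex-acp/) {
  const orphans = [];
  for (const line of psOutput.split("\n")) {
    // Columns: PID, PPID, ELAPSED, then the full command line.
    const match = line.match(/^\s*(\d+)\s+(\d+)\s+(\S+)\s+(.*)$/);
    if (!match) continue; // header row and blank lines do not match
    const [, pid, ppid, etime, cmd] = match;
    if (Number(ppid) === 1 && pattern.test(cmd)) {
      orphans.push({ pid: Number(pid), ppid: 1, etime, cmd });
    }
  }
  return orphans;
}
```

The input could come from execFileSync("ps", ["-eo", "pid,ppid,etime,cmd"], { encoding: "utf8" }) in node:child_process. Comparing the result against the active list from openclaw tasks --runtime acp --json would then show which processes have no matching task.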
Impact observed locally:
- Telegram sessions stayed in typing... / no final response.
- Gateway/message actions became very slow.
- Manual cleanup of orphan codex-acp processes plus session store repair restored normal behavior.
Expected behavior
If ACP/acpx fails before initialize, OpenClaw should guarantee child process cleanup or track and reap failed launch descendants.
At minimum:
- failed ACP initialize should not leave codex-acp under PID 1;
- openclaw tasks --runtime acp or doctor should surface orphan ACP descendants;
- gateway restart/doctor should provide a safe reap/repair path.
Why this matters
This failure is very visible to users:
- Telegram says the bot is typing forever.
- The OpenClaw frontend shows a different transcript than the actual Telegram conversation.
- Recovery is confusing because the session entry can look reset while still rendering stale history.
- Orphan ACP processes are outside normal OpenClaw task visibility, so operators can miss the real reason the host feels stuck.
Related issues
Suggested fix direction
For the session reset path:
- Use the existing rewriteSessionFileForNewSessionId behavior in performGatewaySessionReset, or equivalent logic.
- Add a regression test that starts with sessionId=old, sessionFile=old-topic-123.jsonl, resets the session to sessionId=new, and expects sessionFile=new-topic-123.jsonl.
- Add a store integrity check: flag entries where the sessionFile basename UUID does not match entry.sessionId unless explicitly marked as checkpoint/fork/legacy.
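A sketch of the integrity check, under two assumptions that are not confirmed by the bundle: that sessions.json parses to an object keyed by session key with sessionId and sessionFile on each entry, and that intentional references carry some marker field. The intentionalFileRef field name below is invented for illustration.

```javascript
const path = require("node:path");

// Flag store entries whose transcript file name does not belong to the
// entry's current sessionId. Accepts "<sessionId>.jsonl" and
// "<sessionId>-topic-<id>.jsonl"; everything else is reported.
function findMismatchedSessionFiles(store) {
  const mismatches = [];
  for (const [sessionKey, entry] of Object.entries(store)) {
    const { sessionId, sessionFile } = entry ?? {};
    // Entries without a persisted sessionFile have nothing to compare.
    if (!sessionId || typeof sessionFile !== "string" || !sessionFile.trim()) continue;
    // Hypothetical marker for checkpoint/fork/legacy references.
    if (entry.intentionalFileRef) continue;
    const stem = path.basename(sessionFile.trim(), ".jsonl");
    const matches = stem === sessionId || stem.startsWith(`${sessionId}-topic-`);
    if (!matches) mismatches.push({ sessionKey, sessionId, sessionFile });
  }
  return mismatches;
}
```

Run against a parsed store, an empty result means every persisted sessionFile still belongs to its sessionId. A non-empty result lists the session keys to repair.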
For ACP orphan cleanup:
- Make failed pre-initialize launches kill the whole process group.
- Track ACP child PIDs early enough that pre-initialize failures are still accounted for.
- Add doctor detection for codex-acp / acpx descendants with PPID=1 or no matching OpenClaw task.
- Consider logging a clear warning when ACP process cleanup fails.
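One possible shape for the first two points, sketched with Node's standard child_process API rather than OpenClaw's actual launcher. launchTracked, killProcessGroup, and launchedAcpPids are hypothetical names, and the injectable spawnFn/killFn parameters exist only to make the sketch testable. On POSIX, spawning with detached: true makes the child the leader of a new process group, so signalling the negative PID reaches its descendants too.

```javascript
const { spawn } = require("node:child_process");

// Hypothetical registry: PIDs are recorded at spawn time, before any ACP
// initialize handshake, so pre-initialize failures are still accounted for.
const launchedAcpPids = new Set();

function launchTracked(command, args, spawnFn = spawn) {
  // detached: true puts the child in its own process group on POSIX.
  const child = spawnFn(command, args, { detached: true, stdio: ["pipe", "pipe", "pipe"] });
  if (child.pid !== undefined) launchedAcpPids.add(child.pid);
  child.once("exit", () => launchedAcpPids.delete(child.pid));
  child.once("error", (err) => {
    console.warn(`ACP launch error for ${command}: ${err.message}`);
  });
  return child;
}

// Signal the whole process group led by `pid`. Returns false if the group is
// already gone, and logs a warning before rethrowing on any other failure.
function killProcessGroup(pid, signal = "SIGTERM", killFn = (p, s) => process.kill(p, s)) {
  if (!Number.isInteger(pid) || pid <= 1) {
    // Guard: -0, -1, or NaN would target the wrong processes.
    throw new RangeError(`refusing to signal process group for pid ${pid}`);
  }
  try {
    killFn(-pid, signal); // negative PID addresses the process group
    return true;
  } catch (err) {
    if (err.code === "ESRCH") return false;
    console.warn(`ACP cleanup failed for process group ${pid}: ${err.message}`);
    throw err;
  }
}
```

A launcher following this pattern would call killProcessGroup(child.pid) from its pre-initialize failure path. On gateway restart, a doctor-style check could walk whatever persisted form of launchedAcpPids exists. Whether a new process group is compatible with how OpenClaw currently supervises acpx is not something the snippets in this report can confirm.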
Local mitigation applied
We mitigated locally by:
- Backing up the affected sessions.json stores.
- Rewriting mismatched sessionFile entries to match their sessionId.
- Clearing stale status=running entries for sessions whose run had already died.
- Killing only orphan codex-acp processes with PPID=1.
- Revalidating that mismatched session entries were gone and no codex-acp orphan remained.
No secrets or private message content are needed to reproduce the source-level session reset bug; the snippets above should be enough to locate it.