Skip to content

Fix session reset files and ACPX orphan reaping#82459

Merged
joshavant merged 6 commits into
mainfrom
fix/session-reset-topic-file
May 16, 2026
Merged

Fix session reset files and ACPX orphan reaping#82459
joshavant merged 6 commits into
mainfrom
fix/session-reset-topic-file

Conversation

@joshavant

@joshavant joshavant commented May 16, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Rotate reset transcript files with per-session topic suffixes so reset-model flows preserve prior conversation state in the expected file.
  • Reap plugin-local ACP adapter orphans on ACPX startup after a generated wrapper crash.
  • Keep direct ACP adapter commands out of OpenClaw launch-lease argument injection.

Fixes #82364

Verification

  • env CI=1 OPENCLAW_VITEST_MAX_WORKERS=1 OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=900000 node scripts/run-vitest.mjs extensions/acpx/src/process-reaper.test.ts extensions/acpx/src/runtime.test.ts --run --reporter=verbose
  • git diff --check
  • codex review --uncommitted
  • Crabbox AWS run run_01d90f447321
  • Crabbox AWS run run_0101eefa2e9a for Bug A reset transcript proof
  • Crabbox AWS run run_12fb81ab2664 for Bug B live Codex/auth plus orphan reap proof

Real behavior proof

Behavior addressed: Bug A reset transcript rotation and Bug B ACPX/Codex orphan process cleanup for #82364.
Real environment tested: Raw AWS Crabbox Linux leases with Node v22.22.2 and pnpm 11.1.0 installed directly on the lease; Bug B used a real OPENAI_API_KEY forwarded through Crabbox env-profile redaction.
Exact steps or command run after this patch: Bug A ran the focused gateway reset regression for sessions.reset rotates generated topic transcript files with the new session id. Bug B ran acpx codex exec with OPENAI_API_KEY and ACPX_AUTH_OPENAI_API_KEY, then started the real plugin-local @zed-industries/codex-acp launcher/platform binary, forced the platform adapter into the observed PID 1 orphan shape, and invoked reapStaleOpenClawOwnedAcpxOrphans.
Evidence after fix: Bug A Crabbox run run_0101eefa2e9a passed the reset regression. Bug B Crabbox run run_12fb81ab2664 observed live Codex output OPENCLAW_BUG_B_LIVE_OK, then observed orphan candidate {"pid":3640,"ppid":1,"command":"$PWD/node_modules/@zed-industries/codex-acp-linux-x64/bin/codex-acp"} and reaper result {"inspectedPids":[3490,3640],"terminatedPids":[3490,3640]}.
Observed result after fix: Bug A persisted a reset topic session with a new session id and matching topic transcript filename; Bug B ended with remaining targeted processes after reap: [] and BUG_B_LIVE_REPRO_PROOF_OK.
What was not tested: This did not replay the original Telegram production incident end-to-end; Bug B proves live Codex auth plus the orphaned ACP adapter cleanup path rather than the exact original failed-initialize timing.

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime extensions: acpx size: M maintainer Maintainer-authored PR labels May 16, 2026
@clawsweeper

clawsweeper Bot commented May 16, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Summary
The PR rotates generated reset transcript paths, factors shared session-file rotation helpers, expands ACPX orphan reaping for plugin-local ACP adapter processes, narrows launch-lease injection to generated wrappers, and adds focused tests plus a changelog entry.

Reproducibility: yes. by source inspection: current main rotates sessionId while reusing currentEntry.sessionFile, and the resolver contract explains why the stale path persists. I did not run a live current-main gateway reset in this read-only sweep.

Real behavior proof
Needs stronger real behavior proof before merge: The PR body gives ACPX Crabbox log proof, but it does not show after-fix reset transcript rotation in a real gateway setup; contributors should add redacted terminal output, logs, screenshots, or a recording and then update the PR body to trigger re-review.

Next step before merge
This needs contributor reset proof plus maintainer handling for a protected-label PR and two correctness findings; automation should not create a repair PR while the external real-behavior proof gate is still unmet.

Security
Cleared: No concrete supply-chain, secret-handling, workflow-permission, dependency-source, or new third-party code-execution concern was found in the diff.

Review findings

  • [P2] Drop stale generated reset files when rotation misses — src/gateway/session-reset-service.ts:83
  • [P2] Keep normalized session entries with a real sessionId — src/config/sessions/store-entry-shape.ts:50-52
Review details

Best possible solution:

Land a focused branch that rewrites or drops stale generated reset transcript files, preserves true custom sessionFile overrides, keeps persisted session entries aligned with their typed contract or introduces an explicit metadata shape, and includes redacted real reset proof plus focused ACPX/session tests.

Do we have a high-confidence way to reproduce the issue?

Yes, by source inspection: current main rotates sessionId while reusing currentEntry.sessionFile, and the resolver contract explains why the stale path persists. I did not run a live current-main gateway reset in this read-only sweep.

Is this the best way to solve the issue?

No, not as-is. The reset and ACPX direction is right, but the branch should handle already-stale generated transcript paths, preserve the session-entry shape contract, and add real reset-flow proof before merge.

Full review comments:

  • [P2] Drop stale generated reset files when rotation misses — src/gateway/session-reset-service.ts:83
    If an operator is already in the reported bad state where sessionId changed but sessionFile points at another generated transcript, rewriteSessionFileForNewSessionId returns undefined and this fallback reuses currentEntry.sessionFile again. A post-upgrade sessions.reset can therefore keep showing the unrelated transcript; detect generated transcript names that do not match the current id and derive the new path instead, while preserving true custom overrides.
    Confidence: 0.87
  • [P2] Keep normalized session entries with a real sessionId — src/config/sessions/store-entry-shape.ts:50-52
    This branch keeps a record with a safe-but-non-transcript id by deleting sessionId and returning the remainder as SessionEntry. SessionEntry.sessionId is still required, and removal/archival paths use entry.sessionId when capping or deleting rows, so a metadata-only row can escape validation and later feed an undefined id into cleanup. Use an explicit metadata shape/path or keep pruning rows that cannot satisfy the entry contract.
    Confidence: 0.82

Overall correctness: patch is incorrect
Overall confidence: 0.86

What I checked:

  • Live PR state: The live PR is open at head 453b08f and carries the protected maintainer label, so this cleanup workflow should not close it. (453b08f66965)
  • Current main reset bug: Current main creates a new reset session id and passes the existing currentEntry.sessionFile into resolveSessionFilePath, which can preserve a stale transcript path. (src/gateway/session-reset-service.ts:686, df0d061c7a9c)
  • Resolver contract: resolveSessionFilePath trusts a non-empty explicit sessionFile before deriving the transcript path from sessionId, so reset callers must avoid reusing stale generated paths. (src/config/sessions/paths.ts:267, df0d061c7a9c)
  • PR reset fallback gap: The PR rewrites matching generated filenames, but when the rewrite returns undefined it falls back to currentEntry.sessionFile; already-corrupted entries with a mismatched generated file can therefore stay mismatched after reset. (src/gateway/session-reset-service.ts:83, 453b08f66965)
  • Required session entry id contract: The current SessionEntry type requires sessionId, and many callers treat missing ids as exceptional or skip transcript-backed behavior. (src/config/sessions/types.ts:196, df0d061c7a9c)
  • PR drops session ids from normalized entries: The PR can delete sessionId from a safe-but-non-transcript row and still return the remaining object as SessionEntry, leaving a persisted entry that violates the current type contract. (src/config/sessions/store-entry-shape.ts:50, 453b08f66965)

Likely related people:

  • @vincentkoc: Current-main blame and shortlog across the sampled session validation, gateway reset, and ACPX reaper files point to Vincent Koc commits carrying the relevant current behavior. (role: recent area contributor; confidence: high; commits: 37ba583b05ee, 0eef4c49f650; files: src/config/sessions/store-entry-shape.ts, src/gateway/session-reset-service.ts, extensions/acpx/src/process-reaper.ts)

Remaining risk / open question:

  • The PR body proves the ACPX reaper path only partially and does not yet prove the reset transcript rotation in a real gateway setup.
  • No tests were run in this read-only review; the verdict is based on source, diff, GitHub context, and history inspection.

Codex review notes: model gpt-5.5, reasoning high; reviewed against df0d061c7a9c.

Re-review progress:

@openclaw-barnacle openclaw-barnacle Bot added the app: web-ui App: web-ui label May 16, 2026
@joshavant joshavant force-pushed the fix/session-reset-topic-file branch from bcabb1a to 9dae606 Compare May 16, 2026 06:30
@openclaw-barnacle openclaw-barnacle Bot removed the app: web-ui App: web-ui label May 16, 2026
@joshavant joshavant force-pushed the fix/session-reset-topic-file branch from 2a3560a to 453b08f Compare May 16, 2026 06:55
@joshavant joshavant merged commit 23f73b3 into main May 16, 2026
115 of 120 checks passed
@joshavant joshavant deleted the fix/session-reset-topic-file branch May 16, 2026 07:43
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 20, 2026
* fix gateway reset transcript rotation

* fix acpx orphan adapter reaping

* add changelog for pr 82459

* fix ci session metadata normalization

* preserve canonical session ids in store reads

* fix session metadata id normalization
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
* fix gateway reset transcript rotation

* fix acpx orphan adapter reaping

* add changelog for pr 82459

* fix ci session metadata normalization

* preserve canonical session ids in store reads

* fix session metadata id normalization
galiniliev pushed a commit to galiniliev/openclaw that referenced this pull request May 25, 2026
* fix gateway reset transcript rotation

* fix acpx orphan adapter reaping

* add changelog for pr 82459

* fix ci session metadata normalization

* preserve canonical session ids in store reads

* fix session metadata id normalization
qiaokuan1992 pushed a commit to qiaokuan1992/openclaw that referenced this pull request Jun 2, 2026
* fix gateway reset transcript rotation

* fix acpx orphan adapter reaping

* add changelog for pr 82459

* fix ci session metadata normalization

* preserve canonical session ids in store reads

* fix session metadata id normalization
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
* fix gateway reset transcript rotation

* fix acpx orphan adapter reaping

* add changelog for pr 82459

* fix ci session metadata normalization

* preserve canonical session ids in store reads

* fix session metadata id normalization
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
* fix gateway reset transcript rotation

* fix acpx orphan adapter reaping

* add changelog for pr 82459

* fix ci session metadata normalization

* preserve canonical session ids in store reads

* fix session metadata id normalization
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026
* fix gateway reset transcript rotation

* fix acpx orphan adapter reaping

* add changelog for pr 82459

* fix ci session metadata normalization

* preserve canonical session ids in store reads

* fix session metadata id normalization
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: acpx gateway Gateway runtime maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2026.5.12: session reset reuses stale sessionFile; codex-acp orphans can leave agents stuck

1 participant