Skip to content

fix: throw resourceNotFound when loadSession fails to resume#363

Merged
benbrandt merged 2 commits intozed-industries:mainfrom
maskedband1t:fix/load-session-resource-not-found
Mar 3, 2026
Merged

fix: throw resourceNotFound when loadSession fails to resume#363
benbrandt merged 2 commits intozed-industries:mainfrom
maskedband1t:fix/load-session-resource-not-found

Conversation

@maskedband1t
Copy link
Contributor

Summary

  • When loadSession calls createSession({ resume: sessionId }) and the CLI subprocess can't find persisted session data (no .jsonl file), the subprocess exits immediately and createSession rejects with a generic -32603 ("Query closed before response received")
  • ACP clients like acpx check for -32001/-32002 (resource not found) to fall back to session/new — the generic -32603 bypasses this, causing the turn to fail entirely
  • This commonly happens when session/new creates an ACP session but the subprocess exits before writing session data (no prompt to process), then session/prompt tries session/load to resume a session that was never persisted

Fix

Wrap the createSession({ resume }) call in loadSession() with try/catch and re-throw as RequestError.resourceNotFound() (code -32002). This lets callers handle it with their existing fallback logic.

Verification

Confirmed working in production with the OpenClaw gateway + acpx integration. Before the fix, all ACP tasks failed silently. After patching both this code path and acpx's shouldFallbackToNewSession() (openclaw/acpx#30), sessions fall back to session/new correctly.

Fixes #338
Closes #361
Closes #362

When `loadSession` calls `createSession` with `{ resume: sessionId }` and
the CLI subprocess can't find persisted session data on disk (no .jsonl
file), the subprocess exits immediately. This causes `createSession` to
reject with a generic -32603 "Query closed before response received" error.

Callers like acpx check for resource-not-found errors (-32001/-32002) to
decide whether to fall back to `session/new`. The generic -32603 bypasses
this check, causing the entire turn to fail instead of gracefully creating
a new session.

This commonly happens when `session/new` creates an ACP session ID but the
CLI subprocess exits before writing any session data (because there was no
prompt to process). The subsequent `session/prompt` spawns a new subprocess
and tries `session/load` to resume — but there's nothing to resume from.

Fix: wrap the `createSession({ resume })` call in try/catch and re-throw as
`RequestError.resourceNotFound()` (code -32002) so callers can handle it
with their existing fallback logic.

Fixes zed-industries#338
Closes zed-industries#361
Closes zed-industries#362
@cla-bot
Copy link

cla-bot bot commented Mar 1, 2026

We require contributors to sign our Contributor License Agreement, and we don't have @maskedband1t on file. You can sign our CLA at https://zed.dev/cla. Once you've signed, post a comment here that says '@cla-bot check'.

@maskedband1t
Copy link
Contributor Author

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Mar 1, 2026
@cla-bot
Copy link

cla-bot bot commented Mar 1, 2026

The cla-bot has been summoned, and re-checked this pull request!

@ArtemisAI
Copy link

Good fix — confirmed this resolves the loadSession() path. One additional code path has the same vulnerability: unstable_resumeSession().

The gap

unstable_resumeSession() calls createSession({ resume: sessionId }) with no error handling, identical to the pre-fix loadSession():

// src/acp-agent.ts — unstable_resumeSession() (current)
async unstable_resumeSession(params: ResumeSessionRequest): Promise<SessionResponse> {
  const response = await this.createSession(   // ← unguarded, same as the pre-fix loadSession
    { cwd: params.cwd, mcpServers: params.mcpServers ?? [], _meta: params._meta },
    { resume: params.sessionId },
  );
  // ...
}

If the .jsonl file is absent, the subprocess dies and createSession rejects with -32603 — the same untyped error that bypasses shouldFallbackToNewSession().

Suggested addition

  async unstable_resumeSession(params: ResumeSessionRequest): Promise<SessionResponse> {
-   const response = await this.createSession(
-     { cwd: params.cwd, mcpServers: params.mcpServers ?? [], _meta: params._meta },
-     { resume: params.sessionId },
-   );
+   let response;
+   try {
+     response = await this.createSession(
+       { cwd: params.cwd, mcpServers: params.mcpServers ?? [], _meta: params._meta },
+       { resume: params.sessionId },
+     );
+   } catch {
+     throw RequestError.resourceNotFound(params.sessionId);
+   }

    setTimeout(() => {
      this.sendAvailableCommandsUpdate(response.sessionId);
    }, 0);
    return response;
  }

Verification

We independently reproduced the issue and confirmed both code paths are reachable. SDK debug logs show:

Spawning Claude Code: ... --resume <session-id> ...
[DEBUG] LSP server manager shut down successfully    ← immediate shutdown
[DEBUG] SessionEnd hook query: other
Claude Code process exited with code 1               ← 1.7s after spawn

The Claude Code subprocess exits before any MCP servers connect, and the error propagates as -32603 regardless of which method initiated the resume.

Patching both methods resolves the issue completely — would it make sense to extend this PR to cover unstable_resumeSession() as well?

@benbrandt
Copy link
Member

Thanks I pushed this into createSession as it happens at the initialize phase if we have a resume param

@benbrandt benbrandt merged commit b4f56b1 into zed-industries:main Mar 3, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

3 participants