Skip to content

fix: keep ACP prompts alive across gateway reconnects#59473

Merged
obviyus merged 15 commits into
mainfrom
fix/acp-reconnect-prompt-lifecycle
Apr 2, 2026
Merged

fix: keep ACP prompts alive across gateway reconnects#59473
obviyus merged 15 commits into
mainfrom
fix/acp-reconnect-prompt-lifecycle

Conversation

@obviyus

@obviyus obviyus commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

Summary

  • stop treating ACP prompt completion as both a long-lived chat.send request and an event stream; the translator now waits only for send acceptance and keeps completion event-driven
  • keep in-flight ACP prompts alive across transient gateway reconnects
  • lock the regression with a reconnect-focused translator test

Testing

  • pnpm test -- src/acp/translator.stop-reason.test.ts src/acp/translator.prompt-prefix.test.ts src/acp/translator.cancel-scoping.test.ts src/acp/translator.session-rate-limit.test.ts
  • pnpm tsgo

Fixes #59334
Fixes #31686
Fixes #28708

@openclaw-barnacle openclaw-barnacle Bot added size: S maintainer Maintainer-authored PR labels Apr 2, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e1a5433cb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/acp/translator.ts
Comment thread src/acp/translator.ts Outdated
@greptile-apps

greptile-apps Bot commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR fixes a long-standing bug where ACP prompt completion was being treated as both a long-lived chat.send request and an event stream simultaneously. The translator now sends chat.send with { timeoutMs: null } (resolves on gateway acceptance rather than waiting for the final response), and relies entirely on gateway events for completion — matching how the rest of the system processes chat outcomes.

The second part of the fix addresses a previously-flagged issue: in-flight prompts no longer immediately reject on gateway disconnect. Instead, a ACP_GATEWAY_DISCONNECT_GRACE_MS = 5 000 ms timer is armed on disconnect and cleared by handleGatewayReconnect (or when the last pending prompt settles). If the gateway does not reconnect within the window, rejectPendingPrompts fires as a backstop.

Key implementation details that look correct:

  • clearDisconnectTimer is called in every code path that empties pendingPrompts (finishPrompt, cancel, and the sendWithProvenanceFallback error catch), so the timer is always cancelled when there is nothing left to protect.
  • flushPendingErrors in GatewayClient runs before onClose fires, so a chat.send that was never accepted will be rejected synchronously, and its catch handler will clean up the entry and cancel the timer — all before handleGatewayDisconnect sees a non-empty map.
  • New tests cover both the survival path (transient reconnect within the grace window) and the rejection path (grace window expiry with fake timers).

Confidence Score: 5/5

  • Safe to merge — the previously flagged permanent-disconnect hang is addressed with the grace timer, all code paths correctly cancel or reset the timer, and the new tests lock the reconnect behavior.
  • The only open finding is a P2 observation that the 5-second grace window resets on each new disconnect event in a flapping scenario. This does not introduce incorrect behavior in the normal transient-reconnect case and does not block merge. All prior P1 concerns from the previous review thread have been resolved.
  • No files require special attention.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/acp/translator.ts
Line: 440-445

Comment:
**Grace window resets on each additional disconnect**

Each new `handleGatewayDisconnect` call clears the previous timer and restarts the 5-second clock. In a flapping-connection scenario (e.g. the gateway issues rapid close/reconnect cycles where reconnect stalls), the deadline shifts to 5 seconds after the *last* disconnect rather than the *first*, potentially extending the window for an unbounded number of additional intervals.

This is generally fine for the intended transient-reconnect case — a gateway that is legitimately reconnecting will call `handleGatewayReconnect` and clear the timer. If the connection is continuously flapping without ever reconnecting, the prompts will be held longer than intended. Since the gateway has its own exponential backoff, this scenario is unlikely in practice, but it may be worth documenting the intent.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (2): Last reviewed commit: "fix: preserve ACP send timeout semantics" | Re-trigger Greptile

Comment thread src/acp/translator.ts
@obviyus

obviyus commented Apr 2, 2026

Copy link
Copy Markdown
Contributor Author

@greptile-apps re-review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c76fda51f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/acp/translator.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9adb1a103c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/acp/translator.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1520a7b176

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/acp/translator.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 401c3ccb79

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/acp/translator.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 373d56d037

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/acp/translator.ts Outdated
@obviyus obviyus self-assigned this Apr 2, 2026
@obviyus obviyus force-pushed the fix/acp-reconnect-prompt-lifecycle branch from 373d56d to 6e8b876 Compare April 2, 2026 08:16

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e8b87669b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/acp/translator.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4379e6c366

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/acp/translator.ts Outdated
@aisle-research-bot

aisle-research-bot Bot commented Apr 2, 2026

Copy link
Copy Markdown

🔒 Aisle Security Analysis

We found 2 potential security issue(s) in this PR:

# Severity Title
1 🟡 Medium Gateway/agent error status is treated as successful prompt completion (error masking)
2 🟡 Medium Pending prompts can hang indefinitely when created after a gateway disconnect (deadline map not updated)
1. 🟡 Gateway/agent error status is treated as successful prompt completion (error masking)
Property Value
Severity Medium
CWE CWE-703
Location src/acp/translator.ts:1149-1152

Description

In AcpGatewayAgent, gateway-side run failures are mapped to a normal end_turn completion instead of rejecting the in-flight prompt.

This occurs in multiple places:

  • When a chat event arrives with state === "error", the code calls finishPrompt(..., "end_turn"), which resolves the prompt promise.
  • During disconnect reconciliation (agent.wait), if the gateway reports status === "error", the code again calls finishPrompt(..., "end_turn") (sometimes fire-and-forget via void).

Security/consistency impact:

  • Upstream failures (including potential authz/policy enforcement failures, blocked tool calls, or server-side validation errors) can be silently converted into a successful completion from the client’s perspective.
  • This can desynchronize client state (client believes a run completed normally) and can bypass error-handling paths that should stop workflows, alert users, or trigger remediation.

Vulnerable code:

if (result?.status === "error") {
  void this.finishPrompt(sessionId, pending, "end_turn");
  continue;
}

Recommendation

Treat gateway/agent run errors as errors at the ACP layer rather than a normal stop reason.

  • Prefer rejecting the pending prompt (e.g., pending.reject(new Error(result.error ?? "Run failed"))) and clearing active run state.
  • If ACP requires a stop-reason-only API, introduce/extend a stop reason like "server_error" / "failed" and ensure clients surface it as an error.
  • Avoid fire-and-forget (void) for error paths so failures to snapshot/update can be observed and tested.

Example fix (reject on error):

if (result?.status === "error") {
  this.pendingPrompts.delete(sessionId);
  this.sessionStore.clearActiveRun(sessionId);
  pending.reject(new Error(result.error ?? "Gateway run failed"));
  continue;
}

Also apply the same change to the state === "error" chat-event handler and any other reconciliation path (e.g. disconnect-deadline expiry).

2. 🟡 Pending prompts can hang indefinitely when created after a gateway disconnect (deadline map not updated)
Property Value
Severity Medium
CWE CWE-400
Location src/acp/translator.ts:704-755

Description

When the gateway is already disconnected, calling prompt() can leave a PendingPrompt stuck forever:

  • prompt() stores the new pending prompt in this.pendingPrompts.
  • The subsequent chat.send fails with a gateway-close error.
  • The error handler explicitly ignores gateway-close errors (returns early) leaving the pending prompt in the map.
  • The disconnect deadline mechanism (disconnectDeadlineRunIds + 5s timer) is only populated in handleGatewayDisconnect() from prompts that existed at the moment of the disconnect.
  • If a new prompt is created after the disconnect, it is not added to disconnectDeadlineRunIds, so expireDisconnectDeadline() will never reject it. If the gateway never reconnects, the returned promise never resolves/rejects, causing resource leakage and potential DoS (unbounded accumulation of pending prompts / stuck client requests).

Vulnerable code:

void sendWithProvenanceFallback().catch((err) => {
  if (isGatewayCloseError(err) && this.getPendingPrompt(params.sessionId, runId)) {
    return; // leaves pending prompt stored; not added to disconnectDeadlineRunIds
  }
  this.pendingPrompts.delete(params.sessionId);
  ...
});

Recommendation

Ensure prompts created while disconnected are still covered by a rejection deadline.

Options:

  1. Immediately reject new prompts when the gateway is disconnected (simplest):
if (!this.gateway.isConnected()) {
  this.sessionStore.clearActiveRun(params.sessionId);
  throw new Error("Gateway disconnected");
}
  1. Add the newly created pending prompt to the disconnect deadline tracking when a gateway-close error occurs, and ensure a timer exists:
void sendWithProvenanceFallback().catch((err) => {
  if (isGatewayCloseError(err) && this.getPendingPrompt(params.sessionId, runId)) {
    this.disconnectDeadlineRunIds.set(params.sessionId, runId);
    if (!this.disconnectTimer) {
      this.disconnectTimer = setTimeout(() => {
        this.disconnectTimer = null;
        void this.expireDisconnectDeadline(new Error("Gateway disconnected"));
      }, ACP_GATEWAY_DISCONNECT_GRACE_MS);
      this.disconnectTimer.unref?.();
    }
    return;
  }
  ...
});

Also consider clearing/rejecting all pending prompts on a subsequent disconnect event even if already disconnected, to prevent stale state.


Analyzed PR: #59473 at commit 1dc0e0c

Last updated on: 2026-04-02T09:45:34Z

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4691e9e577

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/acp/translator.ts Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1dc0e0c14a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/acp/translator.ts Outdated
obviyus added a commit that referenced this pull request Apr 2, 2026
@obviyus obviyus force-pushed the fix/acp-reconnect-prompt-lifecycle branch from baf5ef3 to 2ad774e Compare April 2, 2026 09:03
@obviyus obviyus merged commit 304da2c into main Apr 2, 2026
7 checks passed
@obviyus obviyus deleted the fix/acp-reconnect-prompt-lifecycle branch April 2, 2026 09:04
@obviyus

obviyus commented Apr 2, 2026

Copy link
Copy Markdown
Contributor Author

Landed in 304da2c.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2ad774eed1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/acp/translator.ts
void this.finishPrompt(sessionId, currentPending, "end_turn");
continue;
}
this.rejectPendingPrompts(error, [[sessionId, runId]]);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Do not fail reconnecting prompts while run is still active

When the disconnect grace timer expires, expireDisconnectDeadline rejects any prompt whose agent.wait probe is not terminal, including status: "timeout". A timeout here only means the run is still in progress (the probe uses timeoutMs: 0), so a prompt that successfully resumed after reconnect but takes longer than 5s to finish will be rejected incorrectly. This creates false failures for slow/long responses after transient websocket drops, even though the run can still complete and emit later chat events.

Useful? React with 👍 / 👎.

ngutman pushed a commit that referenced this pull request Apr 3, 2026
* fix: keep acp prompts alive across gateway reconnects

* fix: bound ACP prompts after disconnect grace

* fix: preserve ACP send timeout semantics

* fix: defer pre-ack ACP disconnect failures

* fix: reconcile ACP runs after reconnect

* fix: keep ACP reconnect deadlines monotonic

* fix: keep pre-ack ACP deadlines after reconnect

* fix: keep ACP prompts alive across gateway reconnects (#59473)

* fix: reject superseded ACP pre-ack prompts (#59473)

* style: format ACP reconnect regression updates (#59473)

* style: format ACP reconnect regression updates (#59473)

* fix: guard ACP send acceptance by run id (#59473)

* fix: scope ACP reconnect deadline by prompt (#59473)

* fix: recheck ACP prompts at reconnect deadline (#59473)

* fix: key ACP reconnect deadline by run (#59473)
steipete pushed a commit to duncanita/openclaw that referenced this pull request Apr 4, 2026
* fix: keep acp prompts alive across gateway reconnects

* fix: bound ACP prompts after disconnect grace

* fix: preserve ACP send timeout semantics

* fix: defer pre-ack ACP disconnect failures

* fix: reconcile ACP runs after reconnect

* fix: keep ACP reconnect deadlines monotonic

* fix: keep pre-ack ACP deadlines after reconnect

* fix: keep ACP prompts alive across gateway reconnects (openclaw#59473)

* fix: reject superseded ACP pre-ack prompts (openclaw#59473)

* style: format ACP reconnect regression updates (openclaw#59473)

* style: format ACP reconnect regression updates (openclaw#59473)

* fix: guard ACP send acceptance by run id (openclaw#59473)

* fix: scope ACP reconnect deadline by prompt (openclaw#59473)

* fix: recheck ACP prompts at reconnect deadline (openclaw#59473)

* fix: key ACP reconnect deadline by run (openclaw#59473)
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
* fix: keep acp prompts alive across gateway reconnects

* fix: bound ACP prompts after disconnect grace

* fix: preserve ACP send timeout semantics

* fix: defer pre-ack ACP disconnect failures

* fix: reconcile ACP runs after reconnect

* fix: keep ACP reconnect deadlines monotonic

* fix: keep pre-ack ACP deadlines after reconnect

* fix: keep ACP prompts alive across gateway reconnects (openclaw#59473)

* fix: reject superseded ACP pre-ack prompts (openclaw#59473)

* style: format ACP reconnect regression updates (openclaw#59473)

* style: format ACP reconnect regression updates (openclaw#59473)

* fix: guard ACP send acceptance by run id (openclaw#59473)

* fix: scope ACP reconnect deadline by prompt (openclaw#59473)

* fix: recheck ACP prompts at reconnect deadline (openclaw#59473)

* fix: key ACP reconnect deadline by run (openclaw#59473)
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
* fix: keep acp prompts alive across gateway reconnects

* fix: bound ACP prompts after disconnect grace

* fix: preserve ACP send timeout semantics

* fix: defer pre-ack ACP disconnect failures

* fix: reconcile ACP runs after reconnect

* fix: keep ACP reconnect deadlines monotonic

* fix: keep pre-ack ACP deadlines after reconnect

* fix: keep ACP prompts alive across gateway reconnects (openclaw#59473)

* fix: reject superseded ACP pre-ack prompts (openclaw#59473)

* style: format ACP reconnect regression updates (openclaw#59473)

* style: format ACP reconnect regression updates (openclaw#59473)

* fix: guard ACP send acceptance by run id (openclaw#59473)

* fix: scope ACP reconnect deadline by prompt (openclaw#59473)

* fix: recheck ACP prompts at reconnect deadline (openclaw#59473)

* fix: key ACP reconnect deadline by run (openclaw#59473)
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
* fix: keep acp prompts alive across gateway reconnects

* fix: bound ACP prompts after disconnect grace

* fix: preserve ACP send timeout semantics

* fix: defer pre-ack ACP disconnect failures

* fix: reconcile ACP runs after reconnect

* fix: keep ACP reconnect deadlines monotonic

* fix: keep pre-ack ACP deadlines after reconnect

* fix: keep ACP prompts alive across gateway reconnects (openclaw#59473)

* fix: reject superseded ACP pre-ack prompts (openclaw#59473)

* style: format ACP reconnect regression updates (openclaw#59473)

* style: format ACP reconnect regression updates (openclaw#59473)

* fix: guard ACP send acceptance by run id (openclaw#59473)

* fix: scope ACP reconnect deadline by prompt (openclaw#59473)

* fix: recheck ACP prompts at reconnect deadline (openclaw#59473)

* fix: key ACP reconnect deadline by run (openclaw#59473)
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
* fix: keep acp prompts alive across gateway reconnects

* fix: bound ACP prompts after disconnect grace

* fix: preserve ACP send timeout semantics

* fix: defer pre-ack ACP disconnect failures

* fix: reconcile ACP runs after reconnect

* fix: keep ACP reconnect deadlines monotonic

* fix: keep pre-ack ACP deadlines after reconnect

* fix: keep ACP prompts alive across gateway reconnects (openclaw#59473)

* fix: reject superseded ACP pre-ack prompts (openclaw#59473)

* style: format ACP reconnect regression updates (openclaw#59473)

* style: format ACP reconnect regression updates (openclaw#59473)

* fix: guard ACP send acceptance by run id (openclaw#59473)

* fix: scope ACP reconnect deadline by prompt (openclaw#59473)

* fix: recheck ACP prompts at reconnect deadline (openclaw#59473)

* fix: key ACP reconnect deadline by run (openclaw#59473)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

maintainer Maintainer-authored PR size: L

Projects

None yet

1 participant