Skip to content

fix: serialize concurrent appendMessage calls to prevent synthetic flush race#52036

Open
vanek-nutic wants to merge 1 commit intoopenclaw:mainfrom
vanek-nutic:fix/session-tool-result-guard-race
Open

fix: serialize concurrent appendMessage calls to prevent synthetic flush race#52036
vanek-nutic wants to merge 1 commit intoopenclaw:mainfrom
vanek-nutic:fix/session-tool-result-guard-race

Conversation

@vanek-nutic
Copy link
Copy Markdown

fix: serialize concurrent appendMessage calls to prevent synthetic flush race

Problem

OpenClaw's session-tool-result-guard incorrectly generates synthetic error tool results during normal agent operation when multiple async event handlers dispatch appendMessage concurrently. Users experience this as:

  • Spurious [tool call failed — synthetic result] entries in session transcripts
  • Duplicate or corrupted conversation histories (real result appended after synthetic)
  • Agent loops that halt unexpectedly because the guard believes tool calls went unanswered

Affects any scenario where the event bus fires multiple listeners that each call appendMessage without awaiting the previous call — common during agent completion and parallel tool dispatch.

Closes #38432, #13351, #15008.


Root Cause

Two interacting bugs in session-tool-result-guard.ts:

Bug 1: No serialization of concurrent guardedAppend callers

The guard maintains a shared Map<string, string> called pending (tool call ID → tool name). The flush conditions that synthesize error results check pending.size > 0 and fire on any non-toolResult message arriving while IDs are registered:

Tick 1:  Handler A → guardedAppend(assistantMsg with tool_calls)
           → pending.set("call_1", "read_file")
           → originalAppend(assistantMsg)  ← async, yields to event loop

Tick 2:  Handler B → guardedAppend(assistantMsg completion, no tools)
           → pending.size > 0 → flushPendingToolResults() fires 🔥
           → synthetic error written for "call_1"
           → pending.clear()

Tick 3:  Handler A resumes / real toolResult arrives
           → pending.delete("call_1")  ← no-op, already cleared
           → originalAppend(real toolResult) appended AFTER synthetic
           → Transcript: [tool_calls, synthetic_error, real_result] ❌

Because guardedAppend is async-capable (it returns whatever originalAppend returns, which may be a Promise), any awaited I/O inside originalAppend creates a window where a second concurrent caller observes pending.size > 0 and flushes prematurely.

Bug 2: pending.set() called after originalAppend()

Tool call IDs were registered in pending after originalAppend(message) was called. If originalAppend is async and yields, a concurrent toolResult for the same ID could arrive during that window, find the ID absent from pending, skip the delete, and leave the ID permanently registered — triggering a spurious synthetic on the next non-toolResult message.


Fix

2 files changed, +167 / −25 lines.

1. Cooperative async mutex (session-tool-result-guard.ts)

A minimal promise-chain mutex (createMutex()) wraps the entire body of guardedAppend. Each call acquires the lock before executing and releases it on completion (including on error), so concurrent callers execute strictly sequentially in arrival order:

function createMutex() {
  let queue: Promise<void> = Promise.resolve();
  return {
    run<T>(fn: () => T | Promise<T>): Promise<T> {
      const result = queue.then(() => fn());
      queue = result.then(() => undefined, () => undefined);
      return result;
    },
  };
}

The guard body is split into guardedAppendInner (logic) and guardedAppend (mutex wrapper), keeping the public installation API unchanged.

2. Reorder pending.set() to before originalAppend()

Tool call IDs are now registered in pending before originalAppend(message) is called, eliminating the window where a concurrent toolResult could arrive and find a missing ID:

// Register BEFORE appending so a racing toolResult handler finds the IDs
if (toolCalls.length > 0) {
  for (const call of toolCalls) {
    pending.set(call.id, call.name);
  }
}
const result = originalAppend(message as never);

No new npm dependencies. No changes to session-tool-result-guard-wrapper.ts.


Testing

9/9 tests passing (6 existing + 3 new).

New concurrent tests added to session-tool-result-guard.test.ts:

Test What it covers
does not synthesize when toolResult arrives concurrently with next assistant message Exact Tick 1/2/3 race scenario — fires toolResult and a completion message as concurrent microtasks; asserts no synthetic is written and real result is preserved
serializes multiple concurrent appends without duplicating messages Two tool results dispatched concurrently for a two-tool assistant message; asserts exactly 3 messages with no synthetics
getPendingIds() returns empty after concurrent resolution Confirms pending Map is fully drained after concurrent resolution (no leaked IDs that would trigger future spurious flushes)

All three new tests fail on the unpatched code and pass on the patched code, confirming they directly exercise the bug.


Breaking Changes

guardedAppend now always returns Promise<ReturnType<typeof originalAppend>> rather than ReturnType<typeof originalAppend> directly.

This is safe because:

  • Callers of sessionManager.appendMessage are already in async contexts — they either await the result or discard it
  • The TypeScript type for SessionManager["appendMessage"] accepts a Promise return; tsc --noEmit passes clean
  • The public API surface (installSessionToolResultGuard signature) is unchanged

Performance

The mutex adds one Promise.resolve() microtask chain per appendMessage call — approximately 0.01 ms overhead per call in V8.

A typical tool-heavy agent turn (10 tool calls) incurs ~0.12 ms of added latency total. This is well below perception threshold and undetectable in any real workload benchmark.

The mutex does not slow the common path: in the correct (non-racing) case, concurrent callers would have executed sequentially anyway. The mutex prevents the incorrect fast path where two handlers race to mutate pending.

Why not lock-free? JavaScript is single-threaded within a Worker — "concurrency" here is cooperative (at await yield points), not preemptive. A promise-chain mutex is the idiomatic, zero-overhead solution. Lock-free data structures would add complexity with no benefit.


Checklist

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S labels Mar 22, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 135fdf395d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +309 to +312
const guardedAppend = (message: AgentMessage): ReturnType<typeof originalAppend> => {
return appendMutex.run(() => guardedAppendInner(message)) as unknown as ReturnType<
typeof originalAppend
>;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve synchronous appendMessage behavior

Wrapping guardedAppend in appendMutex.run(...) makes every append asynchronous because queue.then(...) always defers execution to a microtask, even when originalAppend is synchronous. The cast back to ReturnType<typeof originalAppend> hides this runtime change, so callers can still treat appendMessage as sync while seeing stale transcript state (for example, immediate flush/read operations can run before the append body executes). This is a behavioral regression introduced by the wrapper and can reorder or delay guard side effects.

Useful? React with 👍 / 👎.

Comment on lines +282 to +286
// Register tool calls BEFORE appending so that a racing toolResult handler
// (which may arrive during an awaited originalAppend) finds the IDs in pending.
if (toolCalls.length > 0) {
pendingState.trackToolCalls(toolCalls);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Track pending tool calls only after write succeeds

Pending tool-call IDs are now recorded before beforeMessageWriteHook and before originalAppend, so a blocked assistant message (or a failed append) still leaves IDs in pendingState even though no tool-call message was persisted. On the next non-toolResult message, flushPendingToolResults() can emit synthetic errors for tool calls that never existed in the transcript, producing invalid tool-call/result pairing. This regression is caused by moving trackToolCalls ahead of write acceptance.

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 22, 2026

Greptile Summary

This PR fixes a real race condition in session-tool-result-guard.ts where concurrent appendMessage calls could prematurely trigger synthetic tool-result generation, corrupting session transcripts with spurious [tool call failed — synthetic result] entries.

Changes:

  • createMutex: A minimal promise-chain mutex serializes all guardedAppend executions in FIFO order. The implementation is idiomatic and correct — failed appends advance the queue via the () => undefined, () => undefined catch handlers so a single error doesn't permanently stall subsequent calls.
  • pending.set() reordering: Tool call IDs are now registered in pending before calling originalAppend, eliminating a secondary window where a concurrent toolResult could arrive and fail to match.
  • 3 new regression tests: All correctly simulate the microtask-level concurrency using Promise.resolve().then(...) without awaiting between dispatches.

Concerns:

  • The as unknown as ReturnType<typeof originalAppend> cast on line 310 hides the runtime return-type change (now always Promise) from TypeScript. No current call site on a guarded session captures the return value as a string, so there's no immediate breakage, but this is a latent footgun for future contributors.
  • flushPendingToolResults and clearPendingToolResults (exposed in the return value) bypass the mutex. With trackToolCalls now running before originalAppend, these can race against an in-flight guarded append more easily than before if called from an async context.

Confidence Score: 4/5

  • Safe to merge — the mutex is correctly implemented, the race is demonstrably fixed, and no existing callers are broken by the return-type change.
  • The core fix (promise-chain mutex + pending.set() reordering) is sound and well-tested. Three new regression tests directly exercise the race scenario and would fail on the unpatched code. The only concerns are a P2 type-safety footgun (as unknown as cast hiding the async return type) and unguarded direct calls to flushPendingToolResults/clearPendingToolResults — neither breaks current behavior. Score is 4 rather than 5 because the type cast silently widens the contract in a way that could surprise future contributors.
  • The as unknown as cast on line 310 of session-tool-result-guard.ts and the unguarded flushPendingToolResults/clearPendingToolResults in the return value deserve a second look.

Comments Outside Diff (1)

  1. src/agents/session-tool-result-guard.ts, line 318-322 (link)

    P2 flushPendingToolResults and clearPendingToolResults bypass the mutex

    The two externally-accessible methods modify pendingState without going through appendMutex. If a caller invokes guard.flushPendingToolResults() while a guardedAppend is in flight (which is now always the case since guardedAppend is async), the flush will observe IDs that were registered by trackToolCalls (which now happens before originalAppend) and synthesize error results even though originalAppend hasn't completed yet.

    This is a pre-existing design choice and not a regression introduced by this PR, but the new ordering of trackToolCalls-before-originalAppend slightly widens the window for a direct flushPendingToolResults() call to race. If this API is called from any async context, consider wrapping these helpers in their own mutex-guarded paths:

    return {
      flushPendingToolResults: () => appendMutex.run(flushPendingToolResults),
      clearPendingToolResults: () => appendMutex.run(clearPendingToolResults),
      getPendingIds: pendingState.getPendingIds,
    };
    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/agents/session-tool-result-guard.ts
    Line: 318-322
    
    Comment:
    **`flushPendingToolResults` and `clearPendingToolResults` bypass the mutex**
    
    The two externally-accessible methods modify `pendingState` without going through `appendMutex`. If a caller invokes `guard.flushPendingToolResults()` while a `guardedAppend` is in flight (which is now always the case since `guardedAppend` is async), the flush will observe IDs that were registered by `trackToolCalls` (which now happens *before* `originalAppend`) and synthesize error results even though `originalAppend` hasn't completed yet.
    
    This is a pre-existing design choice and not a regression introduced by this PR, but the new ordering of `trackToolCalls`-before-`originalAppend` slightly widens the window for a direct `flushPendingToolResults()` call to race. If this API is called from any async context, consider wrapping these helpers in their own mutex-guarded paths:
    
    ```typescript
    return {
      flushPendingToolResults: () => appendMutex.run(flushPendingToolResults),
      clearPendingToolResults: () => appendMutex.run(clearPendingToolResults),
      getPendingIds: pendingState.getPendingIds,
    };
    ```
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/session-tool-result-guard.ts
Line: 309-313

Comment:
**Lies to TypeScript about the return type**

`appendMutex.run(...)` always returns `Promise<T>`, but the `as unknown as ReturnType<typeof originalAppend>` double-cast makes TypeScript believe `guardedAppend` still returns the original synchronous type (typically `string | void`). This bypasses the type system entirely.

In practice no existing caller of a guarded session uses the return value as a string, so there's no immediate breakage. However, this creates a silent footgun: any future code that does `const id = sm.appendMessage(msg)` on a guarded session and then checks `typeof id === "string"` will receive a `Promise` (truthy, but not a string) with no TypeScript warning.

Consider either:
1. Widening `SessionManager["appendMessage"]`'s return type to include `Promise<string | void>` in the upstream type definition, or
2. Explicitly documenting this with a comment explaining why the lie is intentional and safe given all current call sites.

```typescript
  // NOTE: The actual runtime return is Promise<ReturnType<typeof originalAppend>>.
  // The cast is intentional: all callers of appendMessage on a guarded session
  // either await the result or discard it. Verified by `tsc --noEmit` on all
  // call sites in this repo.
  const guardedAppend = (message: AgentMessage): ReturnType<typeof originalAppend> => {
    return appendMutex.run(() => guardedAppendInner(message)) as unknown as ReturnType<
      typeof originalAppend
    >;
  };
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/session-tool-result-guard.ts
Line: 318-322

Comment:
**`flushPendingToolResults` and `clearPendingToolResults` bypass the mutex**

The two externally-accessible methods modify `pendingState` without going through `appendMutex`. If a caller invokes `guard.flushPendingToolResults()` while a `guardedAppend` is in flight (which is now always the case since `guardedAppend` is async), the flush will observe IDs that were registered by `trackToolCalls` (which now happens *before* `originalAppend`) and synthesize error results even though `originalAppend` hasn't completed yet.

This is a pre-existing design choice and not a regression introduced by this PR, but the new ordering of `trackToolCalls`-before-`originalAppend` slightly widens the window for a direct `flushPendingToolResults()` call to race. If this API is called from any async context, consider wrapping these helpers in their own mutex-guarded paths:

```typescript
return {
  flushPendingToolResults: () => appendMutex.run(flushPendingToolResults),
  clearPendingToolResults: () => appendMutex.run(clearPendingToolResults),
  getPendingIds: pendingState.getPendingIds,
};
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "fix: add mutex to se..."

Comment on lines +309 to +313
const guardedAppend = (message: AgentMessage): ReturnType<typeof originalAppend> => {
return appendMutex.run(() => guardedAppendInner(message)) as unknown as ReturnType<
typeof originalAppend
>;
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Lies to TypeScript about the return type

appendMutex.run(...) always returns Promise<T>, but the as unknown as ReturnType<typeof originalAppend> double-cast makes TypeScript believe guardedAppend still returns the original synchronous type (typically string | void). This bypasses the type system entirely.

In practice no existing caller of a guarded session uses the return value as a string, so there's no immediate breakage. However, this creates a silent footgun: any future code that does const id = sm.appendMessage(msg) on a guarded session and then checks typeof id === "string" will receive a Promise (truthy, but not a string) with no TypeScript warning.

Consider either:

  1. Widening SessionManager["appendMessage"]'s return type to include Promise<string | void> in the upstream type definition, or
  2. Explicitly documenting this with a comment explaining why the lie is intentional and safe given all current call sites.
  // NOTE: The actual runtime return is Promise<ReturnType<typeof originalAppend>>.
  // The cast is intentional: all callers of appendMessage on a guarded session
  // either await the result or discard it. Verified by `tsc --noEmit` on all
  // call sites in this repo.
  const guardedAppend = (message: AgentMessage): ReturnType<typeof originalAppend> => {
    return appendMutex.run(() => guardedAppendInner(message)) as unknown as ReturnType<
      typeof originalAppend
    >;
  };
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/session-tool-result-guard.ts
Line: 309-313

Comment:
**Lies to TypeScript about the return type**

`appendMutex.run(...)` always returns `Promise<T>`, but the `as unknown as ReturnType<typeof originalAppend>` double-cast makes TypeScript believe `guardedAppend` still returns the original synchronous type (typically `string | void`). This bypasses the type system entirely.

In practice no existing caller of a guarded session uses the return value as a string, so there's no immediate breakage. However, this creates a silent footgun: any future code that does `const id = sm.appendMessage(msg)` on a guarded session and then checks `typeof id === "string"` will receive a `Promise` (truthy, but not a string) with no TypeScript warning.

Consider either:
1. Widening `SessionManager["appendMessage"]`'s return type to include `Promise<string | void>` in the upstream type definition, or
2. Explicitly documenting this with a comment explaining why the lie is intentional and safe given all current call sites.

```typescript
  // NOTE: The actual runtime return is Promise<ReturnType<typeof originalAppend>>.
  // The cast is intentional: all callers of appendMessage on a guarded session
  // either await the result or discard it. Verified by `tsc --noEmit` on all
  // call sites in this repo.
  const guardedAppend = (message: AgentMessage): ReturnType<typeof originalAppend> => {
    return appendMutex.run(() => guardedAppendInner(message)) as unknown as ReturnType<
      typeof originalAppend
    >;
  };
```

How can I resolve this? If you propose a fix, please make it concise.

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 28, 2026

Codex review: found issues before merge.

Summary
The PR adds a promise-chain mutex around session tool-result guarded appends, moves pending tool-call tracking before persistence, and adds concurrent append regression tests.

Reproducibility: unclear. Current main and the pinned dependency show a synchronous appendMessage path, while the historical missing-tool-result symptom is covered by idle-before-flush tests rather than this PR's proposed async append race.

Next step before merge
Maintainers need to decide whether append serialization is still wanted after the current idle-before-flush mitigation; the submitted branch is not a narrow safe repair as written.

Security
Cleared: The diff only changes agent session guard logic and colocated tests, with no dependency, CI, permission, secret, artifact, package, or publishing surface changes.

Review findings

  • [P1] Preserve synchronous appendMessage behavior — src/agents/session-tool-result-guard.ts:310-312
  • [P1] Track pending tool calls only after write succeeds — src/agents/session-tool-result-guard.ts:282-285
Review details

Best possible solution:

Keep the current idle-before-flush mitigation and only add append serialization if a real async append boundary is proven, while preserving synchronous append semantics and post-acceptance pending tracking.

Do we have a high-confidence way to reproduce the issue?

Unclear. Current main and the pinned dependency show a synchronous appendMessage path, while the historical missing-tool-result symptom is covered by idle-before-flush tests rather than this PR's proposed async append race.

Is this the best way to solve the issue?

No. The submitted mutex is not the best fix as written because it changes the runtime return contract and records pending tool calls before the write path accepts the assistant message.

Full review comments:

  • [P1] Preserve synchronous appendMessage behavior — src/agents/session-tool-result-guard.ts:310-312
    appendMutex.run(...) always returns a Promise and defers the append body to a microtask, while the cast keeps callers typed as receiving the synchronous entry id. Current code emits transcript updates from that id and upstream SessionManager.appendMessage() is a synchronous string return, so guarded sessions can expose stale state or a Promise where an id is expected.
    Confidence: 0.94
  • [P1] Track pending tool calls only after write succeeds — src/agents/session-tool-result-guard.ts:282-285
    Moving pendingState.trackToolCalls(toolCalls) before the before-write hook and originalAppend records ids for assistant messages that may be blocked or fail before persistence. The next non-tool message can then flush synthetic tool results for calls that never entered the transcript.
    Confidence: 0.92

Overall correctness: patch is incorrect
Overall confidence: 0.91

Acceptance criteria:

  • pnpm test src/agents/session-tool-result-guard.test.ts src/agents/pi-embedded-runner.guard.waitforidle-before-flush.test.ts
  • pnpm check:changed

What I checked:

  • Current main preserves synchronous append behavior: The current guard applies the before-write hook, calls originalAppend(finalMessage), emits the transcript update from the returned id, tracks tool calls after append, and returns the original result. (src/agents/session-tool-result-guard.ts:460, 06056926a099)
  • PR makes guarded append asynchronous: The PR wraps the guard body in appendMutex.run(...), which returns a Promise, then casts it back to ReturnType<typeof originalAppend>. (src/agents/session-tool-result-guard.ts:310, 135fdf395d79)
  • PR tracks pending before write acceptance: The PR moves pendingState.trackToolCalls(toolCalls) before applyBeforeWriteHook(...) and originalAppend(...), so blocked or failed assistant writes can leave pending ids. (src/agents/session-tool-result-guard.ts:282, 135fdf395d79)
  • Pinned dependency append contract is synchronous: OpenClaw pins @mariozechner/pi-coding-agent to 0.71.1; its SessionManager.appendMessage(...) declaration returns string, and the published JS uses appendFileSync before returning entry.id. (package.json:1679, 06056926a099)
  • Current main has idle-before-flush mitigation: flushPendingToolResultsAfterIdle(...) waits for agent idle before flushing, and the colocated regression test verifies a real tool result can land before synthetic cleanup. (src/agents/pi-embedded-runner/wait-for-idle-before-flush.ts:43, 06056926a099)
  • Prior review reached same blockers: The existing ClawSweeper review on the PR already flagged the same two P1 issues and noted the exact async append race is not established against the pinned synchronous dependency contract. (135fdf395d79)

Likely related people:

  • rodbland2021: Authored the idle-before-flush mitigation for synthetic missing-tool-result failures and linked the earlier missing-tool-result reports. (role: introduced related mitigation; confidence: high; commits: d3b2135f862f; files: src/agents/pi-embedded-runner/wait-for-idle-before-flush.ts, src/agents/pi-embedded-runner/run/attempt.ts)
  • vignesh07: Recently changed the guard and idle cleanup path to avoid synthetic tool-result writes during timeout cleanup. (role: recent adjacent maintainer; confidence: medium; commits: 4daaea119041; files: src/agents/session-tool-result-guard.ts, src/agents/session-tool-result-guard-wrapper.ts, src/agents/pi-embedded-runner/wait-for-idle-before-flush.ts)
  • parkertoddbrooks: Introduced the synchronous before_message_write hook that is directly relevant to the PR regression where pending ids are tracked before a write can be blocked. (role: adjacent owner; confidence: medium; commits: 15fe87e6b714; files: src/agents/session-tool-result-guard.ts, src/agents/session-tool-result-guard-wrapper.ts, src/plugins/types.ts)
  • steipete: Recently maintained strict-provider transcript repair and session guard behavior around this surface. (role: recent maintainer; confidence: medium; commits: 7f6452897e25, 599ae7fed849; files: src/agents/session-tool-result-guard.ts, src/agents/session-transcript-repair.ts)

Remaining risk / open question:

  • The historical synthetic-result symptom is real, but this PR's exact async appendMessage race is still not proven against the current pinned synchronous dependency contract.
  • The branch is old relative to current main and would need rebase review around newer guard, truncation, and transcript-repair changes.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 06056926a099.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: missing tool result in session history - synthetic error

1 participant