fix: serialize concurrent appendMessage calls to prevent synthetic flush race#52036
fix: serialize concurrent appendMessage calls to prevent synthetic flush race#52036vanek-nutic wants to merge 1 commit intoopenclaw:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 135fdf395d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const guardedAppend = (message: AgentMessage): ReturnType<typeof originalAppend> => { | ||
| return appendMutex.run(() => guardedAppendInner(message)) as unknown as ReturnType< | ||
| typeof originalAppend | ||
| >; |
There was a problem hiding this comment.
Preserve synchronous appendMessage behavior
Wrapping guardedAppend in appendMutex.run(...) makes every append asynchronous because queue.then(...) always defers execution to a microtask, even when originalAppend is synchronous. The cast back to ReturnType<typeof originalAppend> hides this runtime change, so callers can still treat appendMessage as sync while seeing stale transcript state (for example, immediate flush/read operations can run before the append body executes). This is a behavioral regression introduced by the wrapper and can reorder or delay guard side effects.
Useful? React with 👍 / 👎.
| // Register tool calls BEFORE appending so that a racing toolResult handler | ||
| // (which may arrive during an awaited originalAppend) finds the IDs in pending. | ||
| if (toolCalls.length > 0) { | ||
| pendingState.trackToolCalls(toolCalls); | ||
| } |
There was a problem hiding this comment.
Track pending tool calls only after write succeeds
Pending tool-call IDs are now recorded before beforeMessageWriteHook and before originalAppend, so a blocked assistant message (or a failed append) still leaves IDs in pendingState even though no tool-call message was persisted. On the next non-toolResult message, flushPendingToolResults() can emit synthetic errors for tool calls that never existed in the transcript, producing invalid tool-call/result pairing. This regression is caused by moving trackToolCalls ahead of write acceptance.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR fixes a real race condition in Changes:
Concerns:
Confidence Score: 4/5
|
| const guardedAppend = (message: AgentMessage): ReturnType<typeof originalAppend> => { | ||
| return appendMutex.run(() => guardedAppendInner(message)) as unknown as ReturnType< | ||
| typeof originalAppend | ||
| >; | ||
| }; |
There was a problem hiding this comment.
Lies to TypeScript about the return type
appendMutex.run(...) always returns Promise<T>, but the as unknown as ReturnType<typeof originalAppend> double-cast makes TypeScript believe guardedAppend still returns the original synchronous type (typically string | void). This bypasses the type system entirely.
In practice no existing caller of a guarded session uses the return value as a string, so there's no immediate breakage. However, this creates a silent footgun: any future code that does const id = sm.appendMessage(msg) on a guarded session and then checks typeof id === "string" will receive a Promise (truthy, but not a string) with no TypeScript warning.
Consider either:
- Widening
SessionManager["appendMessage"]'s return type to includePromise<string | void>in the upstream type definition, or - Explicitly documenting this with a comment explaining why the lie is intentional and safe given all current call sites.
// NOTE: The actual runtime return is Promise<ReturnType<typeof originalAppend>>.
// The cast is intentional: all callers of appendMessage on a guarded session
// either await the result or discard it. Verified by `tsc --noEmit` on all
// call sites in this repo.
const guardedAppend = (message: AgentMessage): ReturnType<typeof originalAppend> => {
return appendMutex.run(() => guardedAppendInner(message)) as unknown as ReturnType<
typeof originalAppend
>;
};Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/session-tool-result-guard.ts
Line: 309-313
Comment:
**Lies to TypeScript about the return type**
`appendMutex.run(...)` always returns `Promise<T>`, but the `as unknown as ReturnType<typeof originalAppend>` double-cast makes TypeScript believe `guardedAppend` still returns the original synchronous type (typically `string | void`). This bypasses the type system entirely.
In practice no existing caller of a guarded session uses the return value as a string, so there's no immediate breakage. However, this creates a silent footgun: any future code that does `const id = sm.appendMessage(msg)` on a guarded session and then checks `typeof id === "string"` will receive a `Promise` (truthy, but not a string) with no TypeScript warning.
Consider either:
1. Widening `SessionManager["appendMessage"]`'s return type to include `Promise<string | void>` in the upstream type definition, or
2. Explicitly documenting this with a comment explaining why the lie is intentional and safe given all current call sites.
```typescript
// NOTE: The actual runtime return is Promise<ReturnType<typeof originalAppend>>.
// The cast is intentional: all callers of appendMessage on a guarded session
// either await the result or discard it. Verified by `tsc --noEmit` on all
// call sites in this repo.
const guardedAppend = (message: AgentMessage): ReturnType<typeof originalAppend> => {
return appendMutex.run(() => guardedAppendInner(message)) as unknown as ReturnType<
typeof originalAppend
>;
};
```
How can I resolve this? If you propose a fix, please make it concise.|
Codex review: found issues before merge. Summary Reproducibility: unclear. Current main and the pinned dependency show a synchronous Next step before merge Security Review findings
Review detailsBest possible solution: Keep the current idle-before-flush mitigation and only add append serialization if a real async append boundary is proven, while preserving synchronous append semantics and post-acceptance pending tracking. Do we have a high-confidence way to reproduce the issue? Unclear. Current main and the pinned dependency show a synchronous Is this the best way to solve the issue? No. The submitted mutex is not the best fix as written because it changes the runtime return contract and records pending tool calls before the write path accepts the assistant message. Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 06056926a099. |
fix: serialize concurrent appendMessage calls to prevent synthetic flush race
Problem
OpenClaw's
session-tool-result-guardincorrectly generates synthetic error tool results during normal agent operation when multiple async event handlers dispatchappendMessageconcurrently. Users experience this as:[tool call failed — synthetic result]entries in session transcriptsAffects any scenario where the event bus fires multiple listeners that each call
appendMessagewithout awaiting the previous call — common during agent completion and parallel tool dispatch.Closes #38432, #13351, #15008.
Root Cause
Two interacting bugs in
session-tool-result-guard.ts:Bug 1: No serialization of concurrent
guardedAppendcallersThe guard maintains a shared
Map<string, string>calledpending(tool call ID → tool name). The flush conditions that synthesize error results checkpending.size > 0and fire on any non-toolResultmessage arriving while IDs are registered:Because
guardedAppendisasync-capable (it returns whateveroriginalAppendreturns, which may be a Promise), any awaited I/O insideoriginalAppendcreates a window where a second concurrent caller observespending.size > 0and flushes prematurely.Bug 2:
pending.set()called afteroriginalAppend()Tool call IDs were registered in
pendingafteroriginalAppend(message)was called. IforiginalAppendis async and yields, a concurrenttoolResultfor the same ID could arrive during that window, find the ID absent frompending, skip the delete, and leave the ID permanently registered — triggering a spurious synthetic on the next non-toolResultmessage.Fix
2 files changed, +167 / −25 lines.
1. Cooperative async mutex (
session-tool-result-guard.ts)A minimal promise-chain mutex (
createMutex()) wraps the entire body ofguardedAppend. Each call acquires the lock before executing and releases it on completion (including on error), so concurrent callers execute strictly sequentially in arrival order:The guard body is split into
guardedAppendInner(logic) andguardedAppend(mutex wrapper), keeping the public installation API unchanged.2. Reorder
pending.set()to beforeoriginalAppend()Tool call IDs are now registered in
pendingbeforeoriginalAppend(message)is called, eliminating the window where a concurrenttoolResultcould arrive and find a missing ID:No new npm dependencies. No changes to
session-tool-result-guard-wrapper.ts.Testing
9/9 tests passing (6 existing + 3 new).
New concurrent tests added to
session-tool-result-guard.test.ts:does not synthesize when toolResult arrives concurrently with next assistant messagetoolResultand a completion message as concurrent microtasks; asserts no synthetic is written and real result is preservedserializes multiple concurrent appends without duplicating messagesgetPendingIds() returns empty after concurrent resolutionpendingMap is fully drained after concurrent resolution (no leaked IDs that would trigger future spurious flushes)All three new tests fail on the unpatched code and pass on the patched code, confirming they directly exercise the bug.
Breaking Changes
guardedAppendnow always returnsPromise<ReturnType<typeof originalAppend>>rather thanReturnType<typeof originalAppend>directly.This is safe because:
sessionManager.appendMessageare already in async contexts — they eitherawaitthe result or discard itSessionManager["appendMessage"]accepts a Promise return;tsc --noEmitpasses cleaninstallSessionToolResultGuardsignature) is unchangedPerformance
The mutex adds one
Promise.resolve()microtask chain perappendMessagecall — approximately 0.01 ms overhead per call in V8.A typical tool-heavy agent turn (10 tool calls) incurs ~0.12 ms of added latency total. This is well below perception threshold and undetectable in any real workload benchmark.
The mutex does not slow the common path: in the correct (non-racing) case, concurrent callers would have executed sequentially anyway. The mutex prevents the incorrect fast path where two handlers race to mutate
pending.Checklist
tsc --noEmitpasses cleaninstallSessionToolResultGuardpublic API unchanged