Bug type
Behavior bug (incorrect output/state without crash)
Beta release blocker
No
Summary
The commitments store (~/.openclaw/commitments/commitments.json) is updated by markCommitmentsAttempted / markCommitmentsStatus (src/commitments/store.ts:434-461, 463-493) and by upsertInferredCommitments (src/commitments/store.ts:295-331) via an unsynchronized load-modify-save pattern: each call independently reads the entire store, mutates in memory, and writes the whole file back. There is no lockfile, mutex, or writer queue around the critical section.
When two callers race (background heartbeat + user CLI dismiss, or two heartbeat ticks for different sessions, or two Promise.all-style calls within one heartbeat tick), whichever save lands second silently overwrites the first caller's mutation with stale data from before the first save.
By contrast, the session store (src/config/sessions/store.ts:432, 442, 556, 638) wraps every mutation in runExclusiveSessionStoreWrite(storePath, fn) — a per-path async writer queue defined in src/config/sessions/store-writer.ts:14-97 (uses WRITER_QUEUES: Map<string, SessionStoreWriterQueue>). That exact pattern is exported from the session-store and could be reused for commitments. Commitments has no equivalent protection.
Failure modes that manifest under concurrency:
- Lost dismiss — User runs
openclaw commitments dismiss cm_xyz while the heartbeat is bumping attempts on another commitment. Heartbeat's snapshot lacks the dismiss; heartbeat's save wins → cm_xyz reverts to pending and gets delivered to the user.
- Lost "sent" status → duplicate delivery — Heartbeat marks
cm_xyz as sent (src/infra/heartbeat-runner.ts:1929). A concurrent markCommitmentsAttempted (src/infra/heartbeat-runner.ts:1568) on a different commitment overwrites the snapshot → next heartbeat tick sees cm_xyz as still pending and re-sends the same commitment. User-visible: duplicate follow-up message.
- Lost attempts counter — Same race:
attempts: N+1 gets rolled back to N; the retry-budget cap (maxPerDay) is silently extended; the user gets more reminders than the configured policy allows.
Verified live on v2026.5.10-beta.1 (152ea9af34); same source on v2026.5.7 stable.
Steps to reproduce
Deterministic in-process repro using the built dist/commitments/store.js directly. The same race triggers via CLI + heartbeat overlap in practice but the timing window is wider in-process.
cd ~/Gittensor/OpenCoven/openclaw-upstream # any current checkout (HEAD 152ea9af34 verified)
export PATH=~/.nvm/versions/node/v22.22.2/bin:$PATH
pnpm install --prefer-offline # if dist/ isn't already built
pnpm build # if needed
cat > race-repro.mjs <<'EOF'
import {
loadCommitmentStore,
saveCommitmentStore,
markCommitmentsStatus,
resolveCommitmentStorePath,
} from "./dist/commitments/store.js";
function seedStore() {
const now = Date.now();
return {
version: 1,
commitments: ["A", "B"].map((tag) => ({
id: `cm_race${tag}`,
agentId: "qa-race",
sessionKey: "qa-race-session",
channel: "slack",
kind: "follow_up",
sensitivity: "normal",
source: "qa-seed",
status: "pending",
reason: `QA race repro id ${tag}`,
suggestedText: `ping ${tag}`,
dedupeKey: `qa-race-${tag}`,
confidence: 0.9,
dueWindow: { earliestMs: now - 1000, latestMs: now + 3600_000, timezone: "UTC" },
createdAtMs: now,
updatedAtMs: now,
attempts: 0,
})),
};
}
async function snap() {
const s = await loadCommitmentStore();
return Object.fromEntries(s.commitments.map((c) => [c.id, c.status]));
}
let loss = 0, ok = 0;
for (let i = 0; i < 20; i++) {
await saveCommitmentStore(undefined, seedStore());
await Promise.all([
markCommitmentsStatus({ ids: ["cm_raceA"], status: "dismissed", nowMs: Date.now() }),
markCommitmentsStatus({ ids: ["cm_raceB"], status: "dismissed", nowMs: Date.now() }),
]);
const a = await snap();
if (a.cm_raceA === "dismissed" && a.cm_raceB === "dismissed") ok++; else loss++;
console.log(`trial ${i+1}: A=${a.cm_raceA} B=${a.cm_raceB}`);
}
console.log(`\n${loss}/20 trials lost a write`);
EOF
node race-repro.mjs
Result (verified on v2026.5.10-beta.1 / 152ea9af34): 20/20 trials lose one of the two dismisses. One of the commitments reverts to pending every single time.
trial 1: A=pending B=dismissed
trial 2: A=pending B=dismissed
trial 3: A=pending B=dismissed
…
trial 20: A=dismissed B=pending
20/20 trials lost a write
Expected behavior
Both dismisses should succeed; final state should have both cm_raceA and cm_raceB marked dismissed. Equivalent reasoning applies to markCommitmentsAttempted and upsertInferredCommitments: independent mutations to disjoint records should not lose each other.
The session store achieves this with runExclusiveSessionStoreWrite(storePath, fn) (src/config/sessions/store-writer.ts:72) — every load-modify-save is queued per-store-path so a second writer can only start after the first writer's save commits. The commitments store should use the same pattern.
Actual behavior
markCommitmentsStatus and markCommitmentsAttempted perform the following sequence with no lock:
// src/commitments/store.ts:434-461
export async function markCommitmentsAttempted(params: {…}): Promise<void> {
…
const store = await loadCommitmentStore(); // ← read snapshot S0
let changed = false;
store.commitments = store.commitments.map((commitment) => { … }); // ← mutate in memory
if (changed) {
await saveCommitmentStore(undefined, store); // ← write back
}
}
The save path uses privateFileStore(...).writeJson(...) which DOES atomic-rename on disk, so the file is never torn. But atomic-rename only protects against partial reads; it does not protect against last-writer-wins data loss between two read-modify-save cycles whose reads happen before the first cycle's save commits.
Sequence diagram of the bug:
caller A caller B disk
───────── ───────── ─────
loadCommitmentStore() → S0
loadCommitmentStore() → S0
mutate(A=dismissed in S0)→S1_A
mutate(B=dismissed in S0)→S1_B
saveCommitmentStore(S1_A) ───────────────────────────────────────────────────→ {A:dismissed, B:pending}
saveCommitmentStore(S1_B) ─────────→ {A:pending, B:dismissed} ← A's dismiss is gone
20/20 reproduction shows this is deterministic when both markCommitmentsStatus calls share an event-loop tick (typical of Promise.all and concurrent CLI/heartbeat invocations).
OpenClaw version
v2026.5.10-beta.1, v2026.5.7
Operating system
Ubuntu 24.04
Install method
No response
Model
Not applicable
Provider / routing chain
Not applicable
Additional provider/model setup details
No response
Logs, screenshots, and evidence
Impact and severity
No response
Additional information
No response
Bug type
Behavior bug (incorrect output/state without crash)
Beta release blocker
No
Summary
The commitments store (
~/.openclaw/commitments/commitments.json) is updated bymarkCommitmentsAttempted/markCommitmentsStatus(src/commitments/store.ts:434-461, 463-493) and byupsertInferredCommitments(src/commitments/store.ts:295-331) via an unsynchronized load-modify-save pattern: each call independently reads the entire store, mutates in memory, and writes the whole file back. There is no lockfile, mutex, or writer queue around the critical section.When two callers race (background heartbeat + user CLI dismiss, or two heartbeat ticks for different sessions, or two
Promise.all-style calls within one heartbeat tick), whichever save lands second silently overwrites the first caller's mutation with stale data from before the first save.By contrast, the session store (
src/config/sessions/store.ts:432, 442, 556, 638) wraps every mutation inrunExclusiveSessionStoreWrite(storePath, fn)— a per-path async writer queue defined insrc/config/sessions/store-writer.ts:14-97(usesWRITER_QUEUES: Map<string, SessionStoreWriterQueue>). That exact pattern is exported from the session-store and could be reused for commitments. Commitments has no equivalent protection.Failure modes that manifest under concurrency:
openclaw commitments dismiss cm_xyzwhile the heartbeat is bumpingattemptson another commitment. Heartbeat's snapshot lacks the dismiss; heartbeat's save wins →cm_xyzreverts topendingand gets delivered to the user.cm_xyzassent(src/infra/heartbeat-runner.ts:1929). A concurrentmarkCommitmentsAttempted(src/infra/heartbeat-runner.ts:1568) on a different commitment overwrites the snapshot → next heartbeat tick seescm_xyzas stillpendingand re-sends the same commitment. User-visible: duplicate follow-up message.attempts: N+1gets rolled back toN; the retry-budget cap (maxPerDay) is silently extended; the user gets more reminders than the configured policy allows.Verified live on
v2026.5.10-beta.1(152ea9af34); same source onv2026.5.7stable.Steps to reproduce
Deterministic in-process repro using the built
dist/commitments/store.jsdirectly. The same race triggers via CLI + heartbeat overlap in practice but the timing window is wider in-process.Result (verified on
v2026.5.10-beta.1/152ea9af34): 20/20 trials lose one of the two dismisses. One of the commitments reverts topendingevery single time.Expected behavior
Both dismisses should succeed; final state should have both
cm_raceAandcm_raceBmarkeddismissed. Equivalent reasoning applies tomarkCommitmentsAttemptedandupsertInferredCommitments: independent mutations to disjoint records should not lose each other.The session store achieves this with
runExclusiveSessionStoreWrite(storePath, fn)(src/config/sessions/store-writer.ts:72) — every load-modify-save is queued per-store-path so a second writer can only start after the first writer's save commits. The commitments store should use the same pattern.Actual behavior
markCommitmentsStatusandmarkCommitmentsAttemptedperform the following sequence with no lock:The save path uses
privateFileStore(...).writeJson(...)which DOES atomic-rename on disk, so the file is never torn. But atomic-rename only protects against partial reads; it does not protect against last-writer-wins data loss between two read-modify-save cycles whose reads happen before the first cycle's save commits.Sequence diagram of the bug:
20/20 reproduction shows this is deterministic when both
markCommitmentsStatuscalls share an event-loop tick (typical ofPromise.alland concurrent CLI/heartbeat invocations).OpenClaw version
v2026.5.10-beta.1, v2026.5.7
Operating system
Ubuntu 24.04
Install method
No response
Model
Not applicable
Provider / routing chain
Not applicable
Additional provider/model setup details
No response
Logs, screenshots, and evidence
Impact and severity
No response
Additional information
No response