OpenClaw 2026.4.29 (a448042) has a user-visible failure mode where legitimate session-write locks last 40-55s+ during slow prep stages, but at least one call site still uses a hard-coded 10s lock acquire timeout.
Evidence from gateway journal on 2026-05-02 UTC:
[diagnostic] message processed: ... outcome=error duration=33935ms
error="SessionWriteLockTimeoutError: session file locked (timeout 10000ms)"
[session-write-lock] releasing lock held for 55611ms (max=15000ms)
[trace:embedded-run] prep stages: ... totalMs=113809
[diagnostic] liveness warning: eventLoopDelayP99Ms=44090.5
Local investigation:
- dist/extensions/codex/run-attempt-DgEHs5eR.js calls acquireSessionWriteLock({ sessionFile: params.sessionFile, timeoutMs: 1e4 }).
- dist/docker-C3gkMNg9.js uses timeoutMs: 6e4 for another lock, so 60s is already considered acceptable in at least one path.
- transcript-rewrite-D-wdJFRv.js calls acquireSessionWriteLock({ sessionFile: params.sessionFile }), inheriting the helper default.
- Config schema exposes agents.defaults.maxConcurrent, but I could not find a config key for session write lock acquire timeout. These paths appear hard-coded or defaulted, not configurable.
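To make the mismatch concrete, here is a minimal sketch of the arithmetic, not OpenClaw code. It assumes the worst case where a waiter arrives just as the lock is taken, so it must wait for the full hold time. The function name modelAcquire is hypothetical; the numbers come from the journal lines and call sites above.

```typescript
// Simplified model (assumption): a waiter arrives while the lock will stay
// held for `remainingHoldMs` more milliseconds and gives up after `timeoutMs`.
interface AcquireOutcome {
  acquired: boolean;
  waitedMs: number;
}

function modelAcquire(remainingHoldMs: number, timeoutMs: number): AcquireOutcome {
  if (remainingHoldMs <= timeoutMs) {
    // The holder releases before the waiter's deadline.
    return { acquired: true, waitedMs: Math.max(0, remainingHoldMs) };
  }
  // The deadline passes first; this is where the timeout error would surface.
  return { acquired: false, waitedMs: timeoutMs };
}

const observedHoldMs = 55611; // "releasing lock held for 55611ms" from the journal
const codexTimeoutMs = 1e4;   // hard-coded value in the codex run-attempt call site
const dockerTimeoutMs = 6e4;  // value used by the docker lock path

console.log(modelAcquire(observedHoldMs, codexTimeoutMs));
console.log(modelAcquire(observedHoldMs, dockerTimeoutMs));
```

Under this model the 10s call site fails against the observed 55.6s hold, while a 60s timeout would have waited it out.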
Impact:
When the gateway is under cron/agent prep load, interactive Telegram messages can fail with “Something went wrong while processing your request” even though the blocking lock is legitimate and releases shortly after.
Request:
- Add a config knob for session write lock acquire timeout, e.g. session.lockTimeoutMs or agents.defaults.session.lockTimeoutMs.
- Consider raising the default above 10s while v2026.4.29 prep stages are routinely 40-110s.
- Ensure all acquireSessionWriteLock call sites use the knob instead of hard-coded 1e4.
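A rough sketch of how such a knob could be resolved, offered as an illustration and not as a patch. Everything here is an assumption: the helper name resolveSessionLockTimeoutMs, the config shape, the precedence between the two proposed keys, and the 60s fallback (borrowed from the value the docker path already uses).

```typescript
// Hypothetical config shape covering both key names proposed above.
interface LockTimeoutConfig {
  session?: { lockTimeoutMs?: number };
  agents?: { defaults?: { session?: { lockTimeoutMs?: number } } };
}

// Assumed fallback; the actual default would be the maintainers' decision.
const FALLBACK_LOCK_TIMEOUT_MS = 60_000;

// Accept only finite, positive numbers so a bad config value cannot
// produce an instant timeout or an unbounded wait.
function isValidTimeout(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value > 0;
}

// Assumed precedence: session.lockTimeoutMs first, then
// agents.defaults.session.lockTimeoutMs, then the fallback.
function resolveSessionLockTimeoutMs(config: LockTimeoutConfig): number {
  const candidates = [
    config.session?.lockTimeoutMs,
    config.agents?.defaults?.session?.lockTimeoutMs,
  ];
  for (const candidate of candidates) {
    if (isValidTimeout(candidate)) {
      return candidate;
    }
  }
  return FALLBACK_LOCK_TIMEOUT_MS;
}

// A call site would then pass the resolved value instead of 1e4, e.g.
//   acquireSessionWriteLock({ sessionFile, timeoutMs: resolveSessionLockTimeoutMs(config) })
console.log(resolveSessionLockTimeoutMs({ session: { lockTimeoutMs: 45_000 } }));
```

Resolving the value in one helper would keep call sites from drifting apart again, which is how the 1e4 and 6e4 values appear to have diverged.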