Summary
When multiple Telegram threads/groups send messages concurrently, some messages are dropped without any reply to the user because of session store lock contention. The error logged is:
```
[telegram] handler failed: Error: timeout acquiring session store lock: /home/ubuntu/.clawdbot/agents/main/sessions/sessions.json.lock
```
Root Cause
The session store uses a single global lock (sessions.json.lock) for all session metadata updates. In src/config/sessions/store.ts:269-337:
```typescript
async function withSessionStoreLock<T>(
  storePath: string,
  fn: () => Promise<T>,
  opts: SessionStoreLockOptions = {},
): Promise<T> {
  const timeoutMs = opts.timeoutMs ?? 10_000; // 10 second timeout
  const pollIntervalMs = opts.pollIntervalMs ?? 25; // 25 ms between acquisition attempts
  // ...
}
```
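When multiple Telegram handlers try to update session metadata simultaneously:
- The first handler acquires the lock.
- The other handlers wait, polling every 25 ms for the lock to be released.
- If the first handler holds the lock for more than 10 seconds (common with API calls and tool execution), the waiting handlers time out.
- Waits also add up: a handler stuck behind several others can exceed the 10-second timeout even if no single holder keeps the lock that long.
- Timed-out handlers fail silently: their messages get no response.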
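To make the failure mode concrete, here is a minimal, scaled-down simulation of a polling lock with a deadline, shaped like the `timeoutMs`/`pollIntervalMs` options in the excerpt above. It is not the actual store.ts implementation: the on-disk `sessions.json.lock` file is replaced by a hypothetical in-memory `heldLocks` set (valid only inside one process), and `withPollingLock` and `demo` are illustrative names. Handler A holds the lock for 300 ms while handler B gives up after 100 ms, mirroring a >10 s hold against the 10 s production timeout.

```typescript
// Hypothetical in-memory stand-in for the on-disk sessions.json.lock file.
// The has/add check runs in one synchronous step, so it is race-free within a
// single Node.js process; the real store needs a file lock to work across processes.
const heldLocks = new Set<string>();

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Polling lock with a deadline, modeled on the timeoutMs/pollIntervalMs options.
async function withPollingLock<T>(
  lockPath: string,
  fn: () => Promise<T>,
  timeoutMs = 10_000,
  pollIntervalMs = 25,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  // Waiters are not queued: each one re-checks every pollIntervalMs until the deadline.
  while (heldLocks.has(lockPath)) {
    if (Date.now() >= deadline) {
      throw new Error(`timeout acquiring session store lock: ${lockPath}`);
    }
    await sleep(pollIntervalMs);
  }
  heldLocks.add(lockPath);
  try {
    return await fn();
  } finally {
    heldLocks.delete(lockPath); // release even if fn throws
  }
}

// Scaled-down scenario: A holds the lock for 300 ms, B's timeout is 100 ms.
async function demo(): Promise<string[]> {
  const lock = "sessions.json.lock";
  const handlerA = withPollingLock(lock, async () => {
    await sleep(300); // stands in for a slow model call or tool run
    return "handler A replied";
  }, 100);
  const handlerB = withPollingLock(lock, async () => "handler B replied", 100);
  const settled = await Promise.allSettled([handlerA, handlerB]);
  return settled.map((r) =>
    r.status === "fulfilled" ? r.value : (r.reason as Error).message,
  );
}
```

Running `demo()` should resolve with handler A's reply and handler B's timeout message. B's failure exists only as a rejected promise; unless something turns it into a reply, the user sees nothing.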
Observed Behavior
From production logs:
```
01:55:29 [telegram] handler failed: Error: timeout acquiring session store lock
01:55:50 [telegram] handler failed: Error: timeout acquiring session store lock
01:58:54 [telegram] handler failed: Error: timeout acquiring session store lock
01:59:06 [telegram] handler failed: Error: timeout acquiring session store lock
01:59:19 [telegram] handler failed: Error: timeout acquiring session store lock
01:59:56 [telegram] handler failed: Error: timeout acquiring session store lock
```
Some threads work while others don't; which ones fail depends on which thread happens to acquire the lock first.
Environment
- Server: AWS Lightsail (3.7GB RAM)
- Multiple Telegram groups/threads active simultaneously
- Gateway running with claude-opus-4-5 model (longer response times)
Suggested Solutions
- Per-session locking: Use individual locks per session ID rather than a global lock
- Lock-free updates: Use atomic file operations or a lightweight database (SQLite with WAL mode)
- Increased timeout with backoff: Longer timeout with exponential backoff (temporary mitigation)
- Queue-based approach: Serialize session updates through a single writer with a queue
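Per-session locking and the queue-based approach can both be expressed with a keyed async mutex. The sketch below is an in-process illustration that assumes a single gateway process; `KeyedMutex` and `runExclusive` are hypothetical names, not existing openclaw APIs. Keying by session ID gives per-session locking, and always passing one fixed key turns it into a single-writer FIFO queue with no polling (and, in this sketch, no timeout). One caveat: per-session locks only help if each session's metadata is written independently (for example, one file per session). If every update still rewrites the shared sessions.json, two writers holding different session locks can read-modify-write the file concurrently and lose each other's changes.

```typescript
// Hypothetical per-key async mutex: callers with the same key run one at a time
// in call order; callers with different keys never wait for each other.
class KeyedMutex {
  // Tail of the wait chain for each key; absent when the key is idle.
  private readonly tails = new Map<string, Promise<void>>();

  async runExclusive<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release!: () => void;
    const current = new Promise<void>((resolve) => {
      release = () => resolve();
    });
    // The next caller for this key waits for everyone before us and for us.
    const tail = previous.then(() => current);
    this.tails.set(key, tail);

    await previous; // wait only for earlier holders of the same key
    try {
      return await fn();
    } finally {
      release();
      // Drop the entry if nobody queued behind us, so the map does not grow per session.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(() => resolve(), ms));

// Usage: two updates to session-a are serialized; session-b proceeds in parallel.
async function demo(): Promise<string[]> {
  const locks = new KeyedMutex();
  const log: string[] = [];
  const update = (id: string, ms: number) => async (): Promise<void> => {
    log.push(`start ${id}`);
    await sleep(ms); // stands in for the metadata write
    log.push(`end ${id}`);
  };
  await Promise.all([
    locks.runExclusive("session-a", update("a1", 50)),
    locks.runExclusive("session-a", update("a2", 10)),
    locks.runExclusive("session-b", update("b1", 10)),
  ]);
  return log;
}
```

In the usage example, `b1` finishes while `a1` is still running, and `a2` starts only after `a1` ends. Across multiple processes, the equivalent would be one lock file per session rather than an in-memory map.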
Workaround
Restarting the gateway clears the backlog but kills active sessions/subagents, so this is not sustainable.
Impact
- Messages in some Telegram threads get no response
- Users perceive the bot as unreliable
- No user-visible error: messages just disappear
Reported from a production server running the openclaw gateway.