Summary
With the #83505 fix present, I still observed a Telegram isolated-ingress .json.processing marker stay stuck for a single active topic update when no later same-lane update was queued behind it.
This looks like a remaining edge case in the timeout recovery trigger, not a duplicate of the original #83272 failure mode.
Related
- b7735f88fa2772b3103ed55eb1294ca4685f122a
Environment
- OpenClaw: 2026.5.18
- Local commit: 50a2481652
- Install type: Docker
- Channel: Telegram supergroup forum topics
- Runtime: Codex app-server / embedded agent
- Gateway state at inspection time: running and healthy after the turn eventually cleared
Observed behavior
During Telegram topic testing, a topic message caused prolonged main-thread CPU pressure and delayed health/Telegram behavior. After the system settled, the ingress spool still contained a .json.processing file for the topic update.
Important detail: there was not necessarily a later same-lane update behind that processing marker. The update could therefore remain a lone active handler rather than appearing in drain.blockedByLane.
Source-level concern
Current recovery appears to call timeout recovery using drain.blockedByLane as the candidate set. That catches the important case fixed by #83505, where a stuck handler blocks later same-lane updates.
But a single active stuck handler without a later same-lane update may not be included in blockedByLane, so #recoverTimedOutSpooledHandler(...) may not evaluate it for timeout recovery even after the handler timeout has elapsed.
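The gap can be illustrated with a deliberately simplified model. This is a sketch only: `ActiveHandler`, `QueuedUpdate`, and this `blockedByLane` function are hypothetical stand-ins, and they assume that a handler is reported as blocking only when a queued update waits on the same lane. The real drain logic may differ.

```typescript
// Simplified model; every name here is hypothetical, not OpenClaw's API.
interface ActiveHandler {
  handlerKey: string;
  laneKey: string;
  startedAtMs: number;
}

interface QueuedUpdate {
  updateId: number;
  laneKey: string;
}

// Assumption: a handler is "blocking" only if a queued update waits on its lane.
function blockedByLane(active: ActiveHandler[], queued: QueuedUpdate[]): Set<string> {
  const waitingLanes = new Set(queued.map((update) => update.laneKey));
  const blocked = new Set<string>();
  for (const handler of active) {
    if (waitingLanes.has(handler.laneKey)) {
      blocked.add(handler.handlerKey);
    }
  }
  return blocked;
}

const stuck: ActiveHandler = { handlerKey: "h1", laneKey: "topic:42", startedAtMs: 0 };

// A later same-lane update exists: the stuck handler becomes a timeout candidate.
const withFollower = blockedByLane([stuck], [{ updateId: 2, laneKey: "topic:42" }]);

// No later same-lane update: the stuck handler never enters the candidate set.
const lone = blockedByLane([stuck], []);
```

Under that assumption, `lone` is empty, so a recovery pass driven only by this set has nothing to evaluate no matter how long the handler has been running.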
Suggested narrow fix
Build the timeout candidate set from all active spooled handlers for the same spool, then union in drain.blockedByLane for compatibility:
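// Start from every active spooled handler for this spool, not only lane blockers.
const timeoutCandidateHandlerKeys = this.#activeSpooledUpdateHandlerKeysForSpool(spoolDir);
// Union in the lane-blocking handlers so the #83505 path is still covered.
for (const handlerKey of drain.blockedByLane) {
  timeoutCandidateHandlerKeys.add(handlerKey);
}
const timedOutRecovery = await this.#recoverTimedOutSpooledHandler(timeoutCandidateHandlerKeys);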
This preserves same-lane ordering and #83505's tombstone/restart behavior, but also lets a lone active processing claim time out.
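For readers without the source at hand, here is a standalone sketch of the same idea. `buildTimeoutCandidates` and `selectTimedOut` are hypothetical helpers written for illustration; they are not the private methods in polling-session.ts, and the start-time map is an assumed bookkeeping structure.

```typescript
// Hypothetical helpers; the real private methods may be shaped differently.
function buildTimeoutCandidates(
  activeForSpool: ReadonlyArray<string>,
  blockedByLaneKeys: ReadonlySet<string>,
): Set<string> {
  const candidates = new Set<string>(activeForSpool);
  blockedByLaneKeys.forEach((key) => candidates.add(key));
  return candidates;
}

// Keep only the candidates whose handler has been running for at least timeoutMs.
function selectTimedOut(
  candidates: ReadonlySet<string>,
  startedAtMs: ReadonlyMap<string, number>,
  nowMs: number,
  timeoutMs: number,
): string[] {
  const timedOut: string[] = [];
  candidates.forEach((key) => {
    const started = startedAtMs.get(key);
    if (started !== undefined && nowMs - started >= timeoutMs) {
      timedOut.push(key);
    }
  });
  return timedOut;
}

// "lone" has no follower; "blocking" has one but started recently.
const candidates = buildTimeoutCandidates(["lone"], new Set(["blocking"]));
const startedAt = new Map<string, number>([
  ["lone", 0],
  ["blocking", 9000],
]);
const timedOut = selectTimedOut(candidates, startedAt, 10000, 5000);
```

The union only widens which handlers are examined. Whether a handler is actually recovered still depends on its own elapsed time, so recently started handlers are left alone.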
Regression coverage idea
Add a polling-session test where:
- A single spooled topic update is claimed and handleUpdate never settles.
- No later same-lane update exists.
- spooledUpdateHandlerTimeoutMs elapses.
- The update is failed into a tombstone and isolated ingress restart is requested.
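The shape of such a test might look like the sketch below. It does not use the real polling session or its test harness: `FakeSpoolSession` is a hypothetical stand-in with an injected clock, meant only to show the sequence of claim, never-settling handler, elapsed timeout, and the two expected outcomes.

```typescript
// Hypothetical stand-in for the polling session; it models only claim and timeout.
class FakeSpoolSession {
  tombstoned: string[] = [];
  restartRequested = false;
  private claims = new Map<string, number>();
  private timeoutMs: number;
  private now: () => number;

  constructor(timeoutMs: number, now: () => number) {
    this.timeoutMs = timeoutMs;
    this.now = now;
  }

  // Record the claim time and release the claim only if the handler settles.
  claim(updateKey: string, handleUpdate: () => Promise<void>): void {
    this.claims.set(updateKey, this.now());
    void handleUpdate().then(() => {
      this.claims.delete(updateKey);
    });
  }

  // Evaluate every active claim, including ones with no later same-lane update.
  recoverTimedOut(): void {
    this.claims.forEach((claimedAt, key) => {
      if (this.now() - claimedAt >= this.timeoutMs) {
        this.claims.delete(key);
        this.tombstoned.push(key);
        this.restartRequested = true;
      }
    });
  }
}

let nowMs = 0;
const session = new FakeSpoolSession(5000, () => nowMs);

// Single claimed update whose handler never settles; nothing is queued behind it.
session.claim("topic-update-1", () => new Promise<void>(() => {}));

session.recoverTimedOut();
const tombstonedBeforeTimeout = session.tombstoned.length;

// Advance the fake clock to the handler timeout and run recovery again.
nowMs = 5000;
session.recoverTimedOut();
```

In the real test the assertions would target the actual tombstone file and restart request rather than these in-memory fields.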
I prepared a small local patch sketch against extensions/telegram/src/polling-session.ts and extensions/telegram/src/polling-session.test.ts; git diff --check passes. I have not deployed that patch to the running gateway.
Why this matters
Without this edge-case recovery, a lone stuck .json.processing marker can make the account appear mostly recovered while leaving stale spool state behind. On small VPS installs this also correlates with user-visible Telegram delays and event-loop/CPU pressure during the stuck turn.