Telegram isolated ingress spool can remain blocked by stale .processing claim after gateway recreate #84674

@crash2kx

Summary

A Telegram isolated polling ingress spool entry can remain stuck as *.json.processing and block all later Telegram updates in the same spool. In my case the stale processing claim survived a gateway recreate and was not recovered automatically. Moving the stuck processing file aside manually allowed the next pending Telegram update to be claimed and processed.

The symptom is that Telegram shows the typing indicator and accepts messages, but no replies are delivered. It can look like general gateway slowness or Codex latency, but the concrete failure was a blocked Telegram spool.

Environment

  • OpenClaw: 2026.5.18
  • Runtime: Docker / docker compose
  • Gateway command: node dist/index.js gateway --bind lan --port 18789
  • Telegram provider: isolated polling ingress enabled
  • Model/provider used by the affected session: openai-codex/gpt-5.5

Observed state

The Telegram spool directory contained:

/home/node/.openclaw/telegram/ingress-spool-default/
  0000000169729588.json.processing
  0000000169729589.json
  0000000169729590.json

The stuck file had a claim from the previous gateway process:

{
  "updateId": 169729588,
  "claim": {
    "processId": "8:a2d7a994-b8c3-4a74-85ab-1c33da3d4490",
    "processPid": 8,
    "claimedAt": 1779301818735
  }
}

That *.json.processing file had been claimed at 2026-05-20T18:30:18Z and remained present long after the turn stopped making progress. Later updates stayed queued as plain .json files.
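
To check a claim's age by hand, the embedded claimedAt value can be decoded as epoch milliseconds. This is a minimal sketch, assuming the claim file has the shape shown above and that plain pending entries carry no claim. The `SpoolClaimFile` type and `describeClaim` helper are hypothetical names for illustration, not OpenClaw APIs.

```typescript
// Hypothetical shape, inferred from the stuck file shown in this issue.
interface SpoolClaimFile {
  updateId: number;
  claim?: { processId: string; processPid: number; claimedAt: number };
}

// Returns the claim time as ISO 8601 and the claim age in whole seconds at `nowMs`,
// or null when the entry has no claim (assumed to mean "still pending").
function describeClaim(
  raw: string,
  nowMs: number,
): { claimedAtIso: string; ageSeconds: number } | null {
  const parsed = JSON.parse(raw) as SpoolClaimFile;
  if (!parsed.claim) return null;
  const { claimedAt } = parsed.claim;
  return {
    claimedAtIso: new Date(claimedAt).toISOString(),
    ageSeconds: Math.floor((nowMs - claimedAt) / 1000),
  };
}

const stuckEntry = JSON.stringify({
  updateId: 169729588,
  claim: {
    processId: "8:a2d7a994-b8c3-4a74-85ab-1c33da3d4490",
    processPid: 8,
    claimedAt: 1779301818735,
  },
});

// Evaluate the age at the time of the manual recovery (2026-05-20T18:57:02Z).
const claimInfo = describeClaim(stuckEntry, Date.parse("2026-05-20T18:57:02Z"));
console.log(claimInfo);
// { claimedAtIso: '2026-05-20T18:30:18.735Z', ageSeconds: 1603 }
```

By the time of the manual recovery the claim was therefore about 27 minutes old.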

A gateway recreate did not recover this claim. After recreate, the file was still present as .json.processing, while later updates remained pending.

Manual recovery that worked

I did not delete the file. I copied and renamed it aside:

0000000169729588.json.processing.backup-before-manual-stale-20260520T185702Z
0000000169729588.json.processing.stale-manual-20260520T185702Z

After this, OpenClaw immediately claimed the next pending update:

0000000169729589.json.processing

That turn eventually completed, and the later pending update was processed as well. The affected Telegram topic session returned to done, and no plain pending .json files remained.
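
The manual step above can be scripted. This is a rough sketch only, assuming the backup name is a copy and the stale-manual name is the renamed original, and assuming the spool only picks up names ending in .json.processing. `moveStaleClaimAside` is a hypothetical helper, and the demo runs against a throwaway directory rather than a live spool.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Copies the stuck entry to a backup, then renames the original so that its
// name no longer ends in .json.processing. Suffixes mirror the names above.
function moveStaleClaimAside(spoolDir: string, fileName: string, now: Date): string[] {
  // 2026-05-20T18:57:02.000Z -> 20260520T185702Z
  const stamp = now.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}Z$/, "Z");
  const source = path.join(spoolDir, fileName);
  const backup = `${source}.backup-before-manual-stale-${stamp}`;
  const aside = `${source}.stale-manual-${stamp}`;
  // COPYFILE_EXCL makes the copy fail instead of overwriting an existing backup.
  fs.copyFileSync(source, backup, fs.constants.COPYFILE_EXCL);
  // A rename within one filesystem is atomic on POSIX systems.
  fs.renameSync(source, aside);
  return [backup, aside];
}

// Demo in a temporary directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "ingress-spool-"));
const stuckName = "0000000169729588.json.processing";
fs.writeFileSync(path.join(demoDir, stuckName), "{}");
moveStaleClaimAside(demoDir, stuckName, new Date("2026-05-20T18:57:02Z"));
const demoListing = fs.readdirSync(demoDir).sort();
console.log(demoListing);
```

Nothing is deleted, so the original update payload stays available for later inspection or replay.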

Important diagnostic detail

I initially suspected CPU or Docker overhead. A 120s Node CPU profile taken during the blocked period showed the gateway mostly idle, with roughly 93% of samples idle. This made the spool state the useful signal: Telegram ingress had accepted messages, but processing was blocked behind the stale .processing file.

A separate run after unblocking did consume CPU and eventually completed, so there seem to be two distinguishable cases:

  1. a stale .processing claim blocks the Telegram spool even though the gateway is not busy;
  2. once unblocked, a real active turn may still be slow, but it does progress.

Expected behavior

On gateway startup / polling ingress startup, OpenClaw should recover stale Telegram spool claims when the recorded claimant process is no longer valid for the current gateway instance, or when the claim is older than the configured stale timeout.

At minimum, a stale *.json.processing file should not indefinitely block later Telegram updates after a gateway recreate.

Actual behavior

The stale .processing file survived gateway recreate and continued to block later updates until manually moved aside.

Suggested fix

For the isolated Telegram ingress spool recovery path:

  • scan for *.json.processing on provider startup and/or periodically;
  • parse the embedded claim metadata;
  • consider a claim stale if:
    • the processId does not match the current gateway process identity, or
    • the processPid is not the active gateway process for this container, or
    • the claim's age (current time minus claimedAt) exceeds the configured processing timeout;
  • atomically rename stale processing files to a .processing.stale-<timestamp> name, or return them to the queue if safe;
  • emit a warning with update id, age, claim process id, and chosen recovery action;
  • ensure a stale file cannot block later plain .json updates forever.
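
The recovery pass described in this list could look roughly like the following. It is an illustrative sketch, not OpenClaw's actual code: every name in it (`RecoveryOptions`, `staleReason`, `recoverStaleClaims`) is hypothetical, and only the claim fields come from the file shown earlier. It compares processId rather than processPid, on the assumption that a PID such as 8 may well be reused after a container recreate. It also assumes every .json.processing file contains valid JSON, so a parse error would abort the scan.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface Claim { processId: string; processPid: number; claimedAt: number }
interface RecoveryOptions {
  currentProcessId: string; // identity of the running gateway instance
  nowMs: number;
  timeoutMs: number; // configured processing timeout
  action: "requeue" | "quarantine";
}

// Returns why a claim is stale, or null if it still looks live.
function staleReason(claim: Claim, opts: RecoveryOptions): string | null {
  if (claim.processId !== opts.currentProcessId) return "claimed by another process";
  if (opts.nowMs - claim.claimedAt > opts.timeoutMs) return "processing timeout exceeded";
  return null;
}

// Scans the spool once and renames every stale claim out of the way.
function recoverStaleClaims(spoolDir: string, opts: RecoveryOptions): string[] {
  const recovered: string[] = [];
  for (const name of fs.readdirSync(spoolDir).sort()) {
    if (!name.endsWith(".json.processing")) continue;
    const file = path.join(spoolDir, name);
    const entry = JSON.parse(fs.readFileSync(file, "utf8")) as { updateId: number; claim?: Claim };
    if (!entry.claim) continue; // no metadata to judge; left untouched in this sketch
    const reason = staleReason(entry.claim, opts);
    if (reason === null) continue;
    const target = opts.action === "requeue"
      ? file.slice(0, -".processing".length) // back to NNN.json
      : `${file}.stale-${opts.nowMs}`; // NNN.json.processing.stale-<timestamp>
    fs.renameSync(file, target);
    const ageSeconds = Math.floor((opts.nowMs - entry.claim.claimedAt) / 1000);
    console.warn(
      `stale claim: update=${entry.updateId} age=${ageSeconds}s ` +
      `claimant=${entry.claim.processId} reason="${reason}" action=${opts.action}`,
    );
    recovered.push(name);
  }
  return recovered;
}

// Demo in a temporary directory that mirrors the observed state.
const spool = fs.mkdtempSync(path.join(os.tmpdir(), "ingress-spool-"));
const oldClaim = { processId: "8:a2d7a994-b8c3-4a74-85ab-1c33da3d4490", processPid: 8, claimedAt: 1779301818735 };
fs.writeFileSync(path.join(spool, "0000000169729588.json.processing"), JSON.stringify({ updateId: 169729588, claim: oldClaim }));
fs.writeFileSync(path.join(spool, "0000000169729589.json"), JSON.stringify({ updateId: 169729589 }));
const recoveredNames = recoverStaleClaims(spool, {
  currentProcessId: "8:hypothetical-new-instance",
  nowMs: 1779303422000,
  timeoutMs: 600000,
  action: "requeue",
});
console.log(recoveredNames, fs.readdirSync(spool).sort());
```

Requeueing keeps the update deliverable but risks processing it twice if the original turn had partly completed. Quarantining avoids duplicates at the cost of dropping the update from the queue. Which default is safer is a judgment call for the maintainers.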

This may be related to earlier Telegram spool timeout recovery work, but this specific case is about a *.json.processing claim surviving gateway recreate and not being recovered automatically.

Metadata

Labels

  • P1: High-priority user-facing bug, regression, or broken workflow.
  • clawsweeper:fix-shape-clear: ClawSweeper found a clear likely implementation shape for this issue.
  • clawsweeper:queueable-fix: ClawSweeper marked this issue as an existing queue_fix_pr work candidate.
  • clawsweeper:source-repro: ClawSweeper found a high-confidence source-level issue reproduction.
  • impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
  • impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
  • issue-rating: 🦞 diamond lobster: Very strong issue quality with high-confidence source-level or clear reproduction.
