fix(telegram): recover stale ingress claims after restart#84690
fix(telegram): recover stale ingress claims after restart#84690zerone0x wants to merge 1 commit into
Conversation
|
Codex review: needs real behavior proof before merge. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: yes. source inspection gives a high-confidence reproduction path: current main's drain skips recovery when a fresh different-processId claim has the current PID, matching the linked stale processing-file report. I did not run the tests because this review is read-only. PR rating Rank-up moves:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. Real behavior proof Mantis proof suggestion Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Land this narrow Telegram plugin predicate change after redacted real restart-spool proof, while preserving the existing different-PID live-claim protection. Do we have a high-confidence way to reproduce the issue? Yes, source inspection gives a high-confidence reproduction path: current main's drain skips recovery when a fresh different-processId claim has the current PID, matching the linked stale processing-file report. I did not run the tests because this review is read-only. Is this the best way to solve the issue? Yes, the code change is the narrowest maintainable fix because the ownership decision belongs in the Telegram spool predicate rather than core gateway policy. The remaining gap is runtime proof, not a different implementation direction. Label changes:
Label justifications:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 1a7669bc63a0. |
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
martingarramon
left a comment
There was a problem hiding this comment.
LGTM-with-concerns. The core fix is correct — adding `claim.claim.processPid !== process.pid` short-circuits the live-owner check for restart-stale claims that recycled the current gateway PID, and the `polling-session.test.ts` change from `claimedAt: now - STALE_MS - 1` to `claimedAt: now` is important: the old test was passing because `isFreshClaimOwner` returned false (stale path), not because it was exercising the PID-reuse case. The new test covers the actual bug path.
Two actionable concerns:
1. Heading format — blocks CI gate.
`## Real behavior proof (required for external PRs)` should be `## Real behavior proof` (exact heading, no suffix). The CI gate matches the exact string; the `(required for external PRs)` suffix is causing the `Real behavior proof` check failure. Content is complete and correct — only the heading needs trimming.
2. Missing symmetric live-owner test.
The `polling-session.test.ts` change converts the existing blocked-case test into a recovery-case test. That's correct, but there's now no test asserting the symmetric path: "different PID + different processId + fresh + processExists → claim remains blocked." The old test was inadvertently covering this (via the stale path). A focused test for the still-blocked case would close the gap.
The other four CI failures (`check-additional-extension-bundled`, `check-lint`, `check-test-types`, `checks-node-core-runtime-infra-state`) are main-churn unrelated to this diff.
Summary
.json.processingclaim from a previous gateway identity when the PID was reused after recreate, blocking later same-lane updates.processPidequals the current process but whose uniqueprocessIddiffers as recoverable stale claims, not another live owner.Motivation
.processingfile can survive gateway recreate and block later updates until manually moved aside.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Real behavior proof (required for external PRs)
.json.processingclaim from an older gateway identity with the current PID no longer blocks later queued updates.node scripts/run-vitest.mjs extensions/telegram/src/telegram-ingress-spool.test.ts extensions/telegram/src/polling-session.test.ts.processingfile and left only the later update pending.Root Cause (if applicable)
processPid === process.pidbutprocessIdbelongs to an older gateway instance.Regression Test Plan (if applicable)
extensions/telegram/src/telegram-ingress-spool.test.ts,extensions/telegram/src/polling-session.test.ts.json.processingclaim with the current PID but an older/different process identity is recoverable and returns the blocked update to the pending spool.User-visible / Behavior Changes
Telegram isolated polling ingress can recover stale
.processingclaims left behind by a recreated gateway when the PID is reused, preventing later updates from staying blocked indefinitely.Diagram (if applicable)
Security Impact (required)
No)No)No)No)No)Yes, explain risk + mitigation: N/ARepro + Verification
Environment
Steps
.json.processingusingprocessPid: process.pidand a differentprocessId.Expected
.json; updates 42 and 43 are pending in order.Actual
.json.processingfile.Evidence
Attach at least one:
Human Verification (required)
What you personally verified (not just CI), and how:
git diff --checkpasses.Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Yes)No)No)Risks and Mitigations