
fix(gateway): preserve stale channel restart diagnostics#90937

Merged
clawsweeper[bot] merged 3 commits into openclaw:main from snowzlm:fix/90901-telegram-restart-pending
Jun 8, 2026

Conversation

@snowzlm

snowzlm commented Jun 6, 2026

Contributor

Summary

Fixes #90901.

This keeps gateway-owned lifecycle state authoritative after a channel task has been aborted. A stale Telegram/polling-style task can still emit a late status patch after a recovery stop timeout. Before this change, that late patch could re-mark the account as connected=true and clear lastError while the manager still had running=false, restartPending=true, and reconnectAttempts=0.

The fix:

  • sanitizes status patches emitted by aborted channel tasks so they cannot overwrite gateway-owned lifecycle fields;
  • prevents late stale-task patches from re-marking a stopped runtime as connected;
  • preserves actionable lifecycle diagnostics such as channel stop timed out after 5000ms instead of allowing lastError=null;
  • marks stopped runtimes as disconnected when the runtime previously had a connection state.
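
A minimal sketch of the patch sanitizing described above, assuming a simplified status shape. The field names come from this PR's description, but `ChannelStatus`, `GATEWAY_OWNED`, and `sanitizeAbortedPatchSketch` are hypothetical names for illustration. The actual `sanitizeAbortedTaskStatusPatch` in src/gateway/server-channels.ts may strip a different set of fields, and letting `connected=false` or a non-null error pass through is an assumption of this sketch.

```typescript
// Hypothetical status shape; the real type in the gateway may differ.
type ChannelStatus = {
  running: boolean;
  connected: boolean;
  restartPending: boolean;
  reconnectAttempts: number;
  lastError: string | null;
};

type StatusPatch = Partial<ChannelStatus>;

// Assumed set of gateway-owned fields (illustrative, not the PR's exact list).
const GATEWAY_OWNED: ReadonlyArray<keyof ChannelStatus> = [
  "running",
  "restartPending",
  "reconnectAttempts",
];

// Returns a copy of the patch that an aborted task is still allowed to apply.
function sanitizeAbortedPatchSketch(patch: StatusPatch): StatusPatch {
  const next: StatusPatch = { ...patch };
  for (const key of GATEWAY_OWNED) {
    delete next[key];
  }
  // Assumption: a stale task must not revive the connection.
  if (next.connected === true) {
    delete next.connected;
  }
  // Assumption: a stale task must not clear an existing diagnostic.
  if (next.lastError === null) {
    delete next.lastError;
  }
  return next;
}

// The late patch from the bug report loses every field it tried to set.
const latePatch = sanitizeAbortedPatchSketch({ connected: true, lastError: null, running: true });
console.log(Object.keys(latePatch).length);
```

Filtering at the patch level, rather than rejecting the whole patch, keeps room for a stale task to report fields the gateway does not own.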

Current branch state

The PR branch was refreshed on top of the latest openclaw/openclaw@main after the earlier CI failure in checks-node-agentic-agents-core-tools. ClawSweeper then ran its automerge repair lane and kept the fix on the contributor branch.

  • Base used for the refreshed branch: b8adc11977ab9dc1eb558dc070bfe63df75911c5 (feat(cron): support command jobs)
  • Current head: 53b37e50739078238b0b582c4db5da61ddd66cb8
  • Current branch delta is limited to the gateway lifecycle fix and its regression tests:
    • src/gateway/server-channels.ts
    • src/gateway/server-channels.test.ts

The earlier failing CI shard was reproduced against the refreshed branch and now passes:

OPENCLAW_VITEST_INCLUDE_FILE=/tmp/pr90937-agentic-agents-core-tools-includes.json \
OPENCLAW_VITEST_SHARD_NAME=agentic-agents-core-tools \
OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=300000 \
OPENCLAW_TEST_PROJECTS_PARALLEL=2 \
NODE_OPTIONS=--max-old-space-size=8192 \
pnpm exec node scripts/test-projects.mjs test/vitest/vitest.agents-core.config.ts

# 64 test files passed
# 1125 tests passed, 4 skipped

That confirms the earlier camera inline-image failures were from the old PR branch being behind the current mainline test/runtime baseline, not from this gateway lifecycle change.

Review feedback addressed

ClawSweeper's latest review did not identify a code defect. The remaining P1 blocker was proof quality:

  • It requested redacted real-runtime proof from Telegram polling, LaunchAgent gateway recovery, or a comparable non-mock channel lifecycle run.
  • It marked the patch quality strong, but capped the PR at proof-limited readiness because Vitest/CI output alone was considered supplemental.
  • It paused automerge for human review because the change affects the gateway lifecycle path used by all channel plugins.

This update adds a redacted real gateway runtime diagnostic-channel run below. It uses the actual patched createChannelManager runtime path, real timers, a real channel plugin registered through the plugin registry, and no Vitest/fake-timer harness. It exercises the stale aborted task sequence after the 5000ms stop-timeout path.

Reproduction / before evidence

Added regression coverage for the reported shape:

  1. start a managed account task;
  2. abort it through a non-manual recovery stop;
  3. let the stop timeout leave recovery restart pending;
  4. request a restart while the stale task is still present;
  5. have the stale task emit a late connected=true / lastError=null status patch.
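
The five steps above can be walked through with a toy model. This is only a sketch under stated assumptions: `Runtime` and `simulateStaleTask` are hypothetical, the timeout is applied directly instead of waiting on real timers, and the "after" branch simply ignores the stale patch, which is coarser than the field-level sanitizing the PR describes.

```typescript
type Runtime = {
  running: boolean;
  connected: boolean;
  restartPending: boolean;
  reconnectAttempts: number;
  lastError: string | null;
};

// Toy stand-in for the gateway manager; not the real createChannelManager.
function simulateStaleTask(filterAbortedPatches: boolean): Runtime {
  let aborted = false;
  let runtime: Runtime = {
    running: false,
    connected: false,
    restartPending: false,
    reconnectAttempts: 0,
    lastError: null,
  };

  // Status callback handed to the channel task.
  const setStatus = (patch: Partial<Runtime>): void => {
    if (filterAbortedPatches && aborted) {
      return; // simplified after-fix behaviour: drop patches from aborted tasks
    }
    runtime = { ...runtime, ...patch };
  };

  // 1. Start a managed account task.
  runtime = { ...runtime, running: true };
  setStatus({ connected: true });

  // 2. Abort it through a non-manual recovery stop.
  aborted = true;

  // 3. The stop timeout leaves the recovery restart pending.
  runtime = {
    ...runtime,
    running: false,
    connected: false,
    restartPending: true,
    lastError: "channel stop timed out after 5000ms",
  };

  // 4. A restart is requested while the stale task is still present;
  //    the toy manager does not start a duplicate runner.

  // 5. The stale task emits its late status patch.
  setStatus({ connected: true, lastError: null });
  return runtime;
}

console.log(simulateStaleTask(false)); // before-fix shape
console.log(simulateStaleTask(true)); // after-fix shape
```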

Before the fix, the new regression fails because the stale task revives connected=true in the stopped runtime:

  • GitHub Actions before-fix run: https://github.com/snowzlm/openclaw/actions/runs/27063237738
  • Before-fix commit: 6739bbf06eda1740878cfcc2097f39609d18de69
  • Failure: src/gateway/server-channels.test.ts > keeps recovery timeout diagnostics when a stale task reports connected after abort
  • Observed assertion: expected true to be false

After evidence

With the fix, the same regression passes and the runtime keeps the actionable timeout diagnostic instead of returning to connected=true / lastError=null.

Focused verification on the refreshed latest-main branch:

node scripts/run-vitest.mjs run \
  --config test/vitest/vitest.gateway.config.ts \
  src/gateway/server-channels.test.ts \
  -t "keeps recovery timeout diagnostics"
# 1 passed

node scripts/run-vitest.mjs run \
  --config test/vitest/vitest.gateway.config.ts \
  src/gateway/server-channels.test.ts \
  src/gateway/channel-health-monitor.test.ts \
  src/gateway/server-methods/channels.status.test.ts
# 6 files passed, 155 tests passed

node scripts/run-tsgo.mjs \
  -p test/tsconfig/tsconfig.core.test.json \
  --incremental \
  --tsBuildInfoFile .artifacts/tsgo-cache/core-test-pr90937-latest.tsbuildinfo
# passed

./node_modules/.bin/oxfmt --check --threads=1 \
  src/gateway/server-channels.ts \
  src/gateway/server-channels.test.ts
# passed

Cloud proof after refreshing the branch:

Cloud CI for the current head is green after ClawSweeper refreshed the contributor branch. gh pr checks 90937 --repo openclaw/openclaw exits 0, including checks-node-agentic-agents-core-tools, gateway shards, CodeQL security-high shards, OpenGrep, build artifacts, and Real behavior proof.

Real behavior proof

  • Behavior or issue addressed: after a recovery stop timeout, a stale aborted channel task must not re-mark the runtime as connected or clear the actionable timeout error while restart remains pending.
  • Real environment tested: Linux source checkout, Node.js v24, patched OpenClaw gateway lifecycle manager running from current head 53b37e50739078238b0b582c4db5da61ddd66cb8.
  • Exact steps or command run after this patch: ran a standalone real gateway runtime diagnostic-channel script against the patched source. The script registers a real local channel plugin through the plugin registry, starts it through createChannelManager, performs a non-manual stop that reaches the real 5000ms recovery timeout, requests restart while the stale task is still present, and then emits the late stale connected=true / lastError=null task status patch.
node --import tsx .artifacts/pr-90937/real-runtime-proof.mts
  • Evidence after fix: the diagnostic-channel run exits 0 and shows the runtime keeps the stopped/restart-pending diagnostic shape after the late stale task patch:
{
  "proof": "pr-90937-real-gateway-runtime-diagnostic-channel",
  "head": "53b37e50739078238b0b582c4db5da61ddd66cb8",
  "durationMs": 5255,
  "startCount": 1,
  "capturedAbort": true,
  "afterStart": {
    "running": true,
    "connected": true,
    "lastError": null
  },
  "beforeLateStatusPatch": {
    "running": false,
    "connected": false,
    "restartPending": true,
    "reconnectAttempts": 0,
    "lastError": "channel stop timed out after 5000ms"
  },
  "afterLateStaleStatusPatch": {
    "running": false,
    "connected": false,
    "restartPending": true,
    "reconnectAttempts": 0,
    "lastError": "channel stop timed out after 5000ms"
  }
}

Runtime stderr also emitted the expected gateway timeout diagnostic:

[pr-90937-real-runtime-proof] [default] channel stop exceeded 5000ms after abort; continuing shutdown

Additional focused regression evidence on the same code path:

node scripts/run-vitest.mjs run \
  --config test/vitest/vitest.gateway.config.ts \
  src/gateway/server-channels.test.ts \
  -t "keeps recovery timeout diagnostics"

# 1 passed

Additional cloud proof on the refreshed PR line:

Real behavior proof  pass  https://github.com/openclaw/openclaw/actions/runs/27114768550
  • Observed result after fix: the visible runtime no longer returns to the reported false-positive shape (connected=true, lastError=null) after the stale aborted task emits its late status patch; it remains running=false, connected=false, restartPending=true, reconnectAttempts=0, with lastError="channel stop timed out after 5000ms".
  • What was not tested: live Telegram Bot API polling and macOS LaunchAgent supervisor behavior still require a credentialed maintainer-owned environment; the added proof uses a local diagnostic channel to exercise the same gateway manager lifecycle state without network secrets.
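
The pass condition implied by the diagnostic JSON above can be written as a small check. This is a sketch, assuming the JSON keys shown in the evidence; `Snapshot`, `ProofOutput`, and `staleFieldsPreserved` are hypothetical names and are not part of the proof script.

```typescript
type Snapshot = {
  running: boolean;
  connected: boolean;
  restartPending: boolean;
  reconnectAttempts: number;
  lastError: string | null;
};

type ProofOutput = {
  beforeLateStatusPatch: Snapshot;
  afterLateStaleStatusPatch: Snapshot;
};

// True when the late stale patch left the stopped/restart-pending shape intact.
function staleFieldsPreserved(proof: ProofOutput): boolean {
  const before = proof.beforeLateStatusPatch;
  const after = proof.afterLateStaleStatusPatch;
  return (
    after.running === false &&
    after.connected === false &&
    after.restartPending === true &&
    after.lastError !== null &&
    after.lastError === before.lastError &&
    after.reconnectAttempts === before.reconnectAttempts
  );
}

// Values copied from the diagnostic output in this PR body.
const reported: ProofOutput = {
  beforeLateStatusPatch: {
    running: false,
    connected: false,
    restartPending: true,
    reconnectAttempts: 0,
    lastError: "channel stop timed out after 5000ms",
  },
  afterLateStaleStatusPatch: {
    running: false,
    connected: false,
    restartPending: true,
    reconnectAttempts: 0,
    lastError: "channel stop timed out after 5000ms",
  },
};

console.log(staleFieldsPreserved(reported));
```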

Risk / behavior notes

  • This does not force-start a duplicate runner while an old task is still present; that remains safer for polling transports.
  • If an old task never exits, status now retains a concrete recovery timeout diagnostic instead of silently returning to lastError=null with a stale connected marker.
  • The fix is channel-agnostic and stays in the gateway lifecycle manager rather than Telegram-specific code.

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 6, 2026
@clawsweeper

clawsweeper Bot commented Jun 6, 2026

Contributor

Codex review: passed. Reviewed June 8, 2026, 1:19 AM ET / 05:19 UTC.

Summary
This PR sanitizes status patches from aborted channel tasks in the gateway manager and adds regression tests for stale restart diagnostics.

PR surface: Source +56, Tests +78. Total +134 across 2 files.

Reproducibility: yes. Source inspection and the PR's before-fix regression show the sequence: non-manual stop timeout, restart request while the stale task remains, then a late connected=true / lastError=null status patch on current main.

Review metrics: 1 noteworthy metric.

  • Aborted-task lifecycle filtering: 5 gateway-owned fields stripped after abort. Late task patches can no longer overwrite running, restart, or start/stop lifecycle state after the gateway has aborted the task.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
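
The "weaker of" rule reads as a minimum over an ordered scale. A sketch, assuming the order of the rank legend in this review comment (strongest first) is a strict scale; `RANKS` and `overallRank` are hypothetical names, not ClawSweeper's implementation.

```typescript
// Rank names copied from the legend in this review comment, strongest first.
// Treating the legend order as a strict scale is an assumption.
const RANKS = [
  "challenger crab",
  "diamond lobster",
  "platinum hermit",
  "gold shrimp",
  "silver shellfish",
  "unranked krab",
] as const;

type Rank = (typeof RANKS)[number];

// Hypothetical helper: the overall rating is whichever input ranks lower.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  const weakerIndex = Math.max(RANKS.indexOf(proof), RANKS.indexOf(patchQuality));
  return RANKS[weakerIndex];
}

// Limited proof caps an otherwise strong patch.
console.log(overallRank("gold shrimp", "diamond lobster")); // prints "gold shrimp"
```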

Risk before merge

  • [P1] This touches the shared gateway lifecycle path used by all channel plugins; the supplied proof exercises a real local diagnostic channel but not credentialed Telegram polling or macOS LaunchAgent recovery.

Maintainer options:

  1. Accept the diagnostic-channel proof (recommended)
    Maintainers can accept the supplied real gateway runtime proof because it exercises the manager lifecycle state directly, with the residual risk limited to live transport and LaunchAgent coverage.
  2. Run a live Telegram recovery smoke
    If maintainers want stronger production-path confidence, run a credentialed Telegram polling or macOS LaunchAgent recovery check before merging.

Next step before merge

  • [P2] No repair lane is needed because I found no actionable code defect; the remaining decision is maintainer acceptance of the availability risk or optional live transport proof.

Security
Cleared: No concrete security or supply-chain concern: the diff only changes gateway lifecycle TypeScript and colocated tests.

Review details

Best possible solution:

Land the gateway-owned lifecycle fix with the regression tests, keeping the stale-task filtering channel-agnostic and accepting or separately proving the remaining live Telegram/LaunchAgent availability risk.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection and the PR's before-fix regression show the sequence: non-manual stop timeout, restart request while the stale task remains, then a late connected=true / lastError=null status patch on current main.

Is this the best way to solve the issue?

Yes. The PR fixes the owner boundary in the gateway manager rather than adding Telegram-specific logic or health-monitor workarounds, and the added stop-hook regression protects the main compatibility concern I saw.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against b8adc11977ab.

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body supplies after-fix real gateway runtime live output from the patched manager path with real timers and a locally registered diagnostic channel plugin.
  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • remove rating: 🦐 gold shrimp: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

  • P1: The linked report describes a real channel workflow where Telegram stopped responding until a Gateway restart recovered it.
  • merge-risk: 🚨 availability: The patch changes shared channel start/stop lifecycle state, so a mistake could affect channel runtime recovery or availability across plugins.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Sufficient (live_output): The PR body supplies after-fix real gateway runtime live output from the patched manager path with real timers and a locally registered diagnostic channel plugin.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body supplies after-fix real gateway runtime live output from the patched manager path with real timers and a locally registered diagnostic channel plugin.
Evidence reviewed

PR surface:

Source +56, Tests +78. Total +134 across 2 files.

PR surface stats:

Area       Files  Added  Removed  Net
Source     1      73     17       +56
Tests      1      78     0        +78
Docs       0      0      0        0
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      2      151    17       +134

What I checked:

  • Current main still has the stale overwrite path: Current main passes a channel task's setStatus directly into setRuntime, so a task whose abort signal is already set can still merge connected=true and lastError=null into gateway-owned lifecycle state. (src/gateway/server-channels.ts:562, b8adc11977ab)
  • Current main records the timeout diagnostic before the stale patch can overwrite it: The non-manual stop timeout records restartPending: true and a concrete timeout lastError, which is the diagnostic the PR is protecting from later stale task status patches. (src/gateway/server-channels.ts:788, b8adc11977ab)
  • PR filters aborted task status patches: The PR adds sanitizeAbortedTaskStatusPatch, stripping gateway-owned lifecycle fields and preventing a stale connected=true heartbeat or lastError=null patch from clearing existing lifecycle diagnostics after abort. (src/gateway/server-channels.ts:55, 53b37e507390)
  • PR applies the filter at the task status boundary: The PR routes running task setStatus calls through setRuntimeFromTaskStatus, while the later ClawSweeper repair keeps explicit stop hooks using direct setRuntime so stop hooks can still clear transport state during shutdown. (src/gateway/server-channels.ts:370, 53b37e507390)
  • Regression coverage matches the reported stale state: The new regression drives a non-manual stop timeout, requests restart while the stale task remains, emits a late connected=true / lastError=null task patch, and asserts the account stays disconnected with restart pending and the timeout diagnostic preserved. (src/gateway/server-channels.test.ts:470, 53b37e507390)
  • Health monitor caller path inspected: The channel health monitor uses stopChannel(..., { manual: false }), resets attempts, and then calls startChannel, so the fix is in the shared lifecycle path that owns the reported recovery sequence. (src/gateway/channel-health-monitor.ts:168, b8adc11977ab)
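
The two write paths noted in the checks above can be sketched as follows. This is an illustration only: `RuntimeSketch` is hypothetical, the method names merely echo `setRuntime` and `setRuntimeFromTaskStatus` from the review, and the filtering rule (allow only `connected=false` and non-null errors after abort) is an assumption rather than the PR's exact logic.

```typescript
type Patch = { connected?: boolean; lastError?: string | null };

class RuntimeSketch {
  state = { connected: true, lastError: null as string | null };
  aborted = false;

  // Direct path: used by the gateway itself and by explicit stop hooks.
  setRuntime(patch: Patch): void {
    this.state = { ...this.state, ...patch };
  }

  // Task path: status patches from a running task are filtered once aborted.
  setRuntimeFromTaskStatus(patch: Patch): void {
    if (!this.aborted) {
      this.setRuntime(patch);
      return;
    }
    const filtered: Patch = {};
    if (patch.connected === false) {
      filtered.connected = false;
    }
    if (typeof patch.lastError === "string") {
      filtered.lastError = patch.lastError;
    }
    this.setRuntime(filtered);
  }
}

const runtime = new RuntimeSketch();
runtime.aborted = true;

// A stop hook can still clear transport state through the direct path.
runtime.setRuntime({ connected: false, lastError: "channel stop timed out after 5000ms" });

// The stale task's late patch goes through the filtered path and changes nothing.
runtime.setRuntimeFromTaskStatus({ connected: true, lastError: null });
console.log(runtime.state);
```

Keeping the filter at the task boundary, instead of inside the shared setter, is what leaves stop hooks free to write lifecycle fields during shutdown.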

Likely related people:

  • Peter Steinberger: Blame attributes the current gateway lifecycle block to bab18d567b0cd9811012a50e032f71a9c9beb440, and earlier history shows focused channel recovery hardening in the same files. (role: recent area contributor; confidence: high; commits: bab18d567b0c, 1f850374f6e7; files: src/gateway/server-channels.ts, src/gateway/server-channels.test.ts, src/gateway/channel-health-monitor.ts)
  • David Szarzynski: The channel health monitor auto-restart surface that calls this gateway lifecycle path was introduced in 497e2d76ad45a252d7164ba08c9fdcdd8f3f6f6e. (role: introduced adjacent behavior; confidence: medium; commits: 497e2d76ad45; files: src/gateway/channel-health-monitor.ts, src/gateway/channel-health-monitor.test.ts)
  • Tak Hoffman: Recent gateway health-monitor account gating work changed server-channels.ts and its tests, making this a useful routing candidate for review of lifecycle side effects. (role: recent adjacent contributor; confidence: medium; commits: 29fec8bb9f0b; files: src/gateway/server-channels.ts, src/gateway/server-channels.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. labels Jun 6, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 6, 2026
@snowzlm

snowzlm commented Jun 6, 2026

Contributor Author

Updated the PR body with a Real behavior proof section and copied after-fix source-runtime output for the patched gateway lifecycle path.

/review

@clawsweeper

clawsweeper Bot commented Jun 6, 2026

Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 6, 2026
@Takhoffman

Contributor

@clawsweeper automerge

@clawsweeper

clawsweeper Bot commented Jun 8, 2026

Contributor

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=53b37e50739078238b0b582c4db5da61ddd66cb8)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-06-08T05:29:12Z
Merge commit: a4f0e508dfcb

What merged:

  • This PR sanitizes status patches from aborted channel tasks in the gateway manager and adds regression tests for stale restart diagnostics.
  • PR surface: Source +56, Tests +78. Total +134 across 2 files.
  • Reproducibility: yes. Source inspection and the PR's before-fix regression show the sequence: non-manual sto ... while the stale task remains, then a late connected=true / lastError=null status patch on current main.

Automerge notes:

  • PR branch already contained follow-up commit before automerge: fix(gateway): preserve stale restart diagnostics
  • PR branch already contained follow-up commit before automerge: fix(gateway): preserve stale channel restart diagnostics

The automerge loop is complete.

Automerge progress:

  • 2026-06-08 04:01:10 UTC review passed 05974a884959 (structured ClawSweeper verdict: pass (sha=05974a88495963c9c7ab1e2e39a9ae8701d6e...)
  • 2026-06-08 03:54:37 UTC review queued 05974a884959 (queued)
  • 2026-06-08 04:01:37 UTC review queued 53b37e507390 (after repair)
  • 2026-06-08 05:19:41 UTC review passed 53b37e507390 (structured ClawSweeper verdict: pass (sha=53b37e50739078238b0b582c4db5da61ddd66...)
  • 2026-06-08 05:29:14 UTC merged 53b37e507390 (merged by ClawSweeper automerge)

@clawsweeper clawsweeper Bot added clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge clawsweeper:human-review Needs maintainer review before ClawSweeper can continue labels Jun 8, 2026
@clawsweeper

clawsweeper Bot commented Jun 8, 2026

Contributor

🦞✅
ClawSweeper is pausing this repair loop for human review.

Source: clawsweeper[bot]
Reason: No automated repair lane is needed because there is no discrete patch defect; the remaining action is maintainer review and normal merge/check gating.
Cleared: The diff only changes gateway lifecycle TypeScript and a focused test; it adds no dependency, workflow, secret, package, or code-execution surface. (sha=2e267c6cbe792fdca4d94bc922e9c34a563199ee)

Why human review is needed:
This item has security-sensitive risk. ClawSweeper is pausing instead of making an autonomous change that could affect trust, credentials, permissions, or exposure.

What the maintainer can do as a next step:
If the maintainer accepts the current risk and wants ClawSweeper to continue merge gates, comment @clawsweeper approve. If the security-sensitive detail still needs changes, describe the safe path or push the fix, then comment @clawsweeper automerge. If the risk should not be automated, keep the PR paused for manual review or comment @clawsweeper stop.

I added clawsweeper:human-review and left the final call with a maintainer.

@snowzlm snowzlm force-pushed the fix/90901-telegram-restart-pending branch from 2e267c6 to 05974a8 Compare June 8, 2026 03:37
@openclaw-barnacle openclaw-barnacle Bot added triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: supplied External PR includes structured after-fix real behavior proof. and removed proof: supplied External PR includes structured after-fix real behavior proof. triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 8, 2026
@snowzlm

snowzlm commented Jun 8, 2026

Contributor Author

Refreshed this PR onto latest main, replaced the PR body with the updated evidence, and verified the current check set is green. The earlier agents-core-tools camera failure now passes on the refreshed branch. @clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 8, 2026

Contributor

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@clawsweeper clawsweeper Bot added status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. and removed status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. clawsweeper:human-review Needs maintainer review before ClawSweeper can continue labels Jun 8, 2026
@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. clawsweeper:human-review Needs maintainer review before ClawSweeper can continue and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. labels Jun 8, 2026
@clawsweeper

clawsweeper Bot commented Jun 8, 2026

Contributor

🦞✅
ClawSweeper is pausing this repair loop for human review.

Source: clawsweeper[bot]
Reason: [P1] No repair lane is needed because I found no code defect to fix; the remaining blocker is contributor or maintainer proof/override before automerge.
Cleared: The diff only changes gateway TypeScript lifecycle logic and tests; it does not touch secrets, dependencies, workflows, package metadata, or code-execution supply-chain surfaces. (sha=53b37e50739078238b0b582c4db5da61ddd66cb8)

Why human review is needed:
This item has security-sensitive risk. ClawSweeper is pausing instead of making an autonomous change that could affect trust, credentials, permissions, or exposure.

What the maintainer can do as a next step:
If the maintainer accepts the current risk and wants ClawSweeper to continue merge gates, comment @clawsweeper approve. If the security-sensitive detail still needs changes, describe the safe path or push the fix, then comment @clawsweeper automerge. If the risk should not be automated, keep the PR paused for manual review or comment @clawsweeper stop.

I added clawsweeper:human-review and left the final call with a maintainer.

@snowzlm

snowzlm commented Jun 8, 2026

Contributor Author

Addressed the latest ClawSweeper feedback by updating the PR body with a redacted real gateway runtime diagnostic-channel proof on current head 53b37e5. The run uses the actual patched createChannelManager path with real timers and a real locally registered channel plugin, reproduces the stale aborted task sequence after the 5000ms stop timeout, and shows the late connected/null-error patch no longer clears the timeout diagnostic. @clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 8, 2026

Contributor

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. clawsweeper:human-review Needs maintainer review before ClawSweeper can continue labels Jun 8, 2026
@clawsweeper clawsweeper Bot merged commit a4f0e50 into openclaw:main Jun 8, 2026
209 of 219 checks passed
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request Jun 8, 2026

Summary:
- This PR sanitizes status patches from aborted channel tasks in the gateway manager and adds regression tests for stale restart diagnostics.
- PR surface: Source +56, Tests +78. Total +134 across 2 files.
- Reproducibility: yes. Source inspection and the PR's before-fix regression show the sequence: non-manual sto ... while the stale task remains, then a late `connected=true` / `lastError=null` status patch on current main.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(gateway): preserve stale restart diagnostics
- PR branch already contained follow-up commit before automerge: fix(gateway): preserve stale channel restart diagnostics

Validation:
- ClawSweeper review passed for head 53b37e5.
- Required merge gates passed before the squash merge.

Prepared head SHA: 53b37e5
Review: openclaw#90937 (comment)

Co-authored-by: snowzlm <snowzlm@noreply.codeberg.org>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: takhoffman
Co-authored-by: takhoffman <781889+takhoffman@users.noreply.github.com>

Labels

clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge gateway Gateway runtime merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: S status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Telegram channel stuck not-running with restartPending=true while Bot API probe succeeds; gateway restart kills stale process and recovers

2 participants