Skip to content

fix(diagnostics): track model stream progress#86504

Closed
osolmaz wants to merge 5 commits into
openclaw:mainfrom
osolmaz:fix/local-model-diagnostic-abort
Closed

fix(diagnostics): track model stream progress#86504
osolmaz wants to merge 5 commits into
openclaw:mainfrom
osolmaz:fix/local-model-diagnostic-abort

Conversation

@osolmaz

@osolmaz osolmaz commented May 25, 2026

Copy link
Copy Markdown
Member

Opened on behalf of Onur Solmaz (@osolmaz). This PR is ready for maintainer review.

Refs #85826.

Summary

Diagnostic active-abort recovery could treat a healthy streamed model call as stuck because streamed chunks did not update diagnostic progress.
This change counts model stream chunks as diagnostic progress, so long-running streamed calls stay fresh while they are actively producing output.
Silent model calls, including stream: false calls, are still abortable after the diagnostic stuck-session timeout.
That keeps recovery available for real stalls instead of disabling it for every local or self-hosted provider.

What Changed

The fix is now based on observed stream progress, not on blanket local-provider exemption.
Local and remote streamed providers are expected to send chunks or heartbeat-style progress; if they do, diagnostics will see that work is still moving.
Silent/non-streaming calls remain subject to the configured stuck-session timeout.

  • Emits model-call stream progress from the model diagnostic wrapper when response chunks arrive.
  • Refreshes in-memory diagnostic activity for every chunk, while throttling emitted run.progress events so token streams do not spam observers.
  • Keeps diagnostic active-abort recovery enabled for silent local and remote model calls.
  • Skips stream-progress bookkeeping when diagnostics are disabled, so disabled diagnostics do not retain stale activity state.
  • Adds regression coverage for streamed progress, silent local model-call abort recovery, and diagnostics-disabled behavior.

Testing

The branch includes targeted regression tests for the diagnostic behavior changed here.
This metadata update did not rerun the test suite.

  • node scripts/run-vitest.mjs src/logging/diagnostic.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.test.ts
  • node_modules/.bin/oxfmt --check src/agents/pi-embedded-runner/run/llm-idle-timeout.ts src/agents/pi-embedded-runner/run/attempt.ts src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.ts src/logging/diagnostic-run-activity.ts src/logging/diagnostic.ts src/infra/diagnostic-events.ts src/logging/diagnostic.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.test.ts
  • node scripts/run-oxlint.mjs src/agents/pi-embedded-runner/run/llm-idle-timeout.ts src/agents/pi-embedded-runner/run/attempt.ts src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.ts src/logging/diagnostic-run-activity.ts src/logging/diagnostic.ts src/infra/diagnostic-events.ts src/logging/diagnostic.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.test.ts
  • git diff --check
  • .agents/skills/autoreview/scripts/autoreview --mode local

Real behavior proof

Behavior addressed: Healthy streamed model calls should not be killed by diagnostic active-abort recovery while chunks are still arriving. Silent/non-streaming model calls should still be recoverable after the stuck-session timeout.

Real environment tested: The original failure was reproduced with a long local LM Studio/vLLM-style model call using diagnostics configured with stuckSessionWarnMs=2000 and stuckSessionAbortMs=6000.

Exact steps or command run after this patch: The targeted unit/regression commands listed in the Testing section were reported for the touched diagnostics behavior.

Evidence after fix: Regression tests assert that stream chunks refresh diagnostic activity, that silent local model calls still trigger active-abort recovery after the timeout, and that diagnostics-disabled mode does not retain stream-progress activity.

Observed result after fix: Diagnostics can still classify and recover truly stalled model calls, but streamed model calls update lastProgressReason to model_call:stream_progress as chunks arrive.

What was not tested: The full long-running LM Studio/vLLM live repro was not rerun after the stream-progress patch.

Risks

The main risk is provider behavior that opens a streamed request but then stays silent longer than the stuck-session timeout.
That remains abortable by design.
Providers that need longer work before first output should either stream heartbeat-style progress or finish non-streaming calls within the configured timeout.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels May 25, 2026
@clawsweeper

clawsweeper Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

Codex review: passed. Reviewed May 26, 2026, 12:27 AM ET / 04:27 UTC.

Summary
The PR records streamed model chunks as diagnostic run progress, throttles emitted progress events, skips disabled-diagnostics bookkeeping, and adds regression tests for streamed versus silent model-call recovery.

PR surface: Source +59, Tests +229. Total +288 across 6 files.

Reproducibility: yes. from source inspection and the linked report: current main tracks model-call start/end activity but streamed chunks do not refresh diagnostic run progress, while active-abort recovery keys on stale lastProgressAgeMs. I did not run the tests because this review was read-only.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🌊 off-meta tidepool
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • Provider streams that remain silent longer than stuckSessionAbortMs are still abortable by design; maintainers should be comfortable with chunk or heartbeat output as the progress boundary.
  • The PR body says the full long-running LM Studio/vLLM live repro was not rerun after the stream-progress patch, so live provider behavior is covered by source and regression tests rather than post-patch real-provider proof.

Maintainer options:

  1. Proceed with the chunk-progress policy (recommended)
    Accept that streamed providers must emit chunks or heartbeat-style progress before the stuck-session timeout and let exact-head checks/automerge gates decide landing.
  2. Require a live local-provider proof
    Ask for a post-patch LM Studio or vLLM run showing streamed chunks refresh diagnostics before merging if maintainers want runtime proof beyond targeted regression tests.
  3. Pause for silent-stream semantics
    If silent streamed providers must be supported for longer than the stuck-session timeout, pause this PR and design a config or provider contract instead.

Next step before merge
No repair lane is needed: this automerge-opted PR has no actionable review findings, and remaining handling is maintainer/check gating around the availability policy.

Security
Cleared: The diff only changes diagnostics runtime bookkeeping and tests; it does not touch dependencies, CI, secrets, auth, or package execution surfaces, and the new progress event carries only run/session identifiers and a fixed reason.

Review details

Best possible solution:

Merge once maintainers accept the chunk/heartbeat progress policy, and leave broader configurable thresholds or graceful-degradation work to the linked issue if that scope still matters.

Do we have a high-confidence way to reproduce the issue?

Yes, from source inspection and the linked report: current main tracks model-call start/end activity but streamed chunks do not refresh diagnostic run progress, while active-abort recovery keys on stale lastProgressAgeMs. I did not run the tests because this review was read-only.

Is this the best way to solve the issue?

Yes, tracking observed stream chunks is narrower than blanket-exempting local providers and preserves recovery for silent or non-streaming calls. A broader timeout/config design would be a separate product decision, not required for this focused fix.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against c14c043be7d1.

Label changes

Label changes:

  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🌊 off-meta tidepool and patch quality is 🦞 diamond lobster.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

  • P1: The bug affects active model-call sessions and can lead to aborted agent turns or missing user-visible responses.
  • merge-risk: 🚨 availability: The PR changes diagnostic freshness and active-abort recovery behavior for long model streams, where mistakes can either kill healthy calls or suppress recovery of real stalls.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🌊 off-meta tidepool and patch quality is 🦞 diamond lobster.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane. Not applicable: This is a MEMBER-authored maintainer PR, so the external-contributor real-behavior proof gate does not apply; the PR body provides targeted regression proof but notes the full live provider repro was not rerun after the patch.
Evidence reviewed

PR surface:

Source +59, Tests +229. Total +288 across 6 files.

View PR surface stats
Area Files Added Removed Net
Source 4 98 39 +59
Tests 2 229 0 +229
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 6 327 39 +288

What I checked:

Likely related people:

  • vincentkoc: Current-main blame for the diagnostic model-call recovery and stream-wrapper region points to 9e93431, which carried the relevant diagnostic runtime files. (role: recent area contributor; confidence: medium; commits: 9e93431ae95a; files: src/logging/diagnostic.ts, src/logging/diagnostic-run-activity.ts, src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.ts)
  • amknight: Recent merged work in f824e15 changed model diagnostic events and OpenTelemetry LLM content spans on the same wrapper surface. (role: adjacent diagnostic event contributor; confidence: medium; commits: f824e1596ad5; files: src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.ts, src/infra/diagnostic-events.ts)
  • steipete: The latest-release commit a374c3a introduced the diagnostic event and diagnostic logging files in the available history. (role: initial diagnostic surface importer; confidence: low; commits: a374c3a5bfd5; files: src/infra/diagnostic-events.ts, src/logging/diagnostic.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 25, 2026
@clawsweeper

clawsweeper Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg

✨ Hatched: 🥚 common Mossy Crabkin

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: sparkles near resolved comments.
Image traits: location diff observatory; accessory release bell; palette charcoal, cyan, and signal green; mood curious; pose guarding a tiny green check; shell frosted glass shell; lighting subtle sparkle highlights; background miniature CI buoys.
Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Mossy Crabkin in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. It is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 25, 2026
@osolmaz osolmaz self-assigned this May 25, 2026
@osolmaz

osolmaz commented May 25, 2026

Copy link
Copy Markdown
Member Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels May 25, 2026
@osolmaz osolmaz marked this pull request as draft May 25, 2026 14:37
@osolmaz

osolmaz commented May 25, 2026

Copy link
Copy Markdown
Member Author

I'm not sure of this solution, don't merge

@osolmaz

osolmaz commented May 25, 2026

Copy link
Copy Markdown
Member Author

This has been revised to track streamed model progress instead of blanket-disabling local model-call recovery. The previous do-not-merge pause is superseded; please re-review the new head.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@osolmaz osolmaz marked this pull request as ready for review May 25, 2026 15:24

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce22fb0d45

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 25, 2026
@clawsweeper clawsweeper Bot added the status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. label May 25, 2026
@osolmaz osolmaz changed the title fix(diagnostics): keep local model calls running fix(diagnostics): track model stream progress May 25, 2026
@osolmaz osolmaz force-pushed the fix/local-model-diagnostic-abort branch from 09cf41a to 1ffee07 Compare May 26, 2026 02:26
@osolmaz

osolmaz commented May 26, 2026

Copy link
Copy Markdown
Member Author

The branch has been rebased onto current main to resolve conflicts. Current head: 1ffee07d8a9a41be059f32fa6b1c04509dff3d9e.

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. labels May 26, 2026
@osolmaz

osolmaz commented May 26, 2026

Copy link
Copy Markdown
Member Author

@clawsweeper automerge

@clawsweeper

clawsweeper Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

🦞🔧
ClawSweeper saw the passing review, but the PR needs another repair pass before merge.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=44d84b498f545cdb427e4c0b1f28003f1cc543fe); failed required checks before automerge: Real behavior proof:CANCELLED
Action: repair worker queued. Run: https://github.com/openclaw/clawsweeper/actions/runs/26432275224
Model: gpt-5.5

I will update this PR branch, or open a safe credited replacement, if the repair worker finds a narrow CI fix.

Automerge progress:

  • 2026-05-26 03:03:06 UTC review queued 44d84b498f54 (after repair)
  • 2026-05-26 04:28:02 UTC review passed 44d84b498f54 (structured ClawSweeper verdict: pass (sha=44d84b498f545cdb427e4c0b1f28003f1cc54...)
  • 2026-05-26 04:21:35 UTC review queued 44d84b498f54 (queued)

@clawsweeper clawsweeper Bot added clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 26, 2026
@clawsweeper clawsweeper Bot force-pushed the fix/local-model-diagnostic-abort branch from 1ffee07 to 44d84b4 Compare May 26, 2026 03:03
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. labels May 26, 2026
@osolmaz

osolmaz commented May 26, 2026

Copy link
Copy Markdown
Member Author

@clawsweeper automerge

@clawsweeper

clawsweeper Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 26, 2026
@clawsweeper

clawsweeper Bot commented May 26, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper 🐠 reef update

Thanks for the work here. GitHub would not let ClawSweeper push to this branch with the available credentials, so the fix moved to a narrow replacement PR. nothing personal, just permission currents.

Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
Replacement PR: #86757
Why close: this run explicitly closes the superseded source PR after the credited replacement PR is open, so review continues in one place.
This source PR is being closed only under the explicit source-close setting for this ClawSweeper run.
Contributor credit is carried into the replacement PR body and release-note context.
Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against fcc74d9.

@clawsweeper clawsweeper Bot closed this May 26, 2026
@osolmaz osolmaz deleted the fix/local-model-diagnostic-abort branch May 26, 2026 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge maintainer Maintainer-authored PR merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. P1 High-priority user-facing bug, regression, or broken workflow. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: M status: 🚀 automerge armed This PR is in ClawSweeper's automerge lane.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants