fix(diagnostics): track model stream progress #86757

Merged
clawsweeper[bot] merged 6 commits into main from clawsweeper/automerge-openclaw-openclaw-86504 on May 26, 2026

Conversation

clawsweeper Bot commented May 26, 2026

Contributor

Makes #86504 merge-ready for the ClawSweeper automerge loop.
The edit pass should inspect the live PR diff, review comments, and failing checks; rebase if needed; keep the contributor branch credited; and stop only when validation is green or an external blocker is proven.

ClawSweeper 🐠 replacement reef notes:

  • Repair fallback: GitHub rejected the repair branch push because it updates workflow files and the ClawSweeper app token does not have workflows permission

Co-author credit kept:

fish notes: model gpt-5.5, reasoning high; reviewed against fcc74d9.

clawsweeper Bot added the following labels on May 26, 2026:

  • agents: Agent runtime and tooling
  • maintainer: Maintainer-authored PR
  • size: M
  • clawsweeper:automerge: Maintainer opted this PR into bounded ClawSweeper-reviewed automerge
  • P1: High-priority user-facing bug, regression, or broken workflow.
  • rating: 🦞 diamond lobster: Very strong PR readiness with only minor maintainer review expected.
  • merge-risk: 🚨 availability: 🚨 May cause crashes, hangs, restart loops, stalls, or process outages.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane.
  • clawsweeper: Tracked by ClawSweeper automation
clawsweeper Bot commented May 26, 2026

Contributor Author

Codex review: passed. Reviewed May 26, 2026, 12:46 AM ET / 04:46 UTC.

Summary
The PR updates diagnostics to mark streamed model chunks as run progress, keeps silent model calls abortable after the stuck-session timeout, and adds regression coverage for stream progress and recovery behavior.

PR surface: Source +54, Tests +229. Total +283 across 6 files.

Reproducibility: yes, at source level. Current main tracks model-call start/end activity, but streamed chunks do not refresh diagnostic run progress, while active-abort recovery keys on a stale lastProgressAgeMs. I did not run a live local-provider repro in this read-only review.
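
To make the described behavior concrete, here is a minimal sketch of a stream wrapper that refreshes run progress on every chunk rather than only at model-call start and end. All names in it (RunProgressTracker, trackStreamProgress, the injected clock) are hypothetical and chosen for illustration; only lastProgressAgeMs echoes a name from the review, and none of this is taken from the PR diff.

```typescript
// Hypothetical sketch; not the OpenClaw implementation.
type Clock = () => number;

class RunProgressTracker {
  private readonly now: Clock;
  private lastProgressAt: number;

  constructor(now: Clock = Date.now) {
    this.now = now;
    this.lastProgressAt = now();
  }

  // Record that the run did something observable.
  markProgress(): void {
    this.lastProgressAt = this.now();
  }

  // Milliseconds elapsed since the last recorded progress.
  lastProgressAgeMs(): number {
    return this.now() - this.lastProgressAt;
  }
}

// Wrap a model stream so that every chunk counts as run progress.
async function* trackStreamProgress<T>(
  stream: AsyncIterable<T>,
  tracker: RunProgressTracker,
): AsyncGenerator<T> {
  tracker.markProgress(); // model-call start
  try {
    for await (const chunk of stream) {
      tracker.markProgress(); // each streamed chunk refreshes progress
      yield chunk;
    }
  } finally {
    tracker.markProgress(); // model-call end, including errors and early exit
  }
}
```

Without the markProgress call inside the loop, a long but healthy stream would look as stale as a stalled one, which is the gap the review describes on current main.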

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🌊 off-meta tidepool
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
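
As a rough illustration of the "weaker of the two" rule, the sketch below orders the ranks from the legend in this review and returns the weaker one. The function name overallRating is hypothetical, and skipping a not-applicable rating is an assumption inferred from this review, where proof is off-meta tidepool and the overall rating still equals patch quality.

```typescript
// Ranks from strongest to weakest, as listed in this review's legend.
const RANKS = [
  "challenger crab",
  "diamond lobster",
  "platinum hermit",
  "gold shrimp",
  "silver shellfish",
  "unranked krab",
] as const;

type Rank = (typeof RANKS)[number];
type Rating = Rank | "off-meta tidepool"; // "rating does not apply"

// Assumption: a not-applicable rating is skipped, not treated as weakest.
function overallRating(proof: Rating, patchQuality: Rating): Rating {
  const applicable = [proof, patchQuality].filter(
    (r): r is Rank => r !== "off-meta tidepool",
  );
  if (applicable.length === 0) return "off-meta tidepool";
  // The weaker rating has the larger index in RANKS.
  return applicable.reduce((weakest, r) =>
    RANKS.indexOf(r) > RANKS.indexOf(weakest) ? r : weakest,
  );
}

console.log(overallRating("off-meta tidepool", "diamond lobster")); // diamond lobster
console.log(overallRating("gold shrimp", "diamond lobster")); // gold shrimp
```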

Risk before merge

  • Provider streams that remain silent longer than diagnostics.stuckSessionAbortMs are still abortable by design, so maintainers should accept chunk or heartbeat output as the progress boundary before merge.
  • The full long-running LM Studio/vLLM live repro was not rerun after the stream-progress patch; source inspection and regression tests cover the path, but live provider behavior is not shown in this replacement PR.
  • Exact-head CI still had in-progress checks at inspection time, so merging should wait for the normal required-check gate.
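
The progress boundary in the first risk can be pictured as a single comparison between the age of the last observed progress and diagnostics.stuckSessionAbortMs. The sketch below is an assumption-laden illustration: the ActiveModelCall shape, the isAbortable helper, the inclusive comparison, and the timeout value are all hypothetical and not taken from the PR.

```typescript
// Hypothetical shape; only the field name mirrors the review text.
interface ActiveModelCall {
  lastProgressAgeMs: number; // time since the last chunk or other progress
}

// Assumption: a call becomes abortable once it has been silent for at
// least the configured stuck-session timeout.
function isAbortable(call: ActiveModelCall, stuckSessionAbortMs: number): boolean {
  return call.lastProgressAgeMs >= stuckSessionAbortMs;
}

const stuckSessionAbortMs = 600_000; // illustrative value only

// A stream that emitted a chunk 2 s ago is treated as healthy.
const streaming: ActiveModelCall = { lastProgressAgeMs: 2_000 };
// A call that has been silent for 11 minutes stays recoverable by abort.
const silent: ActiveModelCall = { lastProgressAgeMs: 660_000 };

console.log(isAbortable(streaming, stuckSessionAbortMs)); // false
console.log(isAbortable(silent, stuckSessionAbortMs)); // true
```

Under this reading, a provider that streams silently for longer than the timeout is indistinguishable from a stalled one, which is why chunk or heartbeat output is the boundary maintainers are asked to accept.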

Maintainer options:

  1. Accept Chunk Progress As Freshness (recommended)
    Proceed with the current design where streamed providers must emit chunks or heartbeat-style progress before the stuck-session timeout while silent calls remain recoverable.
  2. Require Live Local-Provider Proof
    Ask for a post-patch LM Studio or vLLM run showing streamed chunks refresh diagnostics before merging if maintainers want runtime proof beyond targeted regression tests.
  3. Pause For Silent Stream Semantics
    If silent streamed providers must be supported longer than the stuck-session timeout, pause this PR and design a config or provider contract instead.

Next step before merge
No repair lane is needed because the automerge-opted PR has no actionable review findings; exact-head checks and maintainer risk acceptance should gate merge.

Security
Cleared: The diff only changes diagnostics runtime bookkeeping and tests; it does not touch dependencies, CI, secrets, auth, package execution, or external code-loading surfaces.

Review details

Best possible solution:

Merge once exact-head checks pass and maintainers accept the chunk/heartbeat progress policy, leaving broader silent-stream or configurable recovery semantics to the linked issue if still needed.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: current main tracks model-call start/end activity but streamed chunks do not refresh diagnostic run progress, while active-abort recovery keys on stale lastProgressAgeMs. I did not run a live local-provider repro in this read-only review.

Is this the best way to solve the issue?

Yes: tracking observed stream chunks is narrower than blanket-exempting local providers and preserves recovery for silent or non-streaming calls. Broader timeout or silent-stream semantics would be separate product work.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 711e963723b4.

Label changes

Label justifications:

  • P1: The bug affects active model-call sessions and can lead to aborted agent turns or missing user-visible responses.
  • merge-risk: 🚨 availability: The PR changes diagnostic freshness and active-abort recovery behavior for long model streams, where mistakes can either kill healthy calls or suppress recovery of real stalls.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🌊 off-meta tidepool and patch quality is 🦞 diamond lobster.
  • status: 🚀 automerge armed: This PR is in ClawSweeper's automerge lane.
  • Not applicable: This is a ClawSweeper bot replacement for a maintainer-requested automerge PR, so the external-contributor proof gate does not apply; the source PR records the original live failure context and targeted regression proof but not a full post-patch live rerun.

Evidence reviewed

PR surface:

Source +54, Tests +229. Total +283 across 6 files.

| Area | Files | Added | Removed | Net |
|---|---|---|---|---|
| Source | 4 | 100 | 46 | +54 |
| Tests | 2 | 229 | 0 | +229 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 6 | 329 | 46 | +283 |

What I checked:

Likely related people:

  • steipete: Current-main blame for the diagnostic activity, stream wrapper, and stalled model-call recovery logic points to commit 321f06a authored by Peter Steinberger, and prior issue review context maps this area to steipete. (role: recent area contributor; confidence: high; commits: 321f06ad0ef5; files: src/logging/diagnostic.ts, src/logging/diagnostic-run-activity.ts, src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.ts)
  • amknight: Recent merged history for the same model diagnostic wrapper includes f824e15 with co-author metadata for amknight. (role: adjacent diagnostic event contributor; confidence: medium; commits: f824e1596ad5; files: src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.ts, src/infra/diagnostic-events.ts)
  • vincentkoc: The adjacent OpenTelemetry/model diagnostic event commit f824e15 includes co-author metadata for Vincent Koc on the same diagnostic wrapper surface. (role: adjacent diagnostic event contributor; confidence: medium; commits: f824e1596ad5; files: src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.ts, src/infra/diagnostic-events.ts)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper Bot commented May 26, 2026

Contributor Author

ClawSweeper PR egg

✨ Hatched: 🥚 common Moonlit Lint Imp

Hatch command

Comment @clawsweeper hatch when this PR is hatchable.

Hatchability rules:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

Rarity: 🥚 common.
Trait: guards the happy path.
Image traits: location workflow harbor; accessory release bell; palette charcoal, cyan, and signal green; mood watchful; pose nestled inside a glowing shell; shell matte ceramic shell; lighting golden review-room light; background quiet workflow signs.
Copy: My PR egg hatched a 🥚 common Moonlit Lint Imp in ClawSweeper.

What is this egg doing here?
  • Eggs appear after the PR passes real-behavior proof. The egg is here for vibes, not verdicts: it does not change labels, ratings, merge decisions, or automation.
  • The shell reacts to review momentum: open follow-up work warms it up, re-review makes it wobble, and a clean final review lets it hatch.
  • Hatchability usually comes from sufficient real-behavior proof, no blocking P0/P1/P2 findings, no security attention needed, and clean correctness. A merged PR is already final, so merge makes the egg hatchable independently.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@clawsweeper clawsweeper Bot merged commit 23e9bc8 into main May 26, 2026
157 of 169 checks passed
@clawsweeper clawsweeper Bot deleted the clawsweeper/automerge-openclaw-openclaw-86504 branch May 26, 2026 04:47
clawsweeper Bot commented May 26, 2026

Contributor Author

🦞✅
ClawSweeper merged this PR after the passing review.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=fcc74d986934dfad148ff8b7e6a29a4c14161943)
Merge status: merged by ClawSweeper automerge
Merged at: 2026-05-26T04:47:12Z
Merge commit: 23e9bc8c0b61

What merged:

  • The PR updates diagnostics to mark streamed model chunks as run progress, keeps silent model calls abortable after the stuck-session timeout, and adds regression coverage for stream progress and recovery behavior.
  • PR surface: Source +54, Tests +229. Total +283 across 6 files.
  • Reproducibility: yes. at source level: current main tracks model-call start/end activity but streamed chunks ... covery keys on stale lastProgressAgeMs. I did not run a live local-provider repro in this read-only review.

Automerge notes:

  • PR branch already contained follow-up commit before automerge: fix(diagnostics): track model stream progress
  • PR branch already contained follow-up commit before automerge: test(diagnostics): cover silent local model aborts
  • PR branch already contained follow-up commit before automerge: fix(diagnostics): skip stream progress when disabled

The automerge loop is complete.

Automerge progress:

  • 2026-05-26 04:46:58 UTC review passed fcc74d986934 (structured ClawSweeper verdict: pass (sha=fcc74d986934dfad148ff8b7e6a29a4c14161...)
  • 2026-05-26 04:47:15 UTC merged fcc74d986934 (merged by ClawSweeper automerge)

github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 26, 2026
Summary:
- The PR updates diagnostics to mark streamed model chunks as run progress, keeps silent model calls abortable after the stuck-session timeout, and adds regression coverage for stream progress and recovery behavior.
- PR surface: Source +54, Tests +229. Total +283 across 6 files.
- Reproducibility: yes. at source level: current main tracks model-call start/end activity but streamed chunks ... covery keys on stale lastProgressAgeMs. I did not run a live local-provider repro in this read-only review.

Automerge notes:
- PR branch already contained follow-up commit before automerge: fix(diagnostics): track model stream progress
- PR branch already contained follow-up commit before automerge: test(diagnostics): cover silent local model aborts
- PR branch already contained follow-up commit before automerge: fix(diagnostics): skip stream progress when disabled

Validation:
- ClawSweeper review passed for head fcc74d9.
- Required merge gates passed before the squash merge.

Prepared head SHA: fcc74d9
Review: openclaw#86757 (comment)

Co-authored-by: Onur Solmaz <2453968+osolmaz@users.noreply.github.com>
Co-authored-by: clawsweeper <274271284+clawsweeper[bot]@users.noreply.github.com>
Co-authored-by: clawsweeper[bot] <274271284+clawsweeper[bot]@users.noreply.github.com>
Approved-by: osolmaz
Co-authored-by: osolmaz <2453968+osolmaz@users.noreply.github.com>
jameslcowan pushed a commit to jameslcowan/openclaw that referenced this pull request Jun 2, 2026
SYU8384 pushed a commit to SYU8384/openclaw that referenced this pull request Jun 3, 2026
sablehead pushed a commit to sablehead/openclaw that referenced this pull request Jun 10, 2026