fix(diagnostics): track model stream progress#86504
Conversation
|
Codex review: passed. Reviewed May 26, 2026, 12:27 AM ET / 04:27 UTC. Summary PR surface: Source +59, Tests +229. Total +288 across 6 files. Reproducibility: yes. from source inspection and the linked report: current main tracks model-call start/end activity but streamed chunks do not refresh diagnostic run progress, while active-abort recovery keys on stale lastProgressAgeMs. I did not run the tests because this review was read-only. Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Merge once maintainers accept the chunk/heartbeat progress policy, and leave broader configurable thresholds or graceful-degradation work to the linked issue if that scope still matters. Do we have a high-confidence way to reproduce the issue? Yes, from source inspection and the linked report: current main tracks model-call start/end activity but streamed chunks do not refresh diagnostic run progress, while active-abort recovery keys on stale lastProgressAgeMs. I did not run the tests because this review was read-only. Is this the best way to solve the issue? Yes, tracking observed stream chunks is narrower than blanket-exempting local providers and preserves recovery for silent or non-streaming calls. A broader timeout/config design would be a separate product decision, not required for this focused fix. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against c14c043be7d1. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +59, Tests +229. Total +288 across 6 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg ✨ Hatched: 🥚 common Mossy Crabkin Hatch commandComment Hatchability rules:
Rarity: 🥚 common. What is this egg doing here?
|
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
I'm not sure of this solution, don't merge |
|
This has been revised to track streamed model progress instead of blanket-disabling local model-call recovery. The previous do-not-merge pause is superseded; please re-review the new head. @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce22fb0d45
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
09cf41a to
1ffee07
Compare
|
The branch has been rebased onto current @clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper automerge |
|
🦞🔧 Source: I will update this PR branch, or open a safe credited replacement, if the repair worker finds a narrow CI fix. Automerge progress:
|
1ffee07 to
44d84b4
Compare
|
@clawsweeper automerge |
|
🦞👀 Command router queued. I will update this comment with the next step. |
|
ClawSweeper 🐠 reef update Thanks for the work here. GitHub would not let ClawSweeper push to this branch with the available credentials, so the fix moved to a narrow replacement PR. nothing personal, just permission currents. Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
fish notes: model gpt-5.5, reasoning high; reviewed against fcc74d9. |
Opened on behalf of Onur Solmaz (@osolmaz). This PR is ready for maintainer review.
Refs #85826.
Summary
Diagnostic active-abort recovery could treat a healthy streamed model call as stuck because streamed chunks did not update diagnostic progress.
This change counts model stream chunks as diagnostic progress, so long-running streamed calls stay fresh while they are actively producing output.
Silent model calls, including
stream: falsecalls, are still abortable after the diagnostic stuck-session timeout.That keeps recovery available for real stalls instead of disabling it for every local or self-hosted provider.
What Changed
The fix is now based on observed stream progress, not on blanket local-provider exemption.
Local and remote streamed providers are expected to send chunks or heartbeat-style progress; if they do, diagnostics will see that work is still moving.
Silent/non-streaming calls remain subject to the configured stuck-session timeout.
run.progressevents so token streams do not spam observers.Testing
The branch includes targeted regression tests for the diagnostic behavior changed here.
This metadata update did not rerun the test suite.
node scripts/run-vitest.mjs src/logging/diagnostic.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.test.tsnode_modules/.bin/oxfmt --check src/agents/pi-embedded-runner/run/llm-idle-timeout.ts src/agents/pi-embedded-runner/run/attempt.ts src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.ts src/logging/diagnostic-run-activity.ts src/logging/diagnostic.ts src/infra/diagnostic-events.ts src/logging/diagnostic.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.test.tsnode scripts/run-oxlint.mjs src/agents/pi-embedded-runner/run/llm-idle-timeout.ts src/agents/pi-embedded-runner/run/attempt.ts src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.ts src/logging/diagnostic-run-activity.ts src/logging/diagnostic.ts src/infra/diagnostic-events.ts src/logging/diagnostic.test.ts src/agents/pi-embedded-runner/run/llm-idle-timeout.test.ts src/agents/pi-embedded-runner/run/attempt.model-diagnostic-events.test.tsgit diff --check.agents/skills/autoreview/scripts/autoreview --mode localReal behavior proof
Behavior addressed: Healthy streamed model calls should not be killed by diagnostic active-abort recovery while chunks are still arriving. Silent/non-streaming model calls should still be recoverable after the stuck-session timeout.
Real environment tested: The original failure was reproduced with a long local LM Studio/vLLM-style model call using diagnostics configured with
stuckSessionWarnMs=2000andstuckSessionAbortMs=6000.Exact steps or command run after this patch: The targeted unit/regression commands listed in the Testing section were reported for the touched diagnostics behavior.
Evidence after fix: Regression tests assert that stream chunks refresh diagnostic activity, that silent local model calls still trigger active-abort recovery after the timeout, and that diagnostics-disabled mode does not retain stream-progress activity.
Observed result after fix: Diagnostics can still classify and recover truly stalled model calls, but streamed model calls update
lastProgressReasontomodel_call:stream_progressas chunks arrive.What was not tested: The full long-running LM Studio/vLLM live repro was not rerun after the stream-progress patch.
Risks
The main risk is provider behavior that opens a streamed request but then stays silent longer than the stuck-session timeout.
That remains abortable by design.
Providers that need longer work before first output should either stream heartbeat-style progress or finish non-streaming calls within the configured timeout.