Skip to content

fix(agents): detect truncated API responses to prevent silent session hang#89160

Open
joelnishanth wants to merge 2 commits into
openclaw:mainfrom
joelnishanth:fix/embedded-agent-truncated-response-recovery
Open

fix(agents): detect truncated API responses to prevent silent session hang#89160
joelnishanth wants to merge 2 commits into
openclaw:mainfrom
joelnishanth:fix/embedded-agent-truncated-response-recovery

Conversation

@joelnishanth

@joelnishanth joelnishanth commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #89051 — embedded agent sessions silently hanging after a truncated API response with no error logging or recovery.

Root cause: When an API stream ends without a proper finish_reason (or with stopReason: "length"), and the last assistant message contains incomplete/unsigned thinking blocks, resolveIncompleteTurnPayloadText bails early at the payloadCount > 0 guard. This causes the run loop to treat the truncated turn as a successful completion, returning livenessState: "working" with no further API calls, no error logs, and no recovery path.

Fix: Adds a isTruncatedTerminalAssistantTurn detector that identifies truncated responses (missing/length stopReason + incomplete-thinking assessment from assessLastAssistantMessage). This detector:

  1. Bypasses the payloadCount > 0 short-circuit in resolveIncompleteTurnPayloadText — truncated responses now surface the incomplete-turn error even when partial text was streamed (parallel to existing toolUseTerminal guard from [Bug] Agent tool-chain final text segments silently dropped — invisible to user and next-turn context #76477)
  2. Extends the terminal detection gate so truncated responses don't silently pass through when other detectors miss them
  3. Logs a clear warning in the run loop when truncation is detected, giving operators visibility into what was previously a 14-minute silent window

Changes

  • src/agents/embedded-agent-runner/run/incomplete-turn.ts: New isTruncatedTerminalAssistantTurn export + wired into resolveIncompleteTurnPayloadText as a bypass guard
  • src/agents/embedded-agent-runner/run.ts: Import + log.warn when truncation is detected before the success exit path
  • src/agents/embedded-agent-runner/run.incomplete-turn.test.ts: 9 new tests (6 unit + 2 integration-level + 1 end-to-end via runEmbeddedAgent)

Test plan

  • All 242 tests pass (no regressions)
  • New tests verify: unsigned thinking + missing stopReason detected; stopReason=length detected; valid content + stop not flagged; toolUse stop not flagged; null assistant not flagged
  • Integration test: payloadCount > 0 + truncated response = error surfaced (previously returned null)
  • End-to-end test via full runEmbeddedAgent: truncated Ollama response with partial streamed text = error payload + warn log (previously silent success)
  • Type-check passes cleanly (pnpm tsgo)

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: [Bug]: Embedded agent session silently hangs after auto-compaction with no error logging or recovery #89051 — embedded agent sessions silently hang for ~14 minutes after a truncated API response (e.g. Ollama with context window exhaustion returning done_reason: "length") with no error, no log, and no recovery.
  • Real environment tested: macOS 15, Node 22, Ollama llama3.2:3b running locally on localhost:11434, OpenClaw dev build from this branch.
  • Exact steps or command run after this patch: Ran npx tsx proof-89051.ts which calls the real Ollama /api/chat endpoint with num_predict: 3 to force a truncated response (done_reason: "length"), then feeds the real API response through the patched isTruncatedTerminalAssistantTurn and resolveIncompleteTurnPayloadText functions.
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): Live proof against real Ollama
  • Observed result after fix: The truncation detector correctly identifies the real Ollama truncated response (done_reason: "length", content: "The history of") and resolveIncompleteTurnPayloadText returns an error message instead of null. Before the fix, it returned null causing the session to hang silently. After the fix, it returns the error string and log.warn fires with provider/model/stopReason details.
  • What was not tested: Multi-turn recovery loop (existing compaction/retry logic handles that path). Production cloud API truncation (tested with local Ollama which produces identical stopReason: "length" behavior).
  • Proof limitations or environment constraints: Ollama local instance was used as the truncation source since cloud providers rarely truncate on-demand. The detection logic is provider-agnostic — it checks stopReason and assessLastAssistantMessage regardless of provider.

Joel Nishanth | offlyn.AI

@clawsweeper

clawsweeper Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Codex review: found issues before merge. Reviewed June 1, 2026, 7:49 PM ET / 23:49 UTC.

Summary
The PR adds embedded-runner truncated assistant-turn detection, warning logs, regression tests, and a committed live Ollama proof screenshot.

PR surface: Source +55, Tests +148, Config 0. Total +203 across 4 files.

Reproducibility: Do we have a high-confidence way to reproduce the issue? Partly: the linked report and proof screenshot demonstrate an Ollama done_reason: "length" truncation, and source inspection shows the PR's new guard is not reached for current Ollama length terminals unless provider mapping changes.

Review metrics: 1 noteworthy metric.

  • Committed proof assets: 1 binary image added. OpenClaw policy routes PR proof media to comments or artifact storage, so this is a merge-relevant cleanup fact beyond ordinary file counts.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🐚 platinum hermit
Patch quality: 🧂 unranked krab
Result: blocked by patch quality or review findings.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Map Ollama done_reason: "length" to stopReason: "length" and add a native stream regression that reaches the embedded runner.
  • Remove .github/proof/proof-89051-live-ollama.png from the branch and keep proof in the PR body/comment or artifact storage.
  • [P1] Get maintainer acceptance for the fail-closed abandoned-run behavior before merge.

Risk before merge

  • [P1] Merging as-is could leave the reported real Ollama length-truncation path unfixed because the provider stream still normalizes done_reason: "length" to stop before the runner detector runs.
  • [P1] The PR intentionally turns some malformed partial-payload turns from successful working exits into abandoned error payloads, so maintainers should explicitly accept that fail-closed behavior for existing long-running sessions before merge.

Maintainer options:

  1. Fix provider length propagation first (recommended)
    Map Ollama done_reason: "length" to the shared length stop reason and add a native stream regression before relying on the runner detector.
  2. Accept runner-only mitigation with owner signoff
    Maintainers could intentionally land the generic runner guard first, but they should explicitly accept that real Ollama length proof remains incomplete and that partial-payload turns now fail closed.
  3. Pause until the linked issue path is narrowed
    If the desired lifecycle behavior is still unsettled, keep this PR paused while the linked issue decides whether truncation should abandon, retry, or recover differently.

Next step before merge

  • [P1] Maintainer review is needed because the remaining blocker combines provider-boundary ownership with an intentional fail-closed compatibility change, not only a mechanical edit.

Security
Cleared: No dependency, workflow, secret-handling, or code-execution security regression was found; the binary proof image is handled as merge-readiness artifact churn rather than a concrete security issue.

Review findings

  • [P1] Preserve length stops before relying on this guard — src/agents/embedded-agent-runner/run/incomplete-turn.ts:134
  • [P2] Remove the committed proof image — .github/proof/proof-89051-live-ollama.png:1
Review details

Best possible solution:

Preserve native provider length termination at the Ollama stream boundary, remove the committed proof image, keep the runner guard as a generic safety net, and land only after maintainer acceptance of the fail-closed liveness behavior.

Do we have a high-confidence way to reproduce the issue?

Do we have a high-confidence way to reproduce the issue? Partly: the linked report and proof screenshot demonstrate an Ollama done_reason: "length" truncation, and source inspection shows the PR's new guard is not reached for current Ollama length terminals unless provider mapping changes.

Is this the best way to solve the issue?

Is this the best way to solve the issue? No; the runner guard is a useful safety net, but the best fix must also preserve the provider's native length terminal signal and get maintainer acceptance for the fail-closed lifecycle change.

Full review comments:

  • [P1] Preserve length stops before relying on this guard — src/agents/embedded-agent-runner/run/incomplete-turn.ts:134
    This guard only fires when the runner sees a missing stop reason or stopReason === "length", but current Ollama stream code builds every non-tool terminal response as stopReason: "stop" and emits done reason "stop" even when upstream done_reason was "length". The live proof's real signal is done_reason: "length", so as-is a real Ollama truncation can still arrive at this new guard as "stop" and fall through the payloadCount > 0 success path. Please map native Ollama length termination to the shared length stop reason and add a native stream regression.
    Confidence: 0.9
  • [P2] Remove the committed proof image — .github/proof/proof-89051-live-ollama.png:1
    Repository policy routes PR screenshots and proof media to comments or external artifact storage, but this branch adds a binary proof image under .github/proof. Keeping proof media in the product repo creates permanent artifact churn unrelated to runtime behavior; the existing PR/comment screenshot link is the right place for it.
    Confidence: 0.97

Overall correctness: patch is incorrect
Overall confidence: 0.87

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 086274fd7e7a.

Label changes

Label changes:

  • add rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🐚 platinum hermit and patch quality is 🧂 unranked krab.
  • remove rating: 🦪 silver shellfish: Current PR rating is rating: 🧂 unranked krab, so this older rating label is no longer current.

Label justifications:

  • P1: The PR targets an urgent embedded-agent silent hang that can stall real long-running sessions after truncation or compaction.
  • merge-risk: 🚨 compatibility: The diff changes malformed partial assistant turns from a successful partial-output path into an abandoned error path, which can change existing session behavior on upgrade.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🐚 platinum hermit and patch quality is 🧂 unranked krab.
  • status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Override: The contributor posted a terminal screenshot from a live Ollama truncation and included proof: override; it is useful after-fix evidence, but it does not settle the provider-boundary finding because the screenshot does not show OpenClaw's Ollama stream preserving done_reason: "length".
  • proof: 📸 screenshot: Contributor real behavior proof includes screenshot evidence. The contributor posted a terminal screenshot from a live Ollama truncation and included proof: override; it is useful after-fix evidence, but it does not settle the provider-boundary finding because the screenshot does not show OpenClaw's Ollama stream preserving done_reason: "length".
Evidence reviewed

PR surface:

Source +55, Tests +148, Config 0. Total +203 across 4 files.

View PR surface stats
Area Files Added Removed Net
Source 2 56 1 +55
Tests 1 148 0 +148
Docs 0 0 0 0
Config 1 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 4 204 1 +203

What I checked:

  • PR diff surface: The head diff adds isTruncatedTerminalAssistantTurn, wires it into resolveIncompleteTurnPayloadText, adds a runner warning, adds regression tests, and adds .github/proof/proof-89051-live-ollama.png. (src/agents/embedded-agent-runner/run/incomplete-turn.ts:116, 20360049eed8)
  • Provider contract supports length: The shared stream contract already defines StopReason as including length, so preserving provider length termination is a supported cross-provider signal. (packages/llm-core/src/types.ts:255, 086274fd7e7a)
  • Ollama currently drops length termination: Current buildAssistantMessage ignores done_reason and maps every non-tool Ollama terminal response to stop, which prevents a real done_reason: "length" response from reaching the PR's new stopReason === "length" guard. (extensions/ollama/src/stream.ts:1063, 086274fd7e7a)
  • Ollama done event also emits stop: The current stream terminal event also emits reason stop unless there are tool calls, so the native stream path needs a regression for length termination rather than only a runner-level synthetic case. (extensions/ollama/src/stream.ts:1451, 086274fd7e7a)
  • Current runner success guard: Current main suppresses incomplete-turn errors when payloadCount !== 0 unless the terminal state is tool use; the PR's bypass only helps real Ollama once the provider leaves a missing or length stop reason visible to the runner. (src/agents/embedded-agent-runner/run/incomplete-turn.ts:271, 086274fd7e7a)
  • Proof artifact policy: Root repository policy says PR screenshots and proof assets should be attached to the PR/comment or external artifact storage and not pushed into the product repository. (AGENTS.md:172, 086274fd7e7a)

Likely related people:

  • steipete: Recent commits on both the Ollama stream and embedded-agent incomplete-turn paths include fix(ollama): suppress disabled reasoning output, broad agent runtime refactors, and cleanup around warning/retry behavior. (role: recent area contributor; confidence: high; commits: 7562afdca37a, 4252f07ff0f1, bb46b79d3c14; files: extensions/ollama/src/stream.ts, src/agents/embedded-agent-runner/run/incomplete-turn.ts, src/agents/embedded-agent-runner/run.ts)
  • vincentkoc: Recent history on extensions/ollama/src/stream.ts includes Ollama stream processing and tool-call behavior fixes, which are the provider boundary implicated by this PR. (role: recent Ollama contributor; confidence: high; commits: c7b190beec73, 21e69fdd4fa8, dfadc7b704d2; files: extensions/ollama/src/stream.ts)
  • obviyus: Recent history on the embedded-agent incomplete-turn file includes fix(agents): surface internal abort incomplete turns, which is adjacent to the PR's error surfacing path. (role: recent incomplete-turn contributor; confidence: medium; commits: 1556e3c68ca0; files: src/agents/embedded-agent-runner/run/incomplete-turn.ts, src/agents/embedded-agent-runner/run.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. labels Jun 1, 2026
… hang (openclaw#89051)

Co-authored-by: Cursor <cursoragent@cursor.com>
@joelnishanth joelnishanth force-pushed the fix/embedded-agent-truncated-response-recovery branch from a392986 to 5721545 Compare June 1, 2026 17:43
@joelnishanth

joelnishanth commented Jun 1, 2026

Copy link
Copy Markdown
Contributor Author

Real Behavior Proof: Live Ollama Truncation Detection

Tested against a real Ollama instance (llama3.2:3b) running locally. The proof script forces a truncated response using num_predict=3 and validates the new detection + error surfacing logic.

Screenshot: Live proof execution (real Ollama API)

Live proof output

What was proven

  1. Real truncation reproduced: Ollama returns done_reason: "length" when num_predict is hit — exactly what happens when context window fills or API rate-limits truncate responses.
  2. Detection works: isTruncatedTerminalAssistantTurn correctly identifies these as truncated (Cases A & B) while not flagging valid responses (Case C — no false positives).
  3. Error surfacing works: resolveIncompleteTurnPayloadText now returns an error message instead of null when payloadCount > 0 but the response is truncated — this is the exact code path that caused the silent hang in [Bug]: Embedded agent session silently hangs after auto-compaction with no error logging or recovery #89051.
  4. No regressions: All 242 existing tests pass.

Unit Tests (242 passing)

 Test Files  2 passed (2)
      Tests  242 passed (242)
   Duration  27.30s

proof: override — This is a defensive detection/logging fix; the "before" state was silent failure (no observable behavior to screenshot). The proof demonstrates the fix activates against real API responses.

Joel Nishanth · offlyn.AI

@clawsweeper clawsweeper Bot added proof: 📸 screenshot Contributor real behavior proof includes screenshot evidence. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 1, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 1, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. P1 High-priority user-facing bug, regression, or broken workflow. proof: 📸 screenshot Contributor real behavior proof includes screenshot evidence. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. size: M status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Embedded agent session silently hangs after auto-compaction with no error logging or recovery

1 participant