Skip to content

fix(agents): preserve stop-finished OpenAI tool calls#88804

Merged
steipete merged 2 commits into
openclaw:mainfrom
MonkeyLeeT:codex/fix-openai-stop-tool-calls-88791
May 31, 2026
Merged

fix(agents): preserve stop-finished OpenAI tool calls#88804
steipete merged 2 commits into
openclaw:mainfrom
MonkeyLeeT:codex/fix-openai-stop-tool-calls-88791

Conversation

@MonkeyLeeT

@MonkeyLeeT MonkeyLeeT commented May 31, 2026

Copy link
Copy Markdown
Contributor

Fixes #88791

Summary

  • Preserve silent structured OpenAI-compatible tool calls when providers stream tool_calls but finish with finish_reason: stop.
  • Keep stripping tool calls for visible-text stop responses, length/error completions, and transport streams that never emitted a final finish reason.
  • Cover both the transport stream assembler and the legacy openai-completions provider path.

Verification

  • pnpm format -- src/agents/openai-transport-stream.ts src/agents/openai-transport-stream.test.ts src/llm/providers/openai-completions.ts src/llm/providers/openai-completions.test.ts
  • node scripts/run-vitest.mjs src/llm/providers/openai-completions.test.ts src/agents/openai-transport-stream.test.ts
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main

Real behavior proof

Behavior addressed: structured tool_calls paired with finish_reason: stop no longer get stripped into a silent terminal assistant turn when there is no visible assistant text.

Real environment tested: local OpenClaw worktree running the actual OpenAI-compatible transport stream function against a loopback OpenAI-compatible SSE endpoint, with no provider credentials. Proof run timestamp: 2026-05-31 15:27 PDT.

Exact steps or command run after this patch: node --import tsx /private/tmp/openclaw-real-proof-88791.mjs

Evidence after fix: terminal output copied from the after-fix runtime proof:

local OpenAI-compatible SSE endpoint: http://127.0.0.1:57954/v1
silent-stop-tool-call -> stopReason=toolUse; content=[toolCall:bash:{"cmd":"echo after-fix"}]
visible-stop-tool-call -> stopReason=stop; content=[text:"Visible answer."]
missing-finish-tool-call -> stopReason=stop; content=[]

Observed result after fix: the OpenClaw runtime preserved the silent structured tool call as toolUse, kept the visible-text stop response deliverable as text, and dropped the unfinished tool call when the stream ended without a final finish reason.

What was not tested: external vLLM/Qwen provider behavior and broad CI.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 31, 2026
@clawsweeper

clawsweeper Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 31, 2026, 6:47 PM ET / 22:47 UTC.

Summary
The branch changes OpenAI-compatible stream finalizers and tests to promote silent finish_reason: stop tool-call responses to toolUse while preserving visible-text, length, and missing-finish behavior.

PR surface: Source +16, Tests +143. Total +159 across 4 files.

Reproducibility: yes. source-level reproduction is high-confidence: on current main a stream with tool-call blocks plus finish_reason: stop keeps stopReason: stop and then strips the tool calls before terminal classification. I did not run tests because this review was constrained to read-only commands.

Review metrics: none identified.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] The proof does not exercise the reporter's external vLLM/Qwen deployment or broad CI; the maintainer decision is whether the loopback wire-shape proof plus focused tests are enough for this provider-runtime bugfix.

Maintainer options:

  1. Decide the mitigation before merge
    Land the focused finalizer fix after maintainer review and CI if the loopback proof is accepted, or request one reporter-provider smoke if maintainers want exact vLLM/Qwen confirmation before merge.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • No automated repair is needed; the branch has no blocking findings and the remaining action is normal maintainer review, CI, and proof acceptance.

Security
Cleared: The diff changes TypeScript stream finalization and tests only; it does not touch dependencies, workflows, secrets, install scripts, or other supply-chain surfaces.

Review details

Best possible solution:

Land the focused finalizer fix after maintainer review and CI if the loopback proof is accepted, or request one reporter-provider smoke if maintainers want exact vLLM/Qwen confirmation before merge.

Do we have a high-confidence way to reproduce the issue?

Yes, source-level reproduction is high-confidence: on current main a stream with tool-call blocks plus finish_reason: stop keeps stopReason: stop and then strips the tool calls before terminal classification. I did not run tests because this review was constrained to read-only commands.

Is this the best way to solve the issue?

Yes, this is the right layer: finalization already owns the symmetric toolUse/tool-call consistency guard, and the branch updates both affected OpenAI-compatible paths while preserving visible-text, length, and missing-finish behavior.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against bb673f47b207.

Label changes

Label changes:

  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

  • P2: This is a normal-priority provider/runtime bugfix for intermittent silent agent turns with a limited OpenAI-compatible streaming surface.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body supplies after-fix terminal output from a local OpenClaw runtime path against a loopback OpenAI-compatible SSE endpoint, which directly exercises the changed stream assembler behavior.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body supplies after-fix terminal output from a local OpenClaw runtime path against a loopback OpenAI-compatible SSE endpoint, which directly exercises the changed stream assembler behavior.
Evidence reviewed

PR surface:

Source +16, Tests +143. Total +159 across 4 files.

View PR surface stats
Area Files Added Removed Net
Source 2 16 0 +16
Tests 2 145 2 +143
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 4 161 2 +159

What I checked:

  • Repository policy applied: Root AGENTS.md and the scoped agents guide were read; their review guidance applies because the PR touches agents/provider runtime stream assembly. (AGENTS.md:1, bb673f47b207)
  • Current transport finalizer still drops the reported shape: Current main maps finish_reason: stop to stop, then filters accumulated tool-call blocks unless the final stop reason is toolUse, which matches the linked bug's silent terminal-turn path. (src/agents/openai-transport-stream.ts:2693, bb673f47b207)
  • Current legacy provider finalizer has the same gap: The legacy openai-completions provider records a finish reason, but current main still removes tool calls whenever the final stop reason is not toolUse. (src/llm/providers/openai-completions.ts:314, bb673f47b207)
  • PR transport fix targets the finalization boundary: The branch records whether a real stop finish reason was seen, then promotes silent accumulated tool calls to toolUse only when there is no visible assistant text. (src/agents/openai-transport-stream.ts:2817, 9dcc70332ede)
  • PR legacy provider fix mirrors the invariant: The branch applies the same silent-tool-call promotion in streamOpenAICompletions, after finish and error checks and before the existing tool-call filter. (src/llm/providers/openai-completions.ts:439, 9dcc70332ede)
  • Focused regression tests cover both paths: The branch adds regression coverage for silent stop-finished tool calls, visible-text stop responses, and missing-finish transport behavior in the transport assembler and legacy provider tests. (src/agents/openai-transport-stream.test.ts:6987, 9dcc70332ede)

Likely related people:

  • shakkernerd: Available blame/history for the stream finalizers and downstream terminal classifier points to commit 23bf48e by Shakker, whose commit metadata maps to this GitHub login. (role: current implementation history owner; confidence: medium; commits: 23bf48e69eb7; files: src/agents/openai-transport-stream.ts, src/llm/providers/openai-completions.ts, src/agents/embedded-agent-runner/run/attempt-trajectory-status.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. labels May 31, 2026
@MonkeyLeeT MonkeyLeeT changed the title fix(openai): keep stop-finished tool calls fix(agents): preserve stop-finished OpenAI tool calls May 31, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 31, 2026
@MonkeyLeeT

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented May 31, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 31, 2026
@MonkeyLeeT MonkeyLeeT marked this pull request as ready for review May 31, 2026 22:39
@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 31, 2026
@steipete steipete self-assigned this May 31, 2026

@steipete steipete left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One blocking issue from CI/typecheck. The implementation shape looks right, but this branch cannot merge until the new visible-text check narrows the block shape far enough for tsgo.

Finding:

  • src/agents/openai-transport-stream.ts:2812 uses block.text.trim() after only checking block.type === "text". In this exported transport surface, output.content is wide enough that block.text is still unknown, and CI fails in check-prod-types, check-test-types, plugin SDK boundary prep, and dependent lint/boundary lanes with TS18046: 'block.text' is of type 'unknown'. Please narrow with a string check or a small local predicate before calling trim(), for example by requiring typeof block.text === "string" in the hasVisibleText predicate.

No behavior concern with the stop/tool-call promotion itself after reading the finalizer and adjacent tests; this is a compile blocker only.

@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 31, 2026

@steipete steipete left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved after the type-narrow fix on src/agents/openai-transport-stream.ts.

Verification I ran on the fixed tree:

  • pnpm tsgo:prod
  • node scripts/run-vitest.mjs src/llm/providers/openai-completions.test.ts src/agents/openai-transport-stream.test.ts
  • loopback OpenAI-compatible SSE proof against the real createOpenAICompletionsTransportStreamFn: silent finish_reason: stop tool call returned toolUse; visible text plus spurious tool call stayed stop; missing finish reason dropped the unfinished tool call
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main returned clean with no accepted/actionable findings

The remaining red CI jobs are outside this PR's changed files and match current-main/stale-base fallout, not this OpenAI-compatible stream change.

@steipete steipete merged commit 6316648 into openclaw:main May 31, 2026
159 of 162 checks passed
@MonkeyLeeT MonkeyLeeT deleted the codex/fix-openai-stop-tool-calls-88791 branch June 1, 2026 02:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. size: S status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: structured tool_calls with finish_reason stop are dropped as non_deliverable_terminal_turn

2 participants