fix(agents): recover reasoning-only vLLM OpenAI-compat replies #31746

Closed

liuxiaopai-ai wants to merge 1 commit into openclaw:main from liuxiaopai-ai:codex/vllm-empty-response-31598

Conversation

@liuxiaopai-ai

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: some non-OpenAI openai-completions providers (reported with vLLM + DeepSeek) can return assistant turns where user-visible content is empty while reasoning is present, resulting in empty OpenClaw replies.
  • Why it matters: users see blank assistant output even though inference succeeded.
  • What changed: added a guarded fallback in extractAssistantText to recover text from reasoning blocks only for non-OpenAI providers on openai-completions, and only when the turn is not a tool-call/error turn.
  • What did NOT change (scope boundary): OpenAI/OpenAI Codex provider behavior is unchanged; tool-call turns still do not surface reasoning as normal reply text.
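The guarded fallback described above can be sketched as follows. This is a simplified illustration, not the actual OpenClaw code: the message shape, the placement of `api` on the message, and the helper bodies are assumptions; only the gating conditions (non-OpenAI provider, openai-completions API, no tool calls, non-error turn) come from the PR description.

```typescript
// Simplified sketch of the fallback gating; types and helper bodies are
// illustrative assumptions, not the real OpenClaw definitions.
type AssistantMsg = {
  provider?: unknown;
  api?: string;
  stopReason?: string;
  blocks: { type: "text" | "thinking" | "tool_call"; text?: string }[];
};

function shouldFallbackToThinkingText(msg: AssistantMsg): boolean {
  // Case-insensitive, non-string-safe provider check.
  const provider =
    typeof msg.provider === "string" ? msg.provider.toLowerCase() : "";
  if (provider === "openai" || provider === "openai-codex") return false; // OpenAI behavior unchanged
  if (msg.api !== "openai-completions") return false; // only OpenAI-compat providers
  if (msg.stopReason === "error") return false; // never recover text on error turns
  if (msg.blocks.some((b) => b.type === "tool_call")) return false; // no reasoning leakage on tool calls
  return true;
}

function extractAssistantText(msg: AssistantMsg): string {
  const extracted = msg.blocks
    .filter((b) => b.type === "text")
    .map((b) => b.text ?? "")
    .join("")
    .trim();
  if (extracted) return extracted;
  if (!shouldFallbackToThinkingText(msg)) return "";
  // Last resort: surface the reasoning blocks as the user-visible reply.
  return msg.blocks
    .filter((b) => b.type === "thinking")
    .map((b) => b.text ?? "")
    .join("\n")
    .trim();
}
```

With this gating, a reasoning-only turn from a provider such as vLLM yields the recovered reasoning text, while the same turn from the openai provider, or any turn containing a tool call, still yields an empty string.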

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • For non-OpenAI providers configured with api: openai-completions (for example vLLM), reasoning-only assistant responses no longer get dropped as empty text in the normal reply path.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS (local test run)
  • Runtime/container: Node 22 + pnpm
  • Model/provider: unit-test coverage for openai-completions + provider gating
  • Integration/channel (if any): embedded agent subscribe/extract path
  • Relevant config (redacted): models.providers.vllm.api = "openai-completions"

Steps

  1. Create an assistant message where user-visible text is empty and only thinking/reasoning content exists.
  2. Run extraction through extractAssistantText.
  3. Compare output for provider=vllm vs provider=openai and for tool-call turns.

Expected

  • Non-OpenAI OpenAI-compat providers recover text instead of returning empty.
  • OpenAI provider behavior remains unchanged.
  • Tool-call turns do not leak reasoning as reply text.

Actual

  • Matches expected with new tests.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • pnpm test src/agents/pi-embedded-utils.test.ts
    • pnpm test src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.subscribeembeddedpisession.test.ts
    • pnpm check
  • Edge cases checked:
    • openai provider does not use fallback
    • tool-call turn does not use fallback
    • think-tag-only content recovers after promotion to thinking blocks
  • What you did not verify:
    • live vLLM endpoint end-to-end in this branch run

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert the commit "fix(agents): recover reasoning-only openai-compat replies".
  • Files/config to restore: src/agents/pi-embedded-utils.ts.
  • Known bad symptoms reviewers should watch for: unexpected reasoning text shown for providers that previously returned empty replies.

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk: non-OpenAI OpenAI-compat providers could surface reasoning text in edge cases where a provider intentionally separates private reasoning.
    • Mitigation: fallback is strictly gated (non-openai provider, openai-completions API, no tool calls, non-error turn) and covered by regression tests.


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c2b73ac42


Comment on lines +249 to +253
const fallbackThinkingText =
!extracted && shouldFallbackToThinkingText(msg)
? stripThinkingTagsFromText(extractAssistantThinking(msg)).trim()
: "";
const textForUser = extracted || fallbackThinkingText;


P2 Badge Skip thinking fallback when reasoning output is enabled

The new fallback returns extractAssistantThinking(msg) as user text whenever text blocks are empty, but in the message-end flow we also independently emit reasoning from the same thinking blocks (extractAssistantThinking → formatReasoningMessage in src/agents/pi-embedded-subscribe.handlers.messages.ts). For non-OpenAI openai-completions sessions with reasoningMode: "on" and block replies enabled, a thinking-only assistant turn now produces duplicate output (once as normal answer text and once as a reasoning message), which is a regression from the previous behavior, where only the reasoning message was emitted.
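One way to address the duplication the review describes would be to additionally gate the fallback on the session's reasoning setting. The sketch below is only an illustration of that idea: the `reasoningMode` name is taken from the comment above, and the `SessionOpts` shape and function name are assumptions, not the real OpenClaw API.

```typescript
// Hypothetical extra guard: skip the thinking-text fallback when reasoning
// is already emitted as its own message, so a thinking-only turn is not
// shown twice. Names and shapes here are illustrative assumptions.
type SessionOpts = { reasoningMode?: "on" | "off" };

function shouldUseThinkingFallback(
  extractedText: string,
  session: SessionOpts,
): boolean {
  if (extractedText) return false; // normal text exists; no fallback needed
  if (session.reasoningMode === "on") return false; // reasoning already surfaced separately
  return true;
}
```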


@greptile-apps
Contributor

greptile-apps bot commented Mar 2, 2026

Greptile Summary

Fixes empty assistant responses for non-OpenAI openai-completions providers (vLLM, DeepSeek) when inference succeeds but returns reasoning-only content with empty text blocks.

Key changes:

  • Added shouldFallbackToThinkingText() guard that only activates for non-OpenAI providers using openai-completions API
  • Modified extractAssistantText() to recover text from reasoning blocks when normal extraction returns empty
  • Excludes tool-call turns and error turns from fallback to prevent reasoning leakage
  • OpenAI and OpenAI Codex provider behavior remains unchanged

Test coverage:

  • Verified vLLM provider recovers reasoning text when content is empty
  • Confirmed OpenAI provider still returns empty (no behavioral change)
  • Ensured tool-call turns don't leak reasoning
  • Validated think-tag promotion recovery for compat providers

The implementation is well-gated with defensive checks for edge cases (case-insensitive provider comparison, non-string provider values, multiple thinking blocks). Minor caveat: no live vLLM endpoint verification, relying on unit test coverage.

Confidence Score: 4/5

  • Safe to merge with minimal risk - well-scoped bug fix with comprehensive test coverage
  • Score reflects thorough implementation with careful gating (API check, provider exclusions, stop reason checks, tool call exclusions), comprehensive test coverage of edge cases, and minimal scope limited to non-OpenAI providers. Not 5/5 due to the lack of live vLLM endpoint verification, but unit tests provide strong confidence in correctness.
  • No files require special attention - all changes are well-tested and focused

Last reviewed commit: 0c2b73a

@steipete
Contributor

steipete commented Mar 2, 2026

Thanks for the PR! Multiple PRs address issue #31598. Keeping #31703 as the earliest submission. Closing to reduce noise. This is an AI-assisted triage review. If we got this wrong, feel free to reopen — happy to revisit.

@steipete steipete closed this Mar 2, 2026

Labels

agents (Agent runtime and tooling), size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: local VLLM model returns empty response (no generated text)

2 participants