Skip to content

Clarify support for Qwen/vLLM reasoning-parser output in custom OpenAI-compatible providers #38360

@yokoyamann

Description

@yokoyamann

Summary

I am investigating Hermes Agent with a local OpenAI-compatible vLLM endpoint serving Qwen reasoning models.

When vLLM is launched with --reasoning-parser qwen3, Qwen reasoning output may be separated from normal assistant content into reasoning-specific fields such as reasoning, reasoning_content, or streaming delta.reasoning.

Could Hermes clarify whether custom OpenAI-compatible providers are expected to support this response shape, ignore it, or require visible assistant output to be present in content?

Current status

This is not currently reproduced in my stable runtime.

My current working configuration disables --reasoning-parser qwen3, so the stable runtime does not exercise the separated reasoning-output path.

This issue is based on an integration investigation and is primarily a support-boundary / documentation clarification question, not a confirmed Hermes bug report.

Environment

  • Hermes Agent: v0.14.0
  • Backend: local OpenAI-compatible vLLM endpoint
  • Model family: Qwen reasoning model
  • vLLM configuration under investigation: --reasoning-parser qwen3
  • Current stable workaround: reasoning-parser disabled
  • Additional workaround: pass chat_template_kwargs.enable_thinking=false where supported so the model does not emit thinking/reasoning content

Observed / suspected behavior

With Qwen/vLLM reasoning-parser enabled, visible assistant content may be empty while reasoning output is emitted separately.

If a custom OpenAI-compatible provider primarily reads content, the assistant response can appear empty or invalid even though the backend generated reasoning output.

Workaround

The current stable local configuration avoids this path by:

  1. Disabling --reasoning-parser qwen3
  2. Passing chat_template_kwargs.enable_thinking=false to vLLM / Qwen where supported

With thinking disabled, normal assistant output is stable in the current runtime.

Request

Could Hermes clarify the expected behavior for custom OpenAI-compatible providers when the backend returns reasoning-specific fields?

Specifically:

  • Are reasoning, reasoning_content, or streaming delta.reasoning expected to be supported?
  • Should Hermes ignore those fields and require visible output in content?
  • If unsupported, could this be documented for Qwen/vLLM reasoning-parser users?
  • If supported, should the custom provider normalize reasoning-separated output before deciding that the response is empty?

Notes

I am intentionally not filing this as a confirmed Hermes bug, because my current stable runtime disables the reasoning-parser path.

If this response shape is unsupported by design, documenting that limitation would already be helpful.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Low — cosmetic, nice to haveprovider/qwenQwen / Alibaba Cloud (OAuth)questionFurther information is requestedtype/docsDocumentation improvements

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions