Summary
I am investigating Hermes Agent with a local OpenAI-compatible vLLM endpoint serving Qwen reasoning models.
When vLLM is launched with --reasoning-parser qwen3, Qwen reasoning output may be separated from normal assistant content into reasoning-specific fields such as reasoning, reasoning_content, or streaming delta.reasoning.
Could Hermes clarify whether custom OpenAI-compatible providers are expected to support this response shape, ignore it, or require visible assistant output to be present in content?
Current status
This is not currently reproduced in my stable runtime.
My current working configuration disables --reasoning-parser qwen3, so the stable runtime does not exercise the separated reasoning-output path.
This issue is based on an integration investigation and is primarily a support-boundary / documentation clarification question, not a confirmed Hermes bug report.
Environment
- Hermes Agent: v0.14.0
- Backend: local OpenAI-compatible vLLM endpoint
- Model family: Qwen reasoning model
- vLLM configuration under investigation:
--reasoning-parser qwen3
- Current stable workaround: reasoning-parser disabled
- Additional workaround: pass
chat_template_kwargs.enable_thinking=false where supported so the model does not emit thinking/reasoning content
Observed / suspected behavior
With Qwen/vLLM reasoning-parser enabled, visible assistant content may be empty while reasoning output is emitted separately.
If a custom OpenAI-compatible provider primarily reads content, the assistant response can appear empty or invalid even though the backend generated reasoning output.
Workaround
The current stable local configuration avoids this path by:
- Disabling
--reasoning-parser qwen3
- Passing
chat_template_kwargs.enable_thinking=false to vLLM / Qwen where supported
With thinking disabled, normal assistant output is stable in the current runtime.
Request
Could Hermes clarify the expected behavior for custom OpenAI-compatible providers when the backend returns reasoning-specific fields?
Specifically:
- Are
reasoning, reasoning_content, or streaming delta.reasoning expected to be supported?
- Should Hermes ignore those fields and require visible output in
content?
- If unsupported, could this be documented for Qwen/vLLM reasoning-parser users?
- If supported, should the custom provider normalize reasoning-separated output before deciding that the response is empty?
Notes
I am intentionally not filing this as a confirmed Hermes bug, because my current stable runtime disables the reasoning-parser path.
If this response shape is unsupported by design, documenting that limitation would already be helpful.
Summary
I am investigating Hermes Agent with a local OpenAI-compatible vLLM endpoint serving Qwen reasoning models.
When vLLM is launched with
--reasoning-parser qwen3, Qwen reasoning output may be separated from normal assistantcontentinto reasoning-specific fields such asreasoning,reasoning_content, or streamingdelta.reasoning.Could Hermes clarify whether custom OpenAI-compatible providers are expected to support this response shape, ignore it, or require visible assistant output to be present in
content?Current status
This is not currently reproduced in my stable runtime.
My current working configuration disables
--reasoning-parser qwen3, so the stable runtime does not exercise the separated reasoning-output path.This issue is based on an integration investigation and is primarily a support-boundary / documentation clarification question, not a confirmed Hermes bug report.
Environment
--reasoning-parser qwen3chat_template_kwargs.enable_thinking=falsewhere supported so the model does not emit thinking/reasoning contentObserved / suspected behavior
With Qwen/vLLM reasoning-parser enabled, visible assistant
contentmay be empty while reasoning output is emitted separately.If a custom OpenAI-compatible provider primarily reads
content, the assistant response can appear empty or invalid even though the backend generated reasoning output.Workaround
The current stable local configuration avoids this path by:
--reasoning-parser qwen3chat_template_kwargs.enable_thinking=falseto vLLM / Qwen where supportedWith thinking disabled, normal assistant output is stable in the current runtime.
Request
Could Hermes clarify the expected behavior for custom OpenAI-compatible providers when the backend returns reasoning-specific fields?
Specifically:
reasoning,reasoning_content, or streamingdelta.reasoningexpected to be supported?content?Notes
I am intentionally not filing this as a confirmed Hermes bug, because my current stable runtime disables the reasoning-parser path.
If this response shape is unsupported by design, documenting that limitation would already be helpful.