Skip to content

[Feature]: Pass think: false to Ollama for non-reasoning models #6152

@HazMatt69

Description

@HazMatt69

Problem or Use Case

Summary

When using Hermes Agent with a local Ollama backend and a thinking-capable model (e.g. qwen3:8b), Hermes never passes think: false in the chat completions request. This causes the model to run its full reasoning chain on every request, which on CPU inference can take several minutes before producing the first output token — making the agent loop effectively unusable.

Environment

  • Hermes Agent (latest)
  • Ollama 0.20.0
  • Model: qwen3:8b Q4_K_M via custom OpenAI-compatible endpoint (http://host.docker.internal:11434/v1)
  • CPU inference (no GPU)

Steps to reproduce

  1. Configure Hermes with a local Ollama endpoint
  2. Use any Qwen3 model (or other thinking-capable model)
  3. Send any message — observe multi-minute delay before first token
  4. Check Ollama logs — thinking tokens are being generated silently before any response content

Root cause

Ollama 0.6+ supports a think parameter in the /api/chat and /v1/chat/completions endpoints. When think: false is passed, the model skips the reasoning phase entirely and responds immediately. Hermes never passes this parameter, so thinking-capable models always run in thinking mode regardless of the user's reasoning_effort config.

The affected code is _build_api_kwargs() in run_agent.py around line 5394, where the chat completions payload is assembled.

Workaround

Manually patching run_agent.py to add "think": False to the api_kwargs dict fixes the issue and brings response time from several minutes down to ~1 second on the same hardware.

Proposed Solution

Add an opt-in config option (e.g. provider.think: false) or auto-detect when the endpoint is an Ollama instance and pass think: false when reasoning_effort is not explicitly enabled. At minimum, exposing this as an environment variable (HERMES_OLLAMA_THINK=false) would be a low-risk fix.

Happy to submit a PR if the maintainers can advise on the preferred approach.

Alternatives Considered

No response

Feature Type

Configuration option

Scope

None

Contribution

  • I'd like to implement this myself and submit a PR

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions