
fix(transport): apply default temperature and parallel_tool_calls for custom OpenAI-compat endpoints (#18470) #18492

Open
XwanwanX wants to merge 1 commit into NousResearch:main from XwanwanX:fix/custom-openai-chat-completions-defaults

@XwanwanX XwanwanX commented May 1, 2026


Summary

Hermes routed provider=custom through chat_completions without sending temperature or parallel_tool_calls. Many local OpenAI-compatible stacks (e.g. llama.cpp / vLLM) then fall back to their own server defaults (often temperature=1.0) and batch tool rounds only when parallel_tool_calls is set explicitly.

Fixes #18470.

Changes

  • ChatCompletionsTransport.build_kwargs
    When is_custom_provider is true:

    • If no fixed/overridden temperature applies, send temperature: 0.2 unless the user opts out (omit_temperature in compat options).
    • When tools are present, send parallel_tool_calls: true by default unless overridden.
  • custom_providers (normalized in hermes_cli/config.py, surfaced via resolve_runtime_provider):

    • Optional: temperature (number), parallel_tool_calls (bool), omit_temperature: true.
    • Passed through CLI, gateway runtime dict, and AIAgent(custom_openai_request_options=...).
  • Delegate: subagents inherit parent compat options when effective_provider == "custom".
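The resolution order described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `apply_custom_defaults` and the shape of `compat_options` are assumptions, and only the `0.2` default, the `omit_temperature` opt-out, and the "tools present" condition come from the description.

```python
from typing import Any, Dict, Optional

DEFAULT_CUSTOM_TEMPERATURE = 0.2  # default value named in the PR description


def apply_custom_defaults(
    kwargs: Dict[str, Any],
    *,
    is_custom_provider: bool,
    compat_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of kwargs with custom-provider defaults."""
    out = dict(kwargs)
    if not is_custom_provider:
        return out  # other providers are left untouched
    opts = compat_options or {}

    # Temperature: keep a value already fixed upstream, honor omit_temperature,
    # then prefer the configured value over the 0.2 default.
    if "temperature" not in out and not opts.get("omit_temperature", False):
        configured = opts.get("temperature")
        out["temperature"] = (
            DEFAULT_CUSTOM_TEMPERATURE if configured is None else configured
        )

    # parallel_tool_calls: only added when the request carries tools and
    # nothing upstream has set it already.
    if out.get("tools") and "parallel_tool_calls" not in out:
        configured = opts.get("parallel_tool_calls")
        out["parallel_tool_calls"] = True if configured is None else bool(configured)

    return out
```

Under these assumptions, a request with tools and no options would gain `temperature: 0.2` and `parallel_tool_calls: true`, while `omit_temperature: true` leaves `temperature` out so the server default applies.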

Config example

custom_providers:
  - name: Local
    base_url: http://127.0.0.1:8080/v1
    model: your-model-alias
    temperature: 0.25
    parallel_tool_calls: true
    # omit_temperature: true   # uncomment to omit `temperature` and use server defaults
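As a rough illustration of how an entry like the one above might be reduced to the three optional compat fields, here is a sketch that takes the already-parsed entry as a dict. The name `normalize_compat_options` and the validation rules are assumptions for illustration; the actual normalization in hermes_cli/config.py may differ.

```python
from typing import Any, Dict


def normalize_compat_options(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep only well-typed compat options from one entry."""
    opts: Dict[str, Any] = {}

    temp = entry.get("temperature")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(temp, (int, float)) and not isinstance(temp, bool):
        opts["temperature"] = float(temp)

    ptc = entry.get("parallel_tool_calls")
    if isinstance(ptc, bool):
        opts["parallel_tool_calls"] = ptc

    # Only an explicit `true` opts out of sending temperature.
    if entry.get("omit_temperature") is True:
        opts["omit_temperature"] = True

    return opts


# The entry from the config example, as it would look after YAML parsing.
example_entry = {
    "name": "Local",
    "base_url": "http://127.0.0.1:8080/v1",
    "model": "your-model-alias",
    "temperature": 0.25,
    "parallel_tool_calls": True,
}
compat = normalize_compat_options(example_entry)
```

Dropping ill-typed values silently is one possible design choice here; rejecting them with a config error would be an equally reasonable alternative.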

…ndpoints

- Default chat_completions sampling temperature for provider=custom stacks (Issue NousResearch#18470)

- Send parallel_tool_calls when tools are present; optional YAML overrides via custom_providers

- Thread options through CLI, gateway runtime, and delegate subagents
@alt-glitch added labels on May 1, 2026: type/bug, P2, comp/agent, comp/cli, comp/gateway, tool/delegate
@alt-glitch

Collaborator

Likely duplicate of #18483 — same root cause: chat_completions transport omits temperature and parallel_tool_calls for custom providers. This PR has broader scope (config/gateway/delegate) vs #18483's narrower transport-only fix. See also #18489.

@alt-glitch

Collaborator

Likely duplicate of #18483

Development

Successfully merging this pull request may close these issues.

[Bug]: Custom OpenAI-compatible providers: temperature and parallel_tool_calls request fields not propagated