[Bug]: Custom OpenAI-compatible providers: temperature and parallel_tool_calls request fields not propagated

### Bug Description

When Hermes Agent talks to a custom OpenAI-compatible inference server (e.g. local llama.cpp / llama-server, vLLM, etc.) configured via `custom_providers`, two important request fields silently drop on the floor in the chat_completions transport:

1. `temperature` — never set, so the backend server's default takes over. With llama.cpp this means temperature=1.0, which produces factual drift on grounded tasks. Example from a real run: the model invented "3.3B active params" (correct: 3B) and described a 20GB GGUF as "sub-4GB" (conflating active-param footprint with file size).
2. `parallel_tool_calls` — never set, so even when the underlying model and chat template support parallel tool calls, every multi-tool query serializes into N sequential assistant turns. A 4-file read fires as 4 separate turns instead of 1, ~4× the latency.

Both fields ARE handled correctly for cloud providers — the Codex/Responses transport hardcodes `parallel_tool_calls: True`, and specific model families (GPT-5, Codex, etc.) get `fixed_temperature` via `_fixed_temperature_for_model()`. The gap is specifically in the chat_completions transport's handling of the custom-provider path.

### Steps to Reproduce

1. Start a local OpenAI-compatible server. Example with llama-server:
       llama-server --model qwen3.6-35b-a3b.gguf --jinja --host 127.0.0.1 --port 8080
   (default temperature is 1.0; verify via `curl http://127.0.0.1:8080/props`)

2. Configure Hermes Agent with a custom provider in ~/.hermes/config.yaml:
       custom_providers:
         - name: Local
           base_url: http://localhost:8080/v1
           model: <your-model-alias>

3. Switch to that provider and run a multi-step prompt that should trigger parallel tool use, e.g.:
   "Read these 4 files (a.py, b.py, c.py, d.py), then summarize each one."

4. Observe two issues:
   - Output contains factual drift on grounded claims (caused by temp=1.0)
   - The four read_file calls fire one at a time (preparing → read → preparing → read → ...) rather than as one batched assistant message with 4 tool_calls


### Expected Behavior

- A sensible temperature default (e.g. 0.2-0.3) is set for agent workloads on custom providers, OR `temperature` is exposed as a per-provider config field so users can set it without restarting the inference server.
- `parallel_tool_calls: true` is sent by default on outbound /v1/chat/completions requests when tools are present, matching the Codex transport (transports/codex.py:98) and the OpenAI API spec.

### Actual Behavior

- No `temperature` field is sent on outbound requests. llama.cpp falls back to its default of 1.0 (verified via /props: default_generation_settings.temperature = 1.0). Model produces factually drifted output.
- No `parallel_tool_calls` field is sent. Even when the chat template advertises support (chat_template_caps.supports_parallel_tool_calls: true per /props), the model serializes tool calls into separate turns. A direct probe against the same llama-server shows that adding parallel_tool_calls:true to the request body produces 3 tool_calls in one assistant turn; without it, only 1. Behavior is fully reproducible.


### Affected Component

Agent Core (conversation loop, context compression, memory), Configuration (config.yaml, .env, hermes setup)

### Messaging Platform (if gateway-related)

N/A (CLI only)

### Debug Report

```shell
Report       https://paste.rs/ttMcl
agent.log    https://paste.rs/B1DTq
gateway.log  https://paste.rs/GLzg9
```

### Operating System

Ubuntu 24.04.4 LTS

### Python Version

Python 3.11.15

### Hermes Version

0.12.0 (2026.4.30)

### Additional Logs / Traceback (optional)

```shell

```

### Root Cause Analysis (optional)

**Issue 1 — temperature:**

`agent/transports/chat_completions.py:245-251` only adds `temperature` to the request when `fixed_temperature` is provided:

    # Temperature
    fixed_temp = params.get("fixed_temperature")
    omit_temp = params.get("omit_temperature", False)
    if omit_temp:
        api_kwargs.pop("temperature", None)
    elif fixed_temp is not None:
        api_kwargs["temperature"] = fixed_temp

`fixed_temperature` is populated by `_fixed_temperature_for_model()`, which only returns a value for specific cloud model families (GPT-5, Codex, etc.). Custom OpenAI-compatible providers never hit that branch, so `temperature` is never added to api_kwargs and the server's default is used.

**Issue 2 — parallel_tool_calls:**

A repo-wide grep for `parallel_tool_calls` returns only two hits:
- `agent/transports/codex.py:98` — hardcoded `parallel_tool_calls: True` for the Codex/Responses transport
- `agent/codex_responses_adapter.py:677,708-709` — passthrough handling for the Codex Responses adapter

The chat_completions transport (the one custom providers use) never sets the field. With OpenAI's spec defaulting to `true` but llama.cpp (and many other backends) requiring it explicitly, the omission produces serial-only tool calling on local stacks.


### Proposed Fix (optional)

Two minimal changes in agent/transports/chat_completions.py:

1. Default temperature for custom providers. Around line 251, when `fixed_temperature` is None AND `is_custom_provider` is True, set a sane default:
       elif params.get("is_custom_provider"):
           api_kwargs["temperature"] = params.get("temperature", 0.2)

2. Default parallel_tool_calls when tools are present. Around line 265, after `api_kwargs["tools"] = tools`, add:
       api_kwargs.setdefault("parallel_tool_calls", True)

A more configurable variant: add optional `temperature` and `parallel_tool_calls` fields to the custom_providers schema:
       custom_providers:
         - name: Local
           base_url: http://localhost:8080/v1
           model: <model>
           temperature: 0.2           # new
           parallel_tool_calls: true  # new

Happy to send a PR if the maintainers prefer one approach over the other.


### Are you willing to submit a PR for this?

- [x] I'd like to fix this myself and submit a PR

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug]: Custom OpenAI-compatible providers: temperature and parallel_tool_calls request fields not propagated #18470

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Debug Report

Operating System

Python Version

Hermes Version

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Bug]: Custom OpenAI-compatible providers: temperature and parallel_tool_calls request fields not propagated #18470

Description

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Debug Report

Operating System

Python Version

Hermes Version

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions