fix(chat_completions): propagate temperature and parallel_tool_calls for custom providers#18483

Open
liuhao1024 wants to merge 1 commit into
NousResearch:mainfrom
liuhao1024:fix/issue-18470-custom-provider-temp-parallel

Conversation

@liuhao1024 liuhao1024 commented May 1, 2026

Contributor

What does this PR do?

  • Add a default temperature (0.2) for custom OpenAI-compatible providers, so backends such as llama.cpp do not fall back to their own default (often 1.0), which causes factual drift
  • Enable parallel_tool_calls by default for custom providers, so a multi-tool query can complete in a single turn instead of being serialized into N sequential turns

Root Cause

Custom OpenAI-compatible providers (configured via custom_providers) use the chat_completions transport. Two important request fields were not being propagated:

  1. temperature: The transport only sets temperature when fixed_temperature is provided (it comes from _fixed_temperature_for_model() and is set only for specific cloud models). For custom providers, fixed_temperature is None, so no temperature field is sent. Backends like llama.cpp then fall back to their own default (often 1.0), which causes factual drift on grounded tasks.

  2. parallel_tool_calls: The transport never sets this field in chat_completions mode (only codex.py hardcodes it). Many local backends (llama.cpp, vLLM) need the field set explicitly before they batch tool calls; without it, a multi-tool query is serialized into separate turns, at roughly 4× the latency.
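The intended behavior can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: build_request_kwargs, is_custom_provider, and DEFAULT_CUSTOM_TEMPERATURE are hypothetical names, and sending parallel_tool_calls only when tools are present is an assumption made here for safety with strict backends.

```python
from typing import Any, Optional

# Hypothetical constant; 0.2 is the default value named in this PR.
DEFAULT_CUSTOM_TEMPERATURE = 0.2


def build_request_kwargs(
    model: str,
    messages: list,
    *,
    fixed_temperature: Optional[float] = None,
    is_custom_provider: bool = False,
    tools: Optional[list] = None,
) -> dict:
    """Assemble chat-completions request fields (illustrative sketch only)."""
    kwargs: dict[str, Any] = {"model": model, "messages": messages}

    # A model-specific fixed temperature always takes precedence.
    if fixed_temperature is not None:
        kwargs["temperature"] = fixed_temperature
    # Otherwise custom providers get an explicit default, so the backend
    # does not silently apply its own sampling default.
    elif is_custom_provider:
        kwargs["temperature"] = DEFAULT_CUSTOM_TEMPERATURE

    if tools:
        kwargs["tools"] = tools
        # Assumption: the flag is only sent alongside tools.
        if is_custom_provider:
            kwargs["parallel_tool_calls"] = True

    return kwargs
```

With this shape, a cloud model that has a fixed temperature is unaffected, while a custom provider always receives an explicit temperature and, when tools are offered, an explicit parallel_tool_calls flag.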

Related Issue

Fixes #18470

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

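  • Set a default temperature (0.2) for custom OpenAI-compatible providers
  • Enable parallel_tool_calls by default for custom providers
  • Add regression tests for both behaviors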

How to Test

  1. Run pytest tests/ -q — all tests should pass
  2. Send a request through a custom provider and confirm that the outgoing payload includes temperature (0.2) and parallel_tool_calls
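For the manual verification step, one option is to capture the outgoing request payload (for example from a debug log or a stubbed client) and check it with a small helper. The sketch below is an assumption about how such a check could look; check_custom_provider_payload is a hypothetical name and is not part of this PR's test suite.

```python
def check_custom_provider_payload(payload: dict) -> list:
    """Return a list of problems found in a recorded request payload."""
    problems = []

    # Without an explicit temperature the backend applies its own default.
    if "temperature" not in payload:
        problems.append("temperature missing; backend default will apply")

    # Requests that offer tools should opt in to parallel tool calls.
    if payload.get("tools") and payload.get("parallel_tool_calls") is not True:
        problems.append("parallel_tool_calls not enabled for a tool-bearing request")

    return problems


# Example payloads: the first mimics the behavior described under Root Cause,
# the second mimics the intended behavior after the fix.
before = {"model": "local-model", "messages": [], "tools": [{"type": "function"}]}
after = {**before, "temperature": 0.2, "parallel_tool_calls": True}
```

An empty list from check_custom_provider_payload(after) and two reported problems from check_custom_provider_payload(before) would indicate the scenario is resolved.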

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 26.4.1

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture and workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A

…for custom providers

- Set default temperature (0.2) for custom OpenAI-compatible providers to avoid llama.cpp's default of 1.0
- Enable parallel_tool_calls by default for custom providers to allow multi-tool queries in single turn
- Add regression tests for both behaviors

Fixes NousResearch#18470

Labels

  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • P2: Medium, degraded but workaround exists
  • type/bug: Something isn't working


2 participants