fix(agent): support Azure OpenAI gpt-5.x on chat/completions endpoint by akhater · Pull Request #10086 · NousResearch/hermes-agent

akhater · 2026-04-15T05:19:02Z

Summary

Azure OpenAI exposes an OpenAI-compatible endpoint at {resource}.openai.azure.com/openai/v1 that works with the standard openai Python client. Two issues prevented gpt-5.x models from working on this endpoint:

1. max_tokens rejected: _max_tokens_param() only sent max_completion_tokens for api.openai.com URLs. Azure gpt-5.x also requires max_completion_tokens.
2. Wrong API path: _model_requires_responses_api() correctly detected gpt-5.x and upgraded to codex_responses mode. But Azure does NOT support the Responses API; it serves gpt-5.x on /chat/completions, so the upgraded request returned a 404.

Changes

Adds _is_azure_openai_url() (matches openai.azure.com) and uses it in two places:

- _max_tokens_param(): returns max_completion_tokens for Azure (same as direct OpenAI)
- codex_responses upgrade gate: skips Azure so gpt-5.x stays on chat_completions where Azure actually serves it

gpt-4.x models on Azure are unaffected (already worked via chat_completions + max_tokens).
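The two call sites can be illustrated with a minimal sketch. The function names below drop the leading underscore and take the base URL and model as plain arguments, which is an assumption made for illustration; the real signatures in the repository may differ, and the host-suffix check here is stricter than a plain substring match on `openai.azure.com`.

```python
from urllib.parse import urlparse


def _host(base_url: str) -> str:
    # Accept bare hosts such as "myres.openai.azure.com/openai/v1" by adding a scheme.
    candidate = base_url if "://" in base_url else f"https://{base_url}"
    return (urlparse(candidate).hostname or "").lower()


def is_azure_openai_url(base_url: str) -> bool:
    """Hypothetical stand-in for _is_azure_openai_url(): match the Azure OpenAI host."""
    host = _host(base_url)
    return host == "openai.azure.com" or host.endswith(".openai.azure.com")


def max_tokens_param(base_url: str, limit: int) -> dict:
    """Pick the token-limit keyword for the endpoint (direct OpenAI and Azure agree)."""
    if _host(base_url) == "api.openai.com" or is_azure_openai_url(base_url):
        return {"max_completion_tokens": limit}
    return {"max_tokens": limit}


def pick_api_mode(base_url: str, model: str) -> str:
    """Upgrade gpt-5.x to the Responses API everywhere except Azure."""
    # Assumption: a simple prefix test stands in for _model_requires_responses_api().
    needs_responses = model.lower().startswith("gpt-5")
    if needs_responses and not is_azure_openai_url(base_url):
        return "codex_responses"
    return "chat_completions"
```

With this shape, an Azure base URL plus `gpt-5.4-mini` stays on `chat_completions` and sends `max_completion_tokens`, while the same model against `api.openai.com` still upgrades to `codex_responses`.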

Test plan

- All existing TestMaxTokensParam tests pass
- test_returns_max_completion_tokens_for_azure: Azure URL returns correct param
- test_azure_gpt5_stays_on_chat_completions: Azure + gpt-5.x does not upgrade to codex_responses
- test_non_azure_gpt5_upgrades_to_codex_responses: non-Azure gpt-5.x still upgrades (no regression)
- Live-tested against {resource}.openai.azure.com/openai/v1 with gpt-5.4-mini and gpt-4.1-mini

🤖 Generated with Claude Code

Add common data formats (JSON, YAML, CSV, XML, HTML) and source code files (Python, JavaScript, TypeScript, Shell, SQL) to the gateway document upload allowlist. Users working with agents via Telegram, Discord, and other messaging platforms frequently need to share configuration files, data exports, and code snippets as file uploads. The current allowlist is limited to office documents and plain text, forcing users to rename files or paste content inline as a workaround.

The Telegram /model slash command picker read provider info from the `providers:` dict schema only, so user-defined endpoints configured via the `custom_providers:` list (the format written by `hermes model`) were invisible in the picker and clicking them failed with "Unknown provider 'custom'".

Changes:

- list_authenticated_providers() accepts a custom_providers list and collapses all entries sharing the same base_url into a single "custom" provider that exposes every configured model as a button.
- gateway/run.py reads cfg.get("custom_providers") and forwards it to both list_authenticated_providers() call sites.
- switch_model() PATH A synthesizes a ProviderDef on the fly when --provider custom is passed and the runtime is already on a custom endpoint, so picker-triggered switches reuse the active base_url / api_key instead of failing provider resolution.

Result: with multiple models configured under custom_providers:, the /model picker shows one "Custom endpoint" provider with a button per model, and clicking a button switches cleanly without re-authentication.


The /model picker (list_authenticated_providers) walks every provider whose env var is set and adds it to the picker regardless of whether there are any models to show. This breaks down when an API key is set for a non-LLM feature — e.g. setting GROQ_API_KEY to enable Groq Whisper STT makes Groq appear in the LLM picker as "0 models", which users can't click and which clutters the list. Skip any provider whose curated model list is empty. User-defined and custom_providers entries are unaffected since they already gate on having at least one model configured.
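The guard described above amounts to a single early `continue`. The sketch below uses a hypothetical `visible_providers()` helper and a made-up provider dict shape (`slug`, `models`); only the "skip when the model count is zero" rule comes from the commit message.

```python
def visible_providers(providers):
    """Keep only providers that have at least one model to show in the picker."""
    shown = []
    for provider in providers:
        total = len(provider.get("models", []))
        if total == 0:
            # e.g. a key set only for speech-to-text: nothing to pick, so hide the row
            continue
        shown.append(provider)
    return shown
```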

Some Ollama-served models (MiniMax M2.7, Kimi K2.5) emit tool calls in Anthropic's XML format (<invoke name="..."><parameter name="...">) instead of OpenAI structured tool_calls. The main turn loop reads response.choices[0].message directly and never reached the XML, so tools like send_message silently failed: the model "thought" it called them.

Add a three-gated fallback parser between content normalization and the plugin hook (run_agent.py ~8685). Gates:

1. Structural: skipped entirely for codex_responses and anthropic_messages
2. Empty: only runs when tool_calls is None/empty
3. Substring: only when "<invoke" appears in the text content

Parses each <invoke name="..."><parameter name="...">...</parameter></invoke> block into a SimpleNamespace matching the shape the downstream dispatcher (_execute_tool_calls) expects: .id, .function.name, .function.arguments (JSON string). Strips the raw XML from visible content afterward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
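A rough sketch of the three gates and the resulting object shape follows. The function name, the regular expressions, and the generated call id format are assumptions for illustration; the commit only specifies the gates and the `.id` / `.function.name` / `.function.arguments` shape. Parameter values are kept as strings here, which may differ from the real parser.

```python
import json
import re
import uuid
from types import SimpleNamespace

INVOKE_RE = re.compile(r'<invoke name="([^"]+)">(.*?)</invoke>', re.DOTALL)
PARAM_RE = re.compile(r'<parameter name="([^"]+)">(.*?)</parameter>', re.DOTALL)


def parse_xml_tool_calls(api_mode, content, tool_calls):
    """Return (tool_calls, visible_content), parsing XML invokes only if all gates pass."""
    # Gate 1 (structural): these API modes carry tool calls in their own format.
    if api_mode in ("codex_responses", "anthropic_messages"):
        return tool_calls, content
    # Gate 2 (empty): never override structured tool calls the model did return.
    if tool_calls:
        return tool_calls, content
    # Gate 3 (substring): cheap check before running any regex.
    if not content or "<invoke" not in content:
        return tool_calls, content

    parsed = []
    for name, body in INVOKE_RE.findall(content):
        args = {key: value.strip() for key, value in PARAM_RE.findall(body)}
        parsed.append(SimpleNamespace(
            id=f"call_{uuid.uuid4().hex[:8]}",  # hypothetical id format
            function=SimpleNamespace(name=name, arguments=json.dumps(args)),
        ))
    if not parsed:
        return tool_calls, content
    # Strip the raw XML so the user only sees the prose part of the reply.
    return parsed, INVOKE_RE.sub("", content).strip()
```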

…-mapping

_get_platform_tools() uses a reverse-mapping loop to infer which toolsets are enabled when no explicit platform_toolsets config exists. The loop iterated only over CONFIGURABLE_TOOLSETS, silently dropping any toolset not listed there, including "messaging", which is the toolset containing send_message.

send_message is in _HERMES_CORE_TOOLS and is fully present in the hermes-telegram composite toolset, but because "messaging" was not in CONFIGURABLE_TOOLSETS, it was never added to the enabled set. The check_fn was never even evaluated: the tool was excluded before runtime.

Fix: after the CONFIGURABLE_TOOLSETS loop, also iterate over all toolsets defined in TOOLSETS that are neither configurable nor platform defaults. Any whose tools are fully covered by the base composite toolset are added to enabled_toolsets. Adds a logger.debug line for visibility.

This only affects the else-branch (no explicit saved config), so profiles that have run `hermes tools` and saved explicit toolset lists are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
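The reverse-mapping fix can be pictured as a second pass over the remaining toolsets. This is a sketch under assumed data shapes (toolsets as a name-to-tool-list dict, hypothetical function and argument names), not the repository's `_get_platform_tools()`.

```python
def infer_enabled_toolsets(composite_tools, toolsets, configurable, platform_defaults):
    """Infer enabled toolsets from the tools present in a platform's composite toolset."""
    composite = set(composite_tools)

    def covered(name):
        tools = toolsets.get(name, [])
        return bool(tools) and set(tools) <= composite

    # First pass: the original loop over configurable toolsets only.
    enabled = {name for name in configurable if covered(name)}

    # Second pass (the fix): toolsets that are neither configurable nor platform defaults.
    for name in toolsets:
        if name in configurable or name in platform_defaults:
            continue
        if covered(name):
            enabled.add(name)
    return enabled
```

In this toy model, a "messaging" toolset that is not configurable but whose tools all appear in the composite set ends up enabled, which is the behaviour the commit describes.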

Resolved conflict in hermes_cli/model_switch.py: - PATH A: kept our "custom" provider special-case + wired upstream's new custom_providers param into resolve_provider_full() else branch - Section 4: kept our collapse-to-single-slug approach for custom_providers list (all entries appear as one "custom" entry in the picker) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Drop our collapsed-to-single-"custom" hack in favour of upstream's design: each custom_providers entry gets its own slug via custom_provider_slug(). resolve_provider_full() already handles these slugs natively so PATH A no longer needs a special case. Retain the if total == 0: continue guard (our NousResearch#7267 fix) so providers with no models (e.g. Groq keyed for STT only) stay hidden. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Each custom_providers entry declares one model under a named provider. Entries sharing the same name collapse into a single provider row in the /model picker — e.g. four Ollama Cloud models appear as one row. Entries with distinct names produce separate rows (Ollama Cloud vs Moonshot). This aligns with upstream's custom_provider_slug() convention while fixing the UX regression where every entry became its own provider row. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
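As an illustration of the grouping rule, the sketch below collapses entries by provider name. The entry keys (`name`, `model`) and the helper name are assumptions; the actual config schema and `custom_provider_slug()` are not reproduced here.

```python
def group_custom_providers(entries):
    """Collapse custom_providers entries into one picker row per provider name."""
    rows = {}
    for entry in entries:
        # dicts preserve insertion order, so rows appear in first-seen order
        rows.setdefault(entry["name"], []).append(entry["model"])
    return rows
```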

When HERMES_CRON_ALLOW_MESSAGING=1, the cron scheduler allows cron agents to call send_message as a side-effect of the job (e.g. check-ins, nudges, external notifications). Default behavior unchanged — upstream users see no difference unless they opt in. Two small changes in cron/scheduler.py: 1. disabled_toolsets no longer excludes "messaging" when the flag is set, so send_message becomes available in the cron agent's tool list. 2. The injected cron_hint system prompt has a second variant: instead of forbidding send_message outright, it explains that deliver handles the final report while send_message is the right choice for side-effect messages. Reserves deliver for reports; send_message for actions. Tested end-to-end on Orion: the mirror_to_session path in tools/send_message_tool.py writes the outbound message to the target's session transcript so a reply from the recipient retains conversation context on the next turn. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
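The opt-in can be sketched as a small function over the disabled-toolset set. Only the environment variable name and the "messaging" toolset come from the commit; the helper name and the injectable `env` argument are hypothetical conveniences for testing.

```python
import os


def cron_disabled_toolsets(base_disabled, env=None):
    """Return the toolsets to disable for a cron agent, honouring the opt-in flag."""
    env = os.environ if env is None else env
    disabled = set(base_disabled)
    if env.get("HERMES_CRON_ALLOW_MESSAGING") == "1":
        disabled.discard("messaging")  # opt-in: send_message becomes available
    else:
        disabled.add("messaging")      # default: unchanged behaviour
    return disabled
```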

_remap_path_for_user was calling .resolve() on the Python path, which followed venv/bin/python into the base interpreter. On uv-managed venvs this swaps the systemd ExecStart to a bare Python that has none of the venv's site-packages, so the service crashes on first import. Classical python -m venv installs were unaffected by accident: the resolved target /usr/bin/python3.x lives outside $HOME so the path-remap branch was skipped and the system Python's packages silently worked. Remove .resolve() calls on both current_home and the path; use .expanduser() for lexical tilde expansion only. The function does lexical prefix substitution, which is all it needs to do for its actual purpose (remapping /root/.hermes -> /home/<user>/.hermes when installing system services as root for a different user). Repro: on a uv-managed venv install, `sudo hermes gateway install --system` writes ExecStart=.../uv/python/cpython-3.11.15-.../bin/python3.11 instead of .../hermes-agent/venv/bin/python, and the service crashes on ModuleNotFoundError: yaml. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
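A lexical remap of this kind can be sketched with `pathlib` alone. The signature below is an assumption (the real `_remap_path_for_user` may take different arguments), and POSIX-style paths are assumed; the point is that `relative_to()` compares path components without touching the filesystem, so a venv symlink is never followed.

```python
from pathlib import Path


def remap_path_for_user(path, current_home, target_home):
    """Swap the current_home prefix for target_home without resolving symlinks."""
    p = Path(path).expanduser()          # lexical tilde expansion only
    home = Path(current_home).expanduser()
    try:
        rel = p.relative_to(home)        # purely lexical prefix check
    except ValueError:
        return str(p)                    # outside the home directory: leave as-is
    return str(Path(target_home) / rel)
```

For example, `/root/.hermes/venv/bin/python` maps to `/home/alice/.hermes/venv/bin/python` even when `venv/bin/python` is a symlink into a uv-managed interpreter directory.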

Adds a new `browser_screenshot` tool that captures a raw PNG of the current page and ships it to the user as a native chat attachment with zero token overhead, no LLM call, and no agent-side path handling.

Motivation
----------

There is currently no clean way for an agent to deliver a literal screenshot of a web page back to the user:

* `browser_vision` always routes the screenshot through a vision LLM call, which costs tokens, requires `auxiliary.vision.model` to be configured, and returns a description rather than the raw image. It *can* return a `screenshot_path` for `MEDIA:` delivery, but the agent has to format the tag correctly, the file has to live somewhere both the agent's sandbox and the gateway can resolve, and the path string has to survive the agent's text response intact. Any one of those can break.
* Hermes terminal sandboxes (Docker, Modal, Daytona, etc.) typically expose `~/.hermes/profiles/<name>/...` at a different absolute path inside the container than on the host. The agent therefore receives a path it cannot verify with its own `read_file` / `terminal` tools and often "corrects" the path to one the gateway cannot open.
* Returning the bytes inline as base64 is a non-starter at scale: a ~400 KB screenshot becomes ~140K tokens of input plus another ~140K echoed in the response, which thrashes context windows and tanks latency on cheaper models.

Solution
--------

Move media delivery off the agent's critical path entirely:

1. **`gateway/media_queue.py`** (new): a small thread-safe per-session queue with `enqueue_media(path)` and `drain_media(session_key)`. The queue auto-resolves the active session via the existing `tools.approval` ContextVar so tools running inside the agent loop can call `enqueue_media(path)` with no arguments.
2. **`browser_screenshot` tool** (new, in `tools/browser_tool.py` with a Camofox backend in `tools/browser_camofox.py`): captures the page, saves the PNG to a host path under the gateway's filesystem, enqueues the path, and returns a tiny success result with no path for the agent to mishandle. Falls back to the universal `_run_browser_command(..., "screenshot", ...)` primitive for non-Camofox backends so it works on every existing browser provider.
3. **Gateway drain hook** in `gateway/platforms/base.py`: right after the existing `local_files` send loop, the gateway calls `drain_media(session_key)` and ships every queued path via the appropriate `send_*` method based on file extension (image / voice / video / document). The drain runs exactly once per `_process_message_background` invocation, after the agent's text response has been sent.

Properties
----------

* **Zero token cost.** The tool result is `{"success": true, "delivered": true, "size_bytes": N}`: no path, no base64, no echo.
* **Agent-proof.** The agent never sees the file path, so it cannot hallucinate, "verify", or rewrite it. Just call the tool and write a normal text reply; the image attaches automatically.
* **Idempotent per turn.** `drain_media` is an atomic pop, so each queued item is sent exactly once per response cycle.
* **Race-free.** The queue is guarded by a `threading.Lock`.
* **Generalizes to all binary outputs.** Any future tool that wants to deliver an image, voice note, video, or document can call `enqueue_media(path)` and the same drain handles routing by extension. This is the same side-effect-driven delivery pattern the TTS tool already relies on, generalized into a transport concern rather than a tool concern.
* **Multi-backend.** The Camofox path uses the existing `/tabs/<id>/screenshot` endpoint; the fallback uses the same `_run_browser_command(..., "screenshot", ...)` primitive that `browser_vision` uses today, so any backend where `browser_vision` works for screenshots also works here.

Tool registration is added to `toolsets.py` (default + browser toolset + cli + telegram lists) and `model_tools.py` (`browser_tools` group) so the new tool is exposed wherever `browser_vision` is.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
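The queue described above is small enough to sketch in full. This is a minimal illustration, not the contents of `gateway/media_queue.py`: the `current_session` ContextVar is a stand-in for the one the commit says lives in `tools.approval`, and the optional `session_key` argument on `enqueue_media` is an assumption.

```python
import threading
from contextvars import ContextVar

# Stand-in for the session ContextVar the commit says lives in tools.approval.
current_session = ContextVar("current_session", default="")

_lock = threading.Lock()
_pending = {}  # session_key -> list of queued file paths


def enqueue_media(path, session_key=None):
    """Queue a file for delivery to the active (or explicitly named) session."""
    key = session_key or current_session.get()
    if not key:
        raise RuntimeError("no active session to enqueue media for")
    with _lock:
        _pending.setdefault(key, []).append(path)


def drain_media(session_key):
    """Atomically remove and return every queued path for one session."""
    with _lock:
        return _pending.pop(session_key, [])
```

Because `drain_media` pops the whole list under the lock, a second drain in the same response cycle returns an empty list, which is what gives the "sent exactly once" property.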

Adds focused tests for the new ``gateway.media_queue`` module that backs ``browser_screenshot``'s direct-enqueue delivery. The queue lives in the gateway process and is drained by ``gateway.platforms.base`` after each agent response, so its invariants matter more than its surface area. Tests cover the failure classes a reviewer would actually care about: * ``test_enqueue_and_drain_single_item`` — happy path, queue is empty after drain (no double-send risk on the next response cycle). * ``test_multiple_enqueue_preserves_fifo_order_and_atomic_drain`` — multiple enqueues in one session come back in insertion order, in a single atomic drain (no partial flushes, no duplicate sends). * ``test_session_isolation_across_drains`` — draining one session must not touch another session's queue (no cross-chat leakage). * ``test_contextvar_resolution_does_not_leak_across_sessions`` — the zero-arg ``enqueue_media(path)`` call path used in production by the browser screenshot tool resolves the active session via the ``tools.approval`` ContextVar, and state from one session does not bleed into another. * ``test_drain_unknown_session_returns_empty_list`` — draining a session that was never touched is a no-op, not an error. No mocks beyond the queue itself. Uses an autouse fixture to clear the module-level ``_pending`` dict between tests so they cannot interfere. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ens, types params

The built-in hindsight_recall tool previously accepted only a query string with all other parameters hardcoded from config. This made it impossible for the agent to:

- Switch between semantic recall, keyword search, and entity graph lookup without shelling out to curl
- Override budget/max_tokens per query for deeper dives
- Filter by memory type (e.g. types: ["world"]) dynamically

This commit adds optional parameters to the existing tool:

- method: "recall" (default) | "list" | "entity"
- budget: "low" | "mid" | "high"
- max_tokens: integer
- types: string[]

All parameters respect user overrides consistently across all three methods, falling back to sensible defaults:

- recall: budget from config, max_tokens from config, types from config
- list: keyword search via client.list_memories(), no budget/types
- entity: budget defaults to "high", types defaults to ["world"], include_entities=True for entity graph data

No direct HTTP calls. All methods use the official hindsight-client Python library, compatible with both local and cloud deployments.

The tool description encodes a minimal decision heuristic so the agent learns when to escalate: "Use default recall first. If results are missing or too vague: use method list for exact keyword/name matches, method entity for relationship queries, or increase budget / set types [world] for deeper retrieval."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The extract_media regex only matched image/video/audio extensions (png, jpg, mp4, ogg, etc). Document types like .docx, .pdf, .xlsx were silently ignored, making it impossible for agents to send document attachments via Telegram's send_document API. Added: pdf, doc/docx, xls/xlsx, ppt/pptx, csv, txt, zip, tar.gz, json Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
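To make the extension change concrete, here is a sketch of an extension-gated `MEDIA:` matcher. The pattern is an assumption written for illustration, not the repository's regex; it lists only the extensions named in the commit message (the original list is elided with "etc").

```python
import re

MEDIA_EXTS = (
    "png|jpg|mp4|ogg"                                    # media types named in the commit
    "|pdf|docx?|xlsx?|pptx?|csv|txt|zip|tar\\.gz|json"   # document types added
)
# A MEDIA:<path> tag whose path ends in an allowed extension, followed by space or end.
MEDIA_RE = re.compile(rf"MEDIA:(\S+?\.(?:{MEDIA_EXTS}))(?=\s|$)", re.IGNORECASE)


def extract_media(text):
    """Return every MEDIA: path in the text that has an allowed extension."""
    return MEDIA_RE.findall(text)
```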

When agents run inside Docker containers, they emit MEDIA:/workspace/file.docx but send_message runs on the host where /workspace doesn't exist. The file is actually at the host-side volume mount (e.g. ~/.hermes/profiles/<name>/workspace/). Added _resolve_container_path() that reads docker_volumes from the profile config and reverses the host:container mapping. Applied in both send_message tool and gateway response handler. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
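Reversing a volume mapping can be sketched as a prefix swap over `host:container[:mode]` strings. The function below is a hypothetical simplification of `_resolve_container_path()`: it takes the volume list as an argument rather than reading the profile config, assumes POSIX paths, and does not expand `~` in host paths.

```python
from pathlib import PurePosixPath


def resolve_container_path(path, docker_volumes):
    """Map a container-side path back to its host-side location, if a volume covers it."""
    target = PurePosixPath(path)
    for spec in docker_volumes:
        parts = spec.split(":")
        if len(parts) < 2:
            continue  # not a host:container mapping
        host_dir, container_dir = parts[0], parts[1]
        try:
            rel = target.relative_to(container_dir)
        except ValueError:
            continue  # path is not under this mount
        return str(PurePosixPath(host_dir) / rel)
    return path  # no mapping matched: return the path unchanged
```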

Azure OpenAI exposes an OpenAI-compatible endpoint at `{resource}.openai.azure.com/openai/v1` that accepts the standard `openai` Python client. Two issues prevented gpt-5.x models from working:

1. `_max_tokens_param()` only sent `max_completion_tokens` for `api.openai.com` URLs. Azure also requires `max_completion_tokens` for gpt-5.x models.
2. `_model_requires_responses_api()` correctly detected gpt-5.x but the routing gate unconditionally upgraded to `codex_responses` mode. Azure does NOT support the Responses API; it serves gpt-5.x on the regular `/chat/completions` path, so the upgraded request returned a 404.

Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.

- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on `chat_completions` mode where Azure actually serves it.
- Three new tests cover Azure max_tokens routing and api_mode behaviour.

gpt-4.x models on Azure are unaffected (already used chat_completions + max_tokens, which Azure accepts for those models).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Azure OpenAI exposes an OpenAI-compatible endpoint at `{resource}.openai.azure.com/openai/v1` that accepts the standard `openai` Python client. Two issues prevented gpt-5.x models from working:

1. `_max_tokens_param()` only sent `max_completion_tokens` for `api.openai.com` URLs. Azure also requires `max_completion_tokens` for gpt-5.x models.
2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x to Responses API. Azure does NOT support the Responses API; it serves gpt-5.x on the regular `/chat/completions` path, so the upgraded request returned a 404.

Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.

- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on `chat_completions` where Azure actually serves it.
- The fallback-provider api_mode picker also recognises Azure and stays on chat_completions.
- Tests cover max_tokens routing, api_mode behaviour, and URL detection.

gpt-4.x models on Azure are unaffected (already used chat_completions + max_tokens, which Azure accepts for those models).

Salvage of PR #10086, rewritten against current main where the codex_responses upgrade gate gained copilot-acp / explicit-api_mode exclusions.

teknium1 · 2026-04-26T01:49:03Z

Merged via #15845 along with the other open Azure PRs as one consolidated salvage + an auto-detection feature on top (URL sniff + /models probe + Anthropic Messages fallback + context-length resolution).

Your commits were cherry-picked with authorship preserved (ac5711428 on main). Thanks @akhater — this landed because of your work.

alt-glitch · 2026-04-26T02:04:08Z

Superseded by #15845 which merged — consolidates fixes from #9029, #4599, #10086, and #8766.


Ubuntu and others added 21 commits April 9, 2026 19:50

Merge PR NousResearch#6787: expand supported document types for gateway file uploads

7830a57

Merge PR NousResearch#7261: support custom_providers list schema in /model picker

94d95c2

Merge PR NousResearch#7267: hide 0-model providers from /model picker

a6db377

fix: remove duplicate custom_providers param introduced by merge

430e6ad

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/main' into local

57330b8

OnlyTerp mentioned this pull request Apr 17, 2026

docs: 72h sweep — add MCP, coding-agent, security, observability, and remote-sandbox parts (17–21) OnlyTerp/hermes-optimization-guide#6

Merged

4 tasks

teknium1 mentioned this pull request Apr 26, 2026

feat(azure-foundry): add Azure AI Foundry provider with auto-detection #15845

Merged

teknium1 closed this Apr 26, 2026

alt-glitch added labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), and comp/tools (Tool registry, model_tools, toolsets) on Apr 26, 2026

alt-glitch mentioned this pull request Apr 26, 2026

feat: Add Azure Foundry provider with OpenAI/Anthropic API mode selection #9029

Closed

3 tasks

This was referenced Apr 26, 2026

fix: preserve URL query params for Azure OpenAI and custom endpoints #8766

Closed

fix: Azure AI Foundry / Azure Anthropic endpoint compatibility #4599

Closed

akhater wants to merge 21 commits into NousResearch:main from akhater:fix/azure-openai-gpt5-routing
