fix(agent): support Azure OpenAI gpt-5.x on chat/completions endpoint#10086

Closed
akhater wants to merge 21 commits into NousResearch:main from akhater:fix/azure-openai-gpt5-routing

Conversation

@akhater akhater commented Apr 15, 2026

Contributor

Summary

Azure OpenAI exposes an OpenAI-compatible endpoint at {resource}.openai.azure.com/openai/v1 that works with the standard openai Python client. Two issues prevented gpt-5.x models from working on this endpoint:

  • max_tokens rejected: _max_tokens_param() only sent max_completion_tokens for api.openai.com URLs. Azure gpt-5.x also requires max_completion_tokens.
  • Wrong API path: _model_requires_responses_api() correctly detected gpt-5.x, and the routing gate then upgraded the session to codex_responses mode. Azure does NOT support the Responses API; it serves gpt-5.x on /chat/completions, so the upgraded requests returned a 404.

Changes

Adds _is_azure_openai_url() (matches openai.azure.com) and uses it in two places:

  1. _max_tokens_param() — returns max_completion_tokens for Azure (same as direct OpenAI)
  2. codex_responses upgrade gate — skips Azure so gpt-5.x stays on chat_completions where Azure actually serves it

gpt-4.x models on Azure are unaffected (already worked via chat_completions + max_tokens).
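
The two gates described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the function names drop the leading underscore, the signatures are hypothetical, the hostname-based match is an assumption about how "matches openai.azure.com" could be implemented, and the `startswith("gpt-5")` check merely stands in for `_model_requires_responses_api()`.

```python
from urllib.parse import urlparse


def is_azure_openai_url(base_url: str) -> bool:
    """True when the host is openai.azure.com or a subdomain of it.

    Assumption: the URL carries a scheme; without one urlparse() yields no
    hostname and this returns False.
    """
    host = (urlparse(base_url).hostname or "").lower()
    return host == "openai.azure.com" or host.endswith(".openai.azure.com")


def is_direct_openai_url(base_url: str) -> bool:
    """True for the direct OpenAI endpoint."""
    return (urlparse(base_url).hostname or "").lower() == "api.openai.com"


def max_tokens_param(base_url: str, value: int) -> dict:
    """Pick the token-limit parameter name for this endpoint."""
    if is_direct_openai_url(base_url) or is_azure_openai_url(base_url):
        return {"max_completion_tokens": value}
    return {"max_tokens": value}


def pick_api_mode(base_url: str, model: str) -> str:
    """Upgrade gpt-5.x to codex_responses everywhere except Azure."""
    requires_responses = model.lower().startswith("gpt-5")  # stand-in check
    if requires_responses and not is_azure_openai_url(base_url):
        return "codex_responses"
    return "chat_completions"
```

With these stand-ins, an Azure base URL plus `gpt-5.4-mini` stays on `chat_completions` and gets `max_completion_tokens`, while the same model on `api.openai.com` still upgrades.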

Test plan

  • All existing TestMaxTokensParam tests pass
  • test_returns_max_completion_tokens_for_azure — Azure URL returns correct param
  • test_azure_gpt5_stays_on_chat_completions — Azure + gpt-5.x does not upgrade to codex_responses
  • test_non_azure_gpt5_upgrades_to_codex_responses — non-Azure gpt-5.x still upgrades (no regression)
  • Live-tested against {resource}.openai.azure.com/openai/v1 with gpt-5.4-mini and gpt-4.1-mini

🤖 Generated with Claude Code

Ubuntu and others added 21 commits April 9, 2026 19:50
Add common data formats (JSON, YAML, CSV, XML, HTML) and source code
files (Python, JavaScript, TypeScript, Shell, SQL) to the gateway
document upload allowlist.

Users working with agents via Telegram, Discord, and other messaging
platforms frequently need to share configuration files, data exports,
and code snippets as file uploads. The current allowlist is limited to
office documents and plain text, forcing users to rename files or paste
content inline as a workaround.
The Telegram /model slash command picker read provider info from the
`providers:` dict schema only, so user-defined endpoints configured via
the `custom_providers:` list (the format written by `hermes model`) were
invisible in the picker and clicking them failed with "Unknown provider
'custom'".

Changes:
- list_authenticated_providers() accepts a custom_providers list and
  collapses all entries sharing the same base_url into a single "custom"
  provider that exposes every configured model as a button.
- gateway/run.py reads cfg.get("custom_providers") and forwards it to
  both list_authenticated_providers() call sites.
- switch_model() PATH A synthesizes a ProviderDef on the fly when
  --provider custom is passed and the runtime is already on a custom
  endpoint, so picker-triggered switches reuse the active base_url /
  api_key instead of failing provider resolution.

Result: with multiple models configured under custom_providers:, the
/model picker shows one "Custom endpoint" provider with a button per
model, and clicking a button switches cleanly without re-authentication.
The /model picker (list_authenticated_providers) walks every provider
whose env var is set and adds it to the picker regardless of whether
there are any models to show. This breaks down when an API key is set
for a non-LLM feature — e.g. setting GROQ_API_KEY to enable Groq
Whisper STT makes Groq appear in the LLM picker as "0 models", which
users can't click and which clutters the list.

Skip any provider whose curated model list is empty. User-defined and
custom_providers entries are unaffected since they already gate on
having at least one model configured.
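
A minimal sketch of the empty-list guard described above, assuming each provider is a dict with hypothetical `slug`, `env_var`, and `models` keys; the real `list_authenticated_providers()` signature and data shapes are not shown in this commit message.

```python
def list_picker_providers(providers, env):
    """Return picker rows for providers that have a key set and models to show."""
    rows = []
    for provider in providers:
        if not env.get(provider["env_var"]):
            continue  # no credentials, so the provider is not authenticated
        total = len(provider["models"])
        if total == 0:
            continue  # key is set for a non-LLM feature only; hide the row
        rows.append({"slug": provider["slug"], "total": total})
    return rows


# Hypothetical catalogue: the Groq entry has a key but no curated LLM models.
providers = [
    {"slug": "groq", "env_var": "GROQ_API_KEY", "models": []},
    {"slug": "example", "env_var": "EXAMPLE_API_KEY", "models": ["example-model"]},
]
env = {"GROQ_API_KEY": "set-for-whisper", "EXAMPLE_API_KEY": "set"}
print(list_picker_providers(providers, env))
```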
Some Ollama-served models (MiniMax M2.7, Kimi K2.5) emit tool calls in
Anthropic's XML format (<invoke name="..."><parameter name="...">)
instead of OpenAI structured tool_calls. The main turn loop reads
response.choices[0].message directly and never reached the XML, so tools
like send_message silently failed — the model "thought" it called them.

Add a three-gated fallback parser between content normalization and the
plugin hook (run_agent.py ~8685). Gates:
  1. Structural: skipped entirely for codex_responses and anthropic_messages
  2. Empty: only runs when tool_calls is None/empty
  3. Substring: only when "<invoke" appears in the text content

Parses each <invoke name="..."><parameter name="...">...</parameter></invoke>
block into a SimpleNamespace matching the shape the downstream dispatcher
(_execute_tool_calls) expects: .id, .function.name, .function.arguments
(JSON string). Strips the raw XML from visible content afterward.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
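
The three gates and the parse step from the commit message above could look something like the sketch below. The function name, its signature, the regular expressions, and the generated call ids are all assumptions made for illustration; every parameter value is kept as a plain string, which the real parser may or may not do.

```python
import json
import re
import uuid
from types import SimpleNamespace

INVOKE_RE = re.compile(r'<invoke\s+name="([^"]+)"\s*>(.*?)</invoke>', re.DOTALL)
PARAM_RE = re.compile(r'<parameter\s+name="([^"]+)"\s*>(.*?)</parameter>', re.DOTALL)


def parse_xml_tool_calls(content, tool_calls, api_mode):
    """Return (tool_calls, content), replacing both only if XML calls are found."""
    # Gate 1 (structural): these modes never go through the fallback.
    if api_mode in ("codex_responses", "anthropic_messages"):
        return tool_calls, content
    # Gate 2 (empty): structured tool calls, when present, always win.
    if tool_calls:
        return tool_calls, content
    # Gate 3 (substring): cheap check before running any regex.
    if not content or "<invoke" not in content:
        return tool_calls, content

    parsed = []
    for name, body in INVOKE_RE.findall(content):
        args = {key: value.strip() for key, value in PARAM_RE.findall(body)}
        parsed.append(SimpleNamespace(
            id=f"call_{uuid.uuid4().hex[:12]}",  # hypothetical id format
            function=SimpleNamespace(name=name, arguments=json.dumps(args)),
        ))
    if not parsed:
        return tool_calls, content
    # Strip the raw XML so it does not show up in the visible reply.
    return parsed, INVOKE_RE.sub("", content).strip()
```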
…-mapping

_get_platform_tools() uses a reverse-mapping loop to infer which toolsets
are enabled when no explicit platform_toolsets config exists. The loop
iterated only over CONFIGURABLE_TOOLSETS, silently dropping any toolset
not listed there — including "messaging", which is the toolset containing
send_message.

send_message is in _HERMES_CORE_TOOLS and is fully present in the
hermes-telegram composite toolset, but because "messaging" was not in
CONFIGURABLE_TOOLSETS, it was never added to the enabled set. The
check_fn was never even evaluated — the tool was excluded before runtime.

Fix: after the CONFIGURABLE_TOOLSETS loop, also iterate over all
toolsets defined in TOOLSETS that are neither configurable nor platform
defaults. Any whose tools are fully covered by the base composite toolset
are added to enabled_toolsets. Adds a logger.debug line for visibility.

This only affects the else-branch (no explicit saved config), so
profiles that have run `hermes tools` and saved explicit toolset lists
are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
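
The second pass described above can be illustrated with toy data. The registry contents, the `PLATFORM_DEFAULTS` name, and the function below are hypothetical; the sketch also assumes that the original loop enables a configurable toolset when its tools are fully covered by the composite toolset.

```python
# Hypothetical registry: toolset name -> set of tool names.
TOOLSETS = {
    "web": {"web_search"},
    "browser": {"browser_vision"},
    "messaging": {"send_message"},
    "hermes-telegram": {"web_search", "send_message"},
}
CONFIGURABLE_TOOLSETS = {"web", "browser"}
PLATFORM_DEFAULTS = {"hermes-telegram"}


def infer_enabled_toolsets(platform_toolset):
    """Infer enabled toolsets when no explicit platform config exists."""
    base_tools = TOOLSETS[platform_toolset]
    # First pass: the original loop only looks at configurable toolsets.
    enabled = {
        name for name in CONFIGURABLE_TOOLSETS if TOOLSETS[name] <= base_tools
    }
    # Second pass (the fix): also pick up toolsets that are neither
    # configurable nor platform defaults but are fully covered by the base.
    for name, tools in TOOLSETS.items():
        if name in CONFIGURABLE_TOOLSETS or name in PLATFORM_DEFAULTS:
            continue
        if tools and tools <= base_tools:
            enabled.add(name)
    return enabled
```

In this toy registry, "messaging" is picked up only by the second pass, which mirrors the `send_message` case from the commit message.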
Resolved conflict in hermes_cli/model_switch.py:
- PATH A: kept our "custom" provider special-case + wired upstream's
  new custom_providers param into resolve_provider_full() else branch
- Section 4: kept our collapse-to-single-slug approach for custom_providers
  list (all entries appear as one "custom" entry in the picker)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop our collapsed-to-single-"custom" hack in favour of upstream's
design: each custom_providers entry gets its own slug via
custom_provider_slug(). resolve_provider_full() already handles these
slugs natively so PATH A no longer needs a special case.

Retain the if total == 0: continue guard (our NousResearch#7267 fix) so providers
with no models (e.g. Groq keyed for STT only) stay hidden.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each custom_providers entry declares one model under a named provider.
Entries sharing the same name collapse into a single provider row in the
/model picker — e.g. four Ollama Cloud models appear as one row.
Entries with distinct names produce separate rows (Ollama Cloud vs Moonshot).

This aligns with upstream's custom_provider_slug() convention while
fixing the UX regression where every entry became its own provider row.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
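
A rough sketch of the name-based grouping, assuming each `custom_providers` entry is a dict with `name` and `model` keys (the key names and the model names below are placeholders, not taken from the repository):

```python
def group_custom_providers(custom_providers):
    """Collapse entries that share a name into one picker row each."""
    rows = {}
    for entry in custom_providers:
        row = rows.setdefault(entry["name"], {"name": entry["name"], "models": []})
        row["models"].append(entry["model"])
    # Dicts preserve insertion order, so rows appear in first-seen order.
    return list(rows.values())


entries = [
    {"name": "Ollama Cloud", "model": "model-a"},
    {"name": "Ollama Cloud", "model": "model-b"},
    {"name": "Moonshot", "model": "model-c"},
]
print(group_custom_providers(entries))
```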
When HERMES_CRON_ALLOW_MESSAGING=1, the cron scheduler allows cron agents
to call send_message as a side-effect of the job (e.g. check-ins, nudges,
external notifications). Default behavior unchanged — upstream users see
no difference unless they opt in.

Two small changes in cron/scheduler.py:

1. disabled_toolsets no longer excludes "messaging" when the flag is set,
   so send_message becomes available in the cron agent's tool list.

2. The injected cron_hint system prompt has a second variant: instead of
   forbidding send_message outright, it explains that deliver handles the
   final report while send_message is the right choice for side-effect
   messages. Reserves deliver for reports; send_message for actions.

Tested end-to-end on Orion: the mirror_to_session path in
tools/send_message_tool.py writes the outbound message to the target's
session transcript so a reply from the recipient retains conversation
context on the next turn.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
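
The opt-in gate for the first change might be sketched like this. The helper name and its arguments are hypothetical, and `example_toolset` is a placeholder for whatever else the scheduler disables; only the environment variable name and the "messaging" toolset come from the commit message.

```python
def cron_disabled_toolsets(base_disabled, env):
    """Drop "messaging" from the disabled list only when the flag is "1"."""
    allow_messaging = env.get("HERMES_CRON_ALLOW_MESSAGING") == "1"
    if not allow_messaging:
        return list(base_disabled)  # default behavior is unchanged
    return [name for name in base_disabled if name != "messaging"]
```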
_remap_path_for_user was calling .resolve() on the Python path, which
followed venv/bin/python into the base interpreter. On uv-managed venvs
this swaps the systemd ExecStart to a bare Python that has none of the
venv's site-packages, so the service crashes on first import. Classical
python -m venv installs were unaffected by accident: the resolved target
/usr/bin/python3.x lives outside $HOME so the path-remap branch was
skipped and the system Python's packages silently worked.

Remove .resolve() calls on both current_home and the path; use
.expanduser() for lexical tilde expansion only. The function does
lexical prefix substitution, which is all it needs to do for its
actual purpose (remapping /root/.hermes -> /home/<user>/.hermes when
installing system services as root for a different user).

Repro: on a uv-managed venv install, `sudo hermes gateway install
--system` writes ExecStart=.../uv/python/cpython-3.11.15-.../bin/python3.11
instead of .../hermes-agent/venv/bin/python, and the service crashes on
ModuleNotFoundError: yaml.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
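
A lexical-only remap in the spirit of this fix could look like the sketch below. The three-argument signature is an assumption (the real `_remap_path_for_user` is not shown here), and `PurePosixPath` is used deliberately because it never touches the filesystem, so symlinks such as `venv/bin/python` are not followed.

```python
import os.path
from pathlib import PurePosixPath


def remap_path_for_user(path, current_home, target_home):
    """Swap the current_home prefix for target_home without resolving symlinks."""
    # expanduser() only rewrites a leading "~"; it does not follow links.
    p = PurePosixPath(os.path.expanduser(path))
    home = PurePosixPath(os.path.expanduser(current_home))
    try:
        relative = p.relative_to(home)  # purely lexical prefix check
    except ValueError:
        return str(p)  # path is outside current_home, leave it alone
    return str(PurePosixPath(target_home) / relative)
```

Because nothing is resolved, a venv interpreter path under the home directory keeps pointing at the venv after remapping, and paths outside the home directory pass through unchanged.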
Adds a new `browser_screenshot` tool that captures a raw PNG of the
current page and ships it to the user as a native chat attachment with
zero token overhead, no LLM call, and no agent-side path handling.

Motivation
----------
There is currently no clean way for an agent to deliver a literal
screenshot of a web page back to the user:

* `browser_vision` always routes the screenshot through a vision LLM
  call, which costs tokens, requires `auxiliary.vision.model` to be
  configured, and returns a description rather than the raw image. It
  *can* return a `screenshot_path` for `MEDIA:` delivery, but the agent
  has to format the tag correctly, the file has to live somewhere both
  the agent's sandbox and the gateway can resolve, and the path string
  has to survive the agent's text response intact. Any one of those can
  break.
* Hermes terminal sandboxes (Docker, Modal, Daytona, etc.) typically
  expose `~/.hermes/profiles/<name>/...` at a different absolute path
  inside the container than on the host. The agent therefore receives a
  path it cannot verify with its own `read_file` / `terminal` tools and
  often "corrects" the path to one the gateway cannot open.
* Returning the bytes inline as base64 is a non-starter at scale: a
  ~400 KB screenshot becomes ~140K tokens of input plus another ~140K
  echoed in the response, which thrashes context windows and tanks
  latency on cheaper models.
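
As a back-of-the-envelope check on the base64 figure, the sketch below computes the encoded size of a 400 KB payload and converts it to tokens. The ratio of 4 characters per token is an assumption; real tokenizers vary, and base64 text may well tokenize less efficiently than that.

```python
import base64
import math


def base64_length(size_bytes: int) -> int:
    """Characters produced by standard padded base64 for size_bytes of input."""
    return 4 * math.ceil(size_bytes / 3)


def estimate_tokens(size_bytes: int, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; chars_per_token is an assumed average."""
    return round(base64_length(size_bytes) / chars_per_token)


size = 400_000  # "~400 KB" read as decimal kilobytes
# Cross-check the closed-form length against the real encoder.
assert base64_length(size) == len(base64.b64encode(bytes(size)))
print(base64_length(size), estimate_tokens(size))
```

Under that assumption the estimate lands in the low 130K range, the same order of magnitude as the ~140K quoted above.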

Solution
--------
Move media delivery off the agent's critical path entirely:

1. **`gateway/media_queue.py`** (new): a small thread-safe per-session
   queue with `enqueue_media(path)` and `drain_media(session_key)`. The
   queue auto-resolves the active session via the existing
   `tools.approval` ContextVar so tools running inside the agent loop
   can call `enqueue_media(path)` with no arguments.

2. **`browser_screenshot` tool** (new, in `tools/browser_tool.py` with a
   Camofox backend in `tools/browser_camofox.py`): captures the page,
   saves the PNG to a host path under the gateway's filesystem,
   enqueues the path, and returns a tiny success result with no path
   for the agent to mishandle. Falls back to the universal
   `_run_browser_command(..., "screenshot", ...)` primitive for
   non-Camofox backends so it works on every existing browser provider.

3. **Gateway drain hook** in `gateway/platforms/base.py`: right after
   the existing `local_files` send loop, the gateway calls
   `drain_media(session_key)` and ships every queued path via the
   appropriate `send_*` method based on file extension (image / voice /
   video / document). The drain runs exactly once per
   `_process_message_background` invocation, after the agent's text
   response has been sent.
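
The extension-based routing in step 3 might be sketched as a simple lookup. The extension table and the fallback to "document" are assumptions for illustration; the actual mapping used by the gateway's `send_*` methods is not given in this message.

```python
from pathlib import PurePosixPath

# Assumed mapping; the real gateway may recognise more extensions.
KIND_BY_EXTENSION = {
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".ogg": "voice",
    ".mp4": "video",
}


def media_kind(path: str) -> str:
    """Pick the delivery kind from the file extension, defaulting to document."""
    suffix = PurePosixPath(path).suffix.lower()
    return KIND_BY_EXTENSION.get(suffix, "document")
```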

Properties
----------
* **Zero token cost.** The tool result is `{"success": true, "delivered":
  true, "size_bytes": N}` — no path, no base64, no echo.
* **Agent-proof.** The agent never sees the file path, so it cannot
  hallucinate, "verify", or rewrite it. Just call the tool and write a
  normal text reply; the image attaches automatically.
* **Idempotent per turn.** `drain_media` is an atomic pop, so each
  queued item is sent exactly once per response cycle.
* **Race-free.** The queue is guarded by a `threading.Lock`.
* **Generalizes to all binary outputs.** Any future tool that wants to
  deliver an image, voice note, video, or document can call
  `enqueue_media(path)` and the same drain handles routing by
  extension. This is the same side-effect-driven delivery pattern the
  TTS tool already relies on, generalized into a transport concern
  rather than a tool concern.
* **Multi-backend.** The Camofox path uses the existing
  `/tabs/<id>/screenshot` endpoint; the fallback uses the same
  `_run_browser_command(..., "screenshot", ...)` primitive that
  `browser_vision` uses today, so any backend where `browser_vision`
  works for screenshots also works here.

Tool registration is added to `toolsets.py` (default + browser
toolset + cli + telegram lists) and `model_tools.py` (`browser_tools`
group) so the new tool is exposed wherever `browser_vision` is.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
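
A minimal sketch of a per-session queue with the properties listed above. The `current_session_key` ContextVar is a local stand-in for the one the commit says lives in `tools.approval`, and the optional `session_key` argument on `enqueue_media` is an assumption; only the function names and the `threading.Lock` guard come from the message.

```python
import contextvars
import threading
from collections import defaultdict

# Stand-in for the session ContextVar that the real module reuses.
current_session_key = contextvars.ContextVar("current_session_key", default=None)

_pending = defaultdict(list)  # session key -> queued file paths
_lock = threading.Lock()


def enqueue_media(path, session_key=None):
    """Queue a file for delivery to the given (or currently active) session."""
    key = session_key or current_session_key.get()
    if key is None:
        raise RuntimeError("no active session to enqueue media for")
    with _lock:
        _pending[key].append(path)


def drain_media(session_key):
    """Atomically remove and return everything queued for one session."""
    with _lock:
        # pop() with a default never creates an entry, so draining an
        # unknown session is a harmless no-op.
        return _pending.pop(session_key, [])
```

Because `drain_media` pops the whole list under the lock, a second drain in the same response cycle returns an empty list rather than re-sending anything.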
Adds focused tests for the new ``gateway.media_queue`` module that
backs ``browser_screenshot``'s direct-enqueue delivery. The queue lives
in the gateway process and is drained by ``gateway.platforms.base``
after each agent response, so its invariants matter more than its
surface area.

Tests cover the failure classes a reviewer would actually care about:

* ``test_enqueue_and_drain_single_item`` — happy path, queue is empty
  after drain (no double-send risk on the next response cycle).
* ``test_multiple_enqueue_preserves_fifo_order_and_atomic_drain`` —
  multiple enqueues in one session come back in insertion order, in a
  single atomic drain (no partial flushes, no duplicate sends).
* ``test_session_isolation_across_drains`` — draining one session must
  not touch another session's queue (no cross-chat leakage).
* ``test_contextvar_resolution_does_not_leak_across_sessions`` — the
  zero-arg ``enqueue_media(path)`` call path used in production by the
  browser screenshot tool resolves the active session via the
  ``tools.approval`` ContextVar, and state from one session does not
  bleed into another.
* ``test_drain_unknown_session_returns_empty_list`` — draining a
  session that was never touched is a no-op, not an error.

No mocks beyond the queue itself. Uses an autouse fixture to clear the
module-level ``_pending`` dict between tests so they cannot interfere.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ens, types params

The built-in hindsight_recall tool previously accepted only a query
string with all other parameters hardcoded from config. This made it
impossible for the agent to:

- Switch between semantic recall, keyword search, and entity graph
  lookup without shelling out to curl
- Override budget/max_tokens per query for deeper dives
- Filter by memory type (e.g. types: ["world"]) dynamically

This commit adds optional parameters to the existing tool:

  method: "recall" (default) | "list" | "entity"
  budget: "low" | "mid" | "high"
  max_tokens: integer
  types: string[]

All parameters respect user overrides consistently across all three
methods, falling back to sensible defaults:

  recall:  budget from config, max_tokens from config, types from config
  list:    keyword search via client.list_memories(), no budget/types
  entity:  budget defaults to "high", types defaults to ["world"],
           include_entities=True for entity graph data

No direct HTTP calls. All methods use the official hindsight-client
Python library, compatible with both local and cloud deployments.

The tool description encodes a minimal decision heuristic so the agent
learns when to escalate:

  "Use default recall first. If results are missing or too vague:
   use method list for exact keyword/name matches, method entity for
   relationship queries, or increase budget / set types [world] for
   deeper retrieval."

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
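
The per-method defaults listed above can be sketched as a small resolver. This is illustrative only: the function name and the `config` dict shape are hypothetical, no hindsight-client call is made, and two points the message leaves open are filled with labeled guesses (the entity method takes `max_tokens` from config, and the list method forwards no extra parameters).

```python
def resolve_recall_params(config, method="recall", budget=None,
                          max_tokens=None, types=None):
    """Merge per-call overrides with per-method defaults."""
    if method == "list":
        # Keyword search: the message says no budget/types apply here.
        return {"method": "list"}
    if method == "entity":
        return {
            "method": "entity",
            "budget": budget or "high",
            "max_tokens": max_tokens or config["max_tokens"],  # assumed fallback
            "types": types or ["world"],
            "include_entities": True,
        }
    if method == "recall":
        return {
            "method": "recall",
            "budget": budget or config["budget"],
            "max_tokens": max_tokens or config["max_tokens"],
            "types": types or config["types"],
        }
    raise ValueError(f"unknown method: {method!r}")
```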
The extract_media regex only matched image/video/audio extensions
(png, jpg, mp4, ogg, etc). Document types like .docx, .pdf, .xlsx
were silently ignored, making it impossible for agents to send
document attachments via Telegram's send_document API.

Added: pdf, doc/docx, xls/xlsx, ppt/pptx, csv, txt, zip, tar.gz, json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
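
A sketch of an extension-gated `MEDIA:` matcher with the document types added. The regex, the return shape, and the exact set of pre-existing media extensions are assumptions; the real `extract_media` pattern is not reproduced in this message.

```python
import re

MEDIA_EXTENSIONS = (
    # a few of the previously supported media types
    "png", "jpg", "jpeg", "mp4", "ogg",
    # document types added by this change
    "pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx",
    "csv", "txt", "zip", "tar.gz", "json",
)
# Longest alternatives first so "tar.gz" and "docx" are tried before "doc".
_EXT = "|".join(
    re.escape(ext) for ext in sorted(MEDIA_EXTENSIONS, key=len, reverse=True)
)
MEDIA_RE = re.compile(rf"MEDIA:(\S+?\.(?:{_EXT}))(?=\s|$)", re.IGNORECASE)


def extract_media(text):
    """Return (paths, text with the MEDIA: tags removed)."""
    paths = MEDIA_RE.findall(text)
    cleaned = MEDIA_RE.sub("", text).strip()
    return paths, cleaned
```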
When agents run inside Docker containers, they emit MEDIA:/workspace/file.docx
but send_message runs on the host where /workspace doesn't exist. The file
is actually at the host-side volume mount (e.g. ~/.hermes/profiles/<name>/workspace/).

Added _resolve_container_path() that reads docker_volumes from the profile
config and reverses the host:container mapping. Applied in both send_message
tool and gateway response handler.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
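
Reversing the volume mapping might look like the sketch below. It assumes `docker_volumes` is a list of `host:container[:mode]` strings and that the most specific container mount should win; both are guesses about the profile config, and the function signature is hypothetical.

```python
from pathlib import PurePosixPath


def resolve_container_path(path, docker_volumes):
    """Map a container-side path back to its host-side location."""
    p = PurePosixPath(path)
    best = None  # (mount depth, host path) of the most specific match so far
    for spec in docker_volumes:
        parts = spec.split(":")
        if len(parts) < 2:
            continue  # not a host:container mapping
        host, container = PurePosixPath(parts[0]), PurePosixPath(parts[1])
        try:
            relative = p.relative_to(container)
        except ValueError:
            continue  # path is not under this mount
        depth = len(container.parts)
        if best is None or depth > best[0]:
            best = (depth, str(host / relative))
    # Paths outside every mount are returned unchanged.
    return best[1] if best else path
```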
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that accepts the standard
`openai` Python client. Two issues prevented gpt-5.x models from working:

1. `_max_tokens_param()` only sent `max_completion_tokens` for
   `api.openai.com` URLs. Azure also requires `max_completion_tokens`
   for gpt-5.x models.

2. `_model_requires_responses_api()` correctly detected gpt-5.x but the
   routing gate unconditionally upgraded to `codex_responses` mode.
   Azure does NOT support the Responses API; it serves gpt-5.x on the
   regular `/chat/completions` path, so the upgraded requests returned
   a 404.

Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.
- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on
  `chat_completions` mode where Azure actually serves it.
- Three new tests cover Azure max_tokens routing and api_mode behaviour.

gpt-4.x models on Azure are unaffected (already used chat_completions
+ max_tokens, which Azure accepts for those models).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
teknium1 pushed a commit that referenced this pull request Apr 26, 2026
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that accepts the standard
`openai` Python client. Two issues prevented gpt-5.x models from working:

1. `_max_tokens_param()` only sent `max_completion_tokens` for
   `api.openai.com` URLs. Azure also requires `max_completion_tokens`
   for gpt-5.x models.

2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x
   to Responses API. Azure does NOT support the Responses API; it serves
   gpt-5.x on the regular `/chat/completions` path, so the upgraded
   requests returned a 404.

Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.
- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on
  `chat_completions` where Azure actually serves it.
- The fallback-provider api_mode picker also recognises Azure and stays
  on chat_completions.
- Tests cover max_tokens routing, api_mode behaviour, and URL detection.

gpt-4.x models on Azure are unaffected (already used chat_completions +
max_tokens, which Azure accepts for those models).

Salvage of PR #10086 — rewritten against current main where the
codex_responses upgrade gate gained copilot-acp / explicit-api_mode
exclusions.
@teknium1

Contributor

Merged via #15845 along with the other open Azure PRs as one consolidated salvage + an auto-detection feature on top (URL sniff + /models probe + Anthropic Messages fallback + context-length resolution).

Your commits were cherry-picked with authorship preserved (ac5711428 on main). Thanks @akhater — this landed because of your work.

@teknium1 teknium1 closed this Apr 26, 2026
@alt-glitch alt-glitch added labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), comp/tools (Tool registry, model_tools, toolsets) Apr 26, 2026
@alt-glitch

Collaborator

Superseded by #15845 which merged — consolidates fixes from #9029, #4599, #10086, and #8766.

ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
dannyJ848 pushed a commit to dannyJ848/hermes-agent that referenced this pull request May 17, 2026
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026