fix(agent): support Azure OpenAI gpt-5.x on chat/completions endpoint #10086
Closed
akhater wants to merge 21 commits into
Add common data formats (JSON, YAML, CSV, XML, HTML) and source code files (Python, JavaScript, TypeScript, Shell, SQL) to the gateway document upload allowlist. Users working with agents via Telegram, Discord, and other messaging platforms frequently need to share configuration files, data exports, and code snippets as file uploads. The current allowlist is limited to office documents and plain text, forcing users to rename files or paste content inline as a workaround.
The Telegram /model slash command picker read provider info from the
`providers:` dict schema only, so user-defined endpoints configured via
the `custom_providers:` list (the format written by `hermes model`) were
invisible in the picker and clicking them failed with "Unknown provider
'custom'".
Changes:
- list_authenticated_providers() accepts a custom_providers list and
collapses all entries sharing the same base_url into a single "custom"
provider that exposes every configured model as a button.
- gateway/run.py reads cfg.get("custom_providers") and forwards it to
both list_authenticated_providers() call sites.
- switch_model() PATH A synthesizes a ProviderDef on the fly when
--provider custom is passed and the runtime is already on a custom
endpoint, so picker-triggered switches reuse the active base_url /
api_key instead of failing provider resolution.
Result: with multiple models configured under custom_providers:, the
/model picker shows one "Custom endpoint" provider with a button per
model, and clicking a button switches cleanly without re-authentication.
…ay file uploads
The /model picker (list_authenticated_providers) walks every provider whose env var is set and adds it to the picker regardless of whether there are any models to show. This breaks down when an API key is set for a non-LLM feature — e.g. setting GROQ_API_KEY to enable Groq Whisper STT makes Groq appear in the LLM picker as "0 models", which users can't click and which clutters the list. Skip any provider whose curated model list is empty. User-defined and custom_providers entries are unaffected since they already gate on having at least one model configured.
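The empty-list guard described above can be sketched as follows. This is a minimal illustration, not the project's code: `visible_providers` and its arguments are hypothetical names, and the real `list_authenticated_providers()` takes different inputs.

```python
from typing import Dict, List


def visible_providers(
    curated_models: Dict[str, List[str]],
    authenticated: List[str],
) -> List[dict]:
    """Build picker rows, hiding providers that have no models to show.

    `curated_models` and `authenticated` are hypothetical inputs standing in
    for the curated model catalogue and the providers whose env var is set.
    """
    rows = []
    for slug in authenticated:
        models = curated_models.get(slug, [])
        total = len(models)
        if total == 0:
            # e.g. GROQ_API_KEY set only for Whisper STT: nothing to click.
            continue
        rows.append({"slug": slug, "models": models, "total": total})
    return rows
```

With `GROQ_API_KEY` set only for speech-to-text, a provider with an empty curated list produces no row, so the picker never shows a "0 models" entry.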
Some Ollama-served models (MiniMax M2.7, Kimi K2.5) emit tool calls in
Anthropic's XML format (<invoke name="..."><parameter name="...">)
instead of OpenAI structured tool_calls. The main turn loop reads
response.choices[0].message directly and never reached the XML, so tools
like send_message silently failed: the model "thought" it called them.
Add a three-gated fallback parser between content normalization and the
plugin hook (run_agent.py ~8685). Gates:
1. Structural: skipped entirely for codex_responses and
anthropic_messages
2. Empty: only runs when tool_calls is None/empty
3. Substring: only when "<invoke" appears in the text content
Parses each
<invoke name="..."><parameter name="...">...</parameter></invoke>
block into a SimpleNamespace matching the shape the downstream
dispatcher (_execute_tool_calls) expects: .id, .function.name,
.function.arguments (JSON string). Strips the raw XML from visible
content afterward.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
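A rough sketch of the three gates and the XML-to-`SimpleNamespace` conversion might look like this. The function name `parse_xml_tool_calls`, the generated `call_...` ids, and the regex-based parsing are assumptions made for illustration; the actual parser in run_agent.py may differ.

```python
import json
import re
import uuid
from types import SimpleNamespace

_INVOKE_RE = re.compile(r'<invoke\s+name="([^"]+)"\s*>(.*?)</invoke>', re.DOTALL)
_PARAM_RE = re.compile(
    r'<parameter\s+name="([^"]+)"\s*>(.*?)</parameter>', re.DOTALL
)


def parse_xml_tool_calls(api_mode, tool_calls, content):
    """Return (tool_calls, content), parsing XML tool calls only as a fallback."""
    # Gate 1 (structural): these modes have their own tool-call formats.
    if api_mode in ("codex_responses", "anthropic_messages"):
        return tool_calls, content
    # Gate 2 (empty): never override structured tool calls.
    if tool_calls:
        return tool_calls, content
    # Gate 3 (substring): cheap check before running any regex.
    if not content or "<invoke" not in content:
        return tool_calls, content

    parsed = []
    for name, body in _INVOKE_RE.findall(content):
        args = {key: value.strip() for key, value in _PARAM_RE.findall(body)}
        parsed.append(
            SimpleNamespace(
                id=f"call_{uuid.uuid4().hex[:8]}",  # hypothetical id scheme
                function=SimpleNamespace(name=name, arguments=json.dumps(args)),
            )
        )
    if not parsed:
        return tool_calls, content
    # Strip the raw XML so it does not leak into the visible reply.
    return parsed, _INVOKE_RE.sub("", content).strip()
```

All parameter values stay strings in this sketch; a real implementation would probably need to coerce them against the tool schema.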
…-mapping
_get_platform_tools() uses a reverse-mapping loop to infer which
toolsets are enabled when no explicit platform_toolsets config exists.
The loop iterated only over CONFIGURABLE_TOOLSETS, silently dropping any
toolset not listed there, including "messaging", which is the toolset
containing send_message.
send_message is in _HERMES_CORE_TOOLS and is fully present in the
hermes-telegram composite toolset, but because "messaging" was not in
CONFIGURABLE_TOOLSETS, it was never added to the enabled set. The
check_fn was never even evaluated: the tool was excluded before runtime.
Fix: after the CONFIGURABLE_TOOLSETS loop, also iterate over all
toolsets defined in TOOLSETS that are neither configurable nor platform
defaults. Any whose tools are fully covered by the base composite
toolset are added to enabled_toolsets. Adds a logger.debug line for
visibility.
This only affects the else-branch (no explicit saved config), so
profiles that have run `hermes tools` and saved explicit toolset lists
are unaffected.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
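The two-pass reverse mapping can be illustrated with the sketch below. The function name and its arguments are hypothetical, and the subset test in the first loop is an assumption about how the existing loop decides coverage.

```python
def infer_enabled_toolsets(base_tools, toolsets, configurable, platform_defaults):
    """Infer enabled toolsets from the tools present in a composite toolset.

    All four arguments are hypothetical stand-ins: `toolsets` plays the role
    of TOOLSETS, `configurable` of CONFIGURABLE_TOOLSETS.
    """
    base = set(base_tools)
    enabled = set()

    # Pass 1: the pre-existing loop over configurable toolsets.
    for name in configurable:
        tools = toolsets.get(name, [])
        if tools and set(tools) <= base:
            enabled.add(name)

    # Pass 2 (the fix): toolsets that are neither configurable nor platform
    # defaults, e.g. "messaging", are enabled when fully covered by the base.
    for name, tools in toolsets.items():
        if name in configurable or name in platform_defaults:
            continue
        if tools and set(tools) <= base:
            enabled.add(name)
    return enabled
```

Without the second pass, a toolset such as "messaging" is dropped before any per-tool check runs, which matches the failure described above.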
Resolved conflict in hermes_cli/model_switch.py:
- PATH A: kept our "custom" provider special-case + wired upstream's new
custom_providers param into resolve_provider_full() else branch
- Section 4: kept our collapse-to-single-slug approach for
custom_providers list (all entries appear as one "custom" entry in the
picker)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop our collapsed-to-single-"custom" hack in favour of upstream's design: each custom_providers entry gets its own slug via custom_provider_slug(). resolve_provider_full() already handles these slugs natively so PATH A no longer needs a special case. Retain the if total == 0: continue guard (our NousResearch#7267 fix) so providers with no models (e.g. Groq keyed for STT only) stay hidden. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each custom_providers entry declares one model under a named provider. Entries sharing the same name collapse into a single provider row in the /model picker — e.g. four Ollama Cloud models appear as one row. Entries with distinct names produce separate rows (Ollama Cloud vs Moonshot). This aligns with upstream's custom_provider_slug() convention while fixing the UX regression where every entry became its own provider row. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
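The collapse-by-name behaviour could be sketched as below. The entry keys (`name`, `model`, `base_url`) and the function name are assumptions for illustration; the real config schema and slug handling via custom_provider_slug() are not shown here.

```python
def group_custom_providers(entries):
    """Collapse custom_providers entries that share a name into one picker row.

    Each entry is assumed to be a dict with hypothetical keys
    "name", "model", and "base_url".
    """
    rows = {}  # dicts preserve insertion order, so row order follows the config
    for entry in entries:
        name = entry.get("name")
        model = entry.get("model")
        if not name or not model:
            continue  # incomplete entries produce no row
        row = rows.setdefault(
            name, {"name": name, "base_url": entry.get("base_url"), "models": []}
        )
        if model not in row["models"]:
            row["models"].append(model)
    return list(rows.values())
```

Four entries named "Ollama Cloud" then yield one row with four model buttons, while an entry named "Moonshot" gets its own row.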
When HERMES_CRON_ALLOW_MESSAGING=1, the cron scheduler allows cron
agents to call send_message as a side-effect of the job (e.g. check-ins,
nudges, external notifications). Default behavior unchanged: upstream
users see no difference unless they opt in.
Two small changes in cron/scheduler.py:
1. disabled_toolsets no longer excludes "messaging" when the flag is
set, so send_message becomes available in the cron agent's tool list.
2. The injected cron_hint system prompt has a second variant: instead of
forbidding send_message outright, it explains that deliver handles the
final report while send_message is the right choice for side-effect
messages. Reserves deliver for reports; send_message for actions.
Tested end-to-end on Orion: the mirror_to_session path in
tools/send_message_tool.py writes the outbound message to the target's
session transcript so a reply from the recipient retains conversation
context on the next turn.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
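A minimal sketch of the opt-in gate, assuming a hypothetical baseline list of disabled toolsets (only "messaging" is taken from the description above; the other names are placeholders):

```python
import os


def cron_disabled_toolsets(env=None):
    """Return the toolsets hidden from cron agents.

    The baseline list is hypothetical; only the handling of "messaging"
    reflects the behaviour described in the commit message.
    """
    env = os.environ if env is None else env
    disabled = ["cronjob", "messaging", "clarify"]  # placeholder baseline
    if env.get("HERMES_CRON_ALLOW_MESSAGING") == "1":
        # Opt-in: let cron agents call send_message for side-effect messages.
        disabled = [name for name in disabled if name != "messaging"]
    return disabled
```

Passing the environment as an argument keeps the function easy to test; any value other than "1" leaves the default behaviour in place.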
_remap_path_for_user was calling .resolve() on the Python path, which
followed venv/bin/python into the base interpreter. On uv-managed venvs
this swaps the systemd ExecStart to a bare Python that has none of the
venv's site-packages, so the service crashes on first import.
Classical python -m venv installs were unaffected by accident: the
resolved target /usr/bin/python3.x lives outside $HOME so the path-remap
branch was skipped and the system Python's packages silently worked.
Remove .resolve() calls on both current_home and the path; use
.expanduser() for lexical tilde expansion only. The function does
lexical prefix substitution, which is all it needs to do for its actual
purpose (remapping /root/.hermes -> /home/<user>/.hermes when installing
system services as root for a different user).
Repro: on a uv-managed venv install,
`sudo hermes gateway install --system` writes
ExecStart=.../uv/python/cpython-3.11.15-.../bin/python3.11 instead of
.../hermes-agent/venv/bin/python, and the service crashes on
ModuleNotFoundError: yaml.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
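The lexical remap can be sketched as follows, assuming a POSIX filesystem. The signature is hypothetical (the real `_remap_path_for_user` may take different arguments); the point is that nothing here follows symlinks.

```python
from pathlib import Path


def remap_path_for_user(path: str, current_home: str, target_home: str) -> str:
    """Swap a home-directory prefix lexically, without touching the filesystem.

    Hypothetical signature. `.expanduser()` only expands a leading "~";
    unlike `.resolve()`, it never follows venv/bin/python into the base
    interpreter.
    """
    candidate = Path(path).expanduser()
    home = Path(current_home).expanduser()
    try:
        relative = candidate.relative_to(home)  # pure path arithmetic
    except ValueError:
        return str(candidate)  # outside current_home: leave unchanged
    return str(Path(target_home) / relative)
```

Because `relative_to` compares path components rather than raw strings, a sibling such as `/rootfs/...` is not mistaken for a path under `/root`.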
Adds a new `browser_screenshot` tool that captures a raw PNG of the
current page and ships it to the user as a native chat attachment with
zero token overhead, no LLM call, and no agent-side path handling.
Motivation
----------
There is currently no clean way for an agent to deliver a literal
screenshot of a web page back to the user:
* `browser_vision` always routes the screenshot through a vision LLM
call, which costs tokens, requires `auxiliary.vision.model` to be
configured, and returns a description rather than the raw image. It
*can* return a `screenshot_path` for `MEDIA:` delivery, but the agent
has to format the tag correctly, the file has to live somewhere both
the agent's sandbox and the gateway can resolve, and the path string
has to survive the agent's text response intact. Any one of those can
break.
* Hermes terminal sandboxes (Docker, Modal, Daytona, etc.) typically
expose `~/.hermes/profiles/<name>/...` at a different absolute path
inside the container than on the host. The agent therefore receives a
path it cannot verify with its own `read_file` / `terminal` tools and
often "corrects" the path to one the gateway cannot open.
* Returning the bytes inline as base64 is a non-starter at scale: a
~400 KB screenshot becomes ~140K tokens of input plus another ~140K
echoed in the response, which thrashes context windows and tanks
latency on cheaper models.
Solution
--------
Move media delivery off the agent's critical path entirely:
1. **`gateway/media_queue.py`** (new): a small thread-safe per-session
queue with `enqueue_media(path)` and `drain_media(session_key)`. The
queue auto-resolves the active session via the existing
`tools.approval` ContextVar so tools running inside the agent loop
can call `enqueue_media(path)` with no arguments.
2. **`browser_screenshot` tool** (new, in `tools/browser_tool.py` with a
Camofox backend in `tools/browser_camofox.py`): captures the page,
saves the PNG to a host path under the gateway's filesystem,
enqueues the path, and returns a tiny success result with no path
for the agent to mishandle. Falls back to the universal
`_run_browser_command(..., "screenshot", ...)` primitive for
non-Camofox backends so it works on every existing browser provider.
3. **Gateway drain hook** in `gateway/platforms/base.py`: right after
the existing `local_files` send loop, the gateway calls
`drain_media(session_key)` and ships every queued path via the
appropriate `send_*` method based on file extension (image / voice /
video / document). The drain runs exactly once per
`_process_message_background` invocation, after the agent's text
response has been sent.
Properties
----------
* **Zero token cost.** The tool result is `{"success": true, "delivered":
true, "size_bytes": N}` — no path, no base64, no echo.
* **Agent-proof.** The agent never sees the file path, so it cannot
hallucinate, "verify", or rewrite it. Just call the tool and write a
normal text reply; the image attaches automatically.
* **Idempotent per turn.** `drain_media` is an atomic pop, so each
queued item is sent exactly once per response cycle.
* **Race-free.** The queue is guarded by a `threading.Lock`.
* **Generalizes to all binary outputs.** Any future tool that wants to
deliver an image, voice note, video, or document can call
`enqueue_media(path)` and the same drain handles routing by
extension. This is the same side-effect-driven delivery pattern the
TTS tool already relies on, generalized into a transport concern
rather than a tool concern.
* **Multi-backend.** The Camofox path uses the existing
`/tabs/<id>/screenshot` endpoint; the fallback uses the same
`_run_browser_command(..., "screenshot", ...)` primitive that
`browser_vision` uses today, so any backend where `browser_vision`
works for screenshots also works here.
Tool registration is added to `toolsets.py` (default + browser
toolset + cli + telegram lists) and `model_tools.py` (`browser_tools`
group) so the new tool is exposed wherever `browser_vision` is.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds focused tests for the new ``gateway.media_queue`` module that backs
``browser_screenshot``'s direct-enqueue delivery. The queue lives in the
gateway process and is drained by ``gateway.platforms.base`` after each
agent response, so its invariants matter more than its surface area.
Tests cover the failure classes a reviewer would actually care about:
* ``test_enqueue_and_drain_single_item``: happy path, queue is empty
after drain (no double-send risk on the next response cycle).
* ``test_multiple_enqueue_preserves_fifo_order_and_atomic_drain``:
multiple enqueues in one session come back in insertion order, in a
single atomic drain (no partial flushes, no duplicate sends).
* ``test_session_isolation_across_drains``: draining one session must
not touch another session's queue (no cross-chat leakage).
* ``test_contextvar_resolution_does_not_leak_across_sessions``: the
zero-arg ``enqueue_media(path)`` call path used in production by the
browser screenshot tool resolves the active session via the
``tools.approval`` ContextVar, and state from one session does not
bleed into another.
* ``test_drain_unknown_session_returns_empty_list``: draining a session
that was never touched is a no-op, not an error.
No mocks beyond the queue itself. Uses an autouse fixture to clear the
module-level ``_pending`` dict between tests so they cannot interfere.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ens, types params
The built-in hindsight_recall tool previously accepted only a query
string with all other parameters hardcoded from config. This made it
impossible for the agent to:
- Switch between semantic recall, keyword search, and entity graph
lookup without shelling out to curl
- Override budget/max_tokens per query for deeper dives
- Filter by memory type (e.g. types: ["world"]) dynamically
This commit adds optional parameters to the existing tool:
method: "recall" (default) | "list" | "entity"
budget: "low" | "mid" | "high"
max_tokens: integer
types: string[]
All parameters respect user overrides consistently across all three
methods, falling back to sensible defaults:
recall: budget from config, max_tokens from config, types from config
list: keyword search via client.list_memories(), no budget/types
entity: budget defaults to "high", types defaults to ["world"],
include_entities=True for entity graph data
No direct HTTP calls. All methods use the official hindsight-client
Python library, compatible with both local and cloud deployments.
The tool description encodes a minimal decision heuristic so the agent
learns when to escalate:
"Use default recall first. If results are missing or too vague:
use method list for exact keyword/name matches, method entity for
relationship queries, or increase budget / set types [world] for
deeper retrieval."
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The extract_media regex only matched image/video/audio extensions (png, jpg, mp4, ogg, etc). Document types like .docx, .pdf, .xlsx were silently ignored, making it impossible for agents to send document attachments via Telegram's send_document API. Added: pdf, doc/docx, xls/xlsx, ppt/pptx, csv, txt, zip, tar.gz, json Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
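The widened extension list can be illustrated with a small sketch. The pattern below is not the project's regex: the media extensions are an illustrative subset, and the `MEDIA:` tag handling is simplified to whitespace-delimited paths.

```python
import re

# Illustrative subset of the pre-existing image/video/audio extensions.
_MEDIA_EXTS = ["png", "jpe?g", "gif", "mp4", "ogg", "mp3"]
# Document types named in the commit message.
_DOC_EXTS = ["pdf", "docx?", "xlsx?", "pptx?", "csv", "txt", "zip", r"tar\.gz", "json"]

MEDIA_RE = re.compile(
    r"MEDIA:(\S+?\.(?:" + "|".join(_MEDIA_EXTS + _DOC_EXTS) + r"))(?=\s|$)",
    re.IGNORECASE,
)


def extract_media(text):
    """Return (paths, text_without_tags) for every recognised MEDIA: tag."""
    paths = MEDIA_RE.findall(text)
    cleaned = MEDIA_RE.sub("", text).strip()
    return paths, cleaned
```

A tag with an extension outside the list is left in the text untouched, which mirrors the silent-ignore behaviour that previously affected documents.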
When agents run inside Docker containers, they emit MEDIA:/workspace/file.docx but send_message runs on the host where /workspace doesn't exist. The file is actually at the host-side volume mount (e.g. ~/.hermes/profiles/<name>/workspace/). Added _resolve_container_path() that reads docker_volumes from the profile config and reverses the host:container mapping. Applied in both send_message tool and gateway response handler. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
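Reversing the volume mapping could be sketched like this. The `host:container[:mode]` string format for `docker_volumes` entries is an assumption based on Docker's usual `-v` syntax, and the function below is a simplified stand-in for `_resolve_container_path()`.

```python
import os.path


def resolve_container_path(path, docker_volumes):
    """Map a container path back to its host path via volume mounts.

    `docker_volumes` is assumed to be a list of "host:container[:mode]"
    strings. The longest matching container prefix wins; paths outside
    every mount are returned unchanged.
    """
    best = None
    for spec in docker_volumes:
        parts = spec.split(":")
        if len(parts) < 2:
            continue
        host = os.path.expanduser(parts[0]).rstrip("/")
        container = parts[1].rstrip("/")
        # Match whole path components only, so /workspace2 != /workspace.
        if path == container or path.startswith(container + "/"):
            if best is None or len(container) > len(best[1]):
                best = (host, container)
    if best is None:
        return path
    host, container = best
    return host + path[len(container):]
```

For example, with a mount of `/home/alice/.hermes/profiles/work/workspace:/workspace`, the agent's `/workspace/file.docx` resolves to the file under the host-side profile directory.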
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that accepts the standard
`openai` Python client. Two issues prevented gpt-5.x models from working:
1. `_max_tokens_param()` only sent `max_completion_tokens` for
`api.openai.com` URLs. Azure also requires `max_completion_tokens`
for gpt-5.x models.
2. `_model_requires_responses_api()` correctly detected gpt-5.x, but the
routing gate then unconditionally upgraded to `codex_responses` mode.
Azure does NOT support the Responses API; it serves gpt-5.x on the
regular `/chat/completions` path, so the upgraded request returned a 404.
Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.
- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on
`chat_completions` mode where Azure actually serves it.
- Three new tests cover Azure max_tokens routing and api_mode behaviour.
gpt-4.x models on Azure are unaffected (already used chat_completions
+ max_tokens, which Azure accepts for those models).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
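The Azure detection and the two places it is used can be sketched as below. This is an illustration under assumptions, not the code in the PR: the helper names drop the leading underscore, the hostname-suffix check is one possible way to "match `openai.azure.com`", and the `startswith("gpt-5")` test is a placeholder for whatever `_model_requires_responses_api()` really does.

```python
from urllib.parse import urlparse


def _hostname(base_url):
    url = base_url or ""
    if "//" not in url:
        url = "https://" + url  # urlparse needs a scheme to find the host
    return (urlparse(url).hostname or "").lower()


def is_azure_openai_url(base_url):
    host = _hostname(base_url)
    return host == "openai.azure.com" or host.endswith(".openai.azure.com")


def max_tokens_param(base_url, value):
    """Pick the token-limit parameter name for a given endpoint."""
    if _hostname(base_url) == "api.openai.com" or is_azure_openai_url(base_url):
        return {"max_completion_tokens": value}
    return {"max_tokens": value}


def model_requires_responses_api(model):
    return model.lower().startswith("gpt-5")  # placeholder detection rule


def pick_api_mode(model, base_url, default="chat_completions"):
    """Upgrade gpt-5.x to codex_responses everywhere except Azure."""
    if model_requires_responses_api(model) and not is_azure_openai_url(base_url):
        return "codex_responses"
    return default
```

Matching on the parsed hostname rather than a substring avoids treating a URL such as `https://openai.azure.com.evil.example/v1` as Azure; whether the real helper makes the same choice is not stated in the PR.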
teknium1 pushed a commit that referenced this pull request on Apr 26, 2026
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that accepts the standard
`openai` Python client. Two issues prevented gpt-5.x models from working:
1. `_max_tokens_param()` only sent `max_completion_tokens` for
`api.openai.com` URLs. Azure also requires `max_completion_tokens`
for gpt-5.x models.
2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x
to the Responses API. Azure does NOT support the Responses API; it serves
gpt-5.x on the regular `/chat/completions` path, so the upgraded request
returned a 404.
Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.
- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on
`chat_completions` where Azure actually serves it.
- The fallback-provider api_mode picker also recognises Azure and stays
on chat_completions.
- Tests cover max_tokens routing, api_mode behaviour, and URL detection.
gpt-4.x models on Azure are unaffected (already used chat_completions +
max_tokens, which Azure accepts for those models).
Salvage of PR #10086 — rewritten against current main where the
codex_responses upgrade gate gained copilot-acp / explicit-api_mode
exclusions.
Contributor
|
Merged via #15845 along with the other open Azure PRs as one consolidated salvage + an auto-detection feature on top (URL sniff + Your commits were cherry-picked with authorship preserved ( |
This was referenced Apr 26, 2026
ulasbilgen
pushed a commit
to ulasbilgen/hermes-adhd-agent
that referenced
this pull request
May 1, 2026
donald131
pushed a commit
to donald131/hermes-agent
that referenced
this pull request
May 2, 2026
02356abc
pushed a commit
to 02356abc/hermes-agent
that referenced
this pull request
May 14, 2026
dannyJ848
pushed a commit
to dannyJ848/hermes-agent
that referenced
this pull request
May 17, 2026
gweeteve
pushed a commit
to gweeteve/hermes-agent
that referenced
this pull request
Jun 2, 2026
Egavasyug
pushed a commit
to Egavasyug/hermes-agent
that referenced
this pull request
Jun 10, 2026
Summary
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that works with the standard
`openai` Python client. Two issues prevented gpt-5.x models from working
on this endpoint:
1. `max_tokens` rejected: `_max_tokens_param()` only sent
`max_completion_tokens` for `api.openai.com` URLs. Azure gpt-5.x also
requires `max_completion_tokens`.
2. `_model_requires_responses_api()` correctly detected gpt-5.x and
upgraded to `codex_responses` mode. But Azure does NOT support the
Responses API; it serves gpt-5.x on `/chat/completions`, so the upgraded
request returned a 404.
Changes
Adds `_is_azure_openai_url()` (matches `openai.azure.com`) and uses it in
two places:
- `_max_tokens_param()`: returns `max_completion_tokens` for Azure (same
as direct OpenAI)
- `codex_responses` upgrade gate: skips Azure so gpt-5.x stays on
`chat_completions` where Azure actually serves it
gpt-4.x models on Azure are unaffected (already worked via
`chat_completions` + `max_tokens`).
Test plan
- `TestMaxTokensParam` tests pass
- `test_returns_max_completion_tokens_for_azure`: Azure URL returns
correct param
- `test_azure_gpt5_stays_on_chat_completions`: Azure + gpt-5.x does not
upgrade to `codex_responses`
- `test_non_azure_gpt5_upgrades_to_codex_responses`: non-Azure gpt-5.x
still upgrades (no regression)
- `{resource}.openai.azure.com/openai/v1` with gpt-5.4-mini and
gpt-4.1-mini
🤖 Generated with Claude Code