feat(azure-foundry): add Azure AI Foundry provider with auto-detection #15845
Merged
…tion
Add support for Azure Foundry as a new inference provider. Azure Foundry endpoints can use either OpenAI-style (/v1/chat/completions) or Anthropic-style (/v1/messages) API formats.
Changes:
- Add azure-foundry to PROVIDER_REGISTRY (auth.py)
- Add azure-foundry overlay in HERMES_OVERLAYS (providers.py)
- Add empty model list for azure-foundry (models.py)
- Add _model_flow_azure_foundry() interactive setup (main.py)
- Add azure-foundry runtime resolution with api_mode support (runtime_provider.py)
- Add AZURE_FOUNDRY_API_KEY and AZURE_FOUNDRY_BASE_URL env vars (config.py)
Usage: hermes model -> More providers -> Azure Foundry
The setup wizard prompts for:
- Endpoint URL
- API format (OpenAI or Anthropic-style)
- API key
- Model name
Configuration is saved to config.yaml (model.provider, model.base_url, model.api_mode, model.default) and ~/.hermes/.env (AZURE_FOUNDRY_API_KEY).
…en priority chain
…ss custom runtime when provider=anthropic + azure.com URL
…s ~/.claude/.credentials.json from overwriting Azure key mid-session
…as producing malformed URLs like /anthropic?api-version=.../v1/messages
Azure OpenAI requires an `api-version` query parameter on every request. When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`), the OpenAI SDK silently drops it during URL construction, causing 404 errors. Extract query params from base_url and pass them via `default_query` so the SDK appends them to every request. This is a generic solution that works for any custom endpoint requiring query parameters, not just Azure. No-op for URLs without query params — fully backward compatible.
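
The query-forwarding idea above can be sketched as follows. This is a minimal illustration using only the standard library, not the code from this PR: the helper name `split_base_url_query` and the example resource hostname are hypothetical, and the only SDK detail assumed is that the `openai` client constructor accepts a `default_query` mapping.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def split_base_url_query(base_url: str) -> tuple[str, dict[str, str]]:
    """Split a base URL into (URL without query string, query params).

    Hypothetical helper for illustration; the PR's own helper may differ.
    """
    parts = urlsplit(base_url)
    # keep_blank_values so a bare "?flag=" is not silently discarded
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    # Rebuild the URL with the query and fragment removed
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return clean, params


# Example with a made-up resource name
clean_url, query = split_base_url_query(
    "https://example-resource.openai.azure.com/openai/v1"
    "?api-version=2025-04-01-preview"
)

# The pieces would then be handed to the client separately, e.g.
#   OpenAI(base_url=clean_url, default_query=query or None, api_key=...)
# so the SDK appends the params to every request it builds.
```

For a URL without a query string the helper returns the URL unchanged and an empty dict, which matches the no-op behaviour described above.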
Azure OpenAI exposes an OpenAI-compatible endpoint at
`{resource}.openai.azure.com/openai/v1` that accepts the standard
`openai` Python client. Two issues prevented gpt-5.x models from working:
1. `_max_tokens_param()` only sent `max_completion_tokens` for
`api.openai.com` URLs. Azure also requires `max_completion_tokens`
for gpt-5.x models.
2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x
to Responses API. Azure does NOT support the Responses API — it serves
gpt-5.x on the regular `/chat/completions` path, causing a 404.
Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs.
- `_max_tokens_param()` now returns `max_completion_tokens` for Azure.
- The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on
`chat_completions` where Azure actually serves it.
- The fallback-provider api_mode picker also recognises Azure and stays
on chat_completions.
- Tests cover max_tokens routing, api_mode behaviour, and URL detection.
gpt-4.x models on Azure are unaffected (already used chat_completions +
max_tokens, which Azure accepts for those models).
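
A rough sketch of the routing described in this commit message, under stated assumptions: the function names below are hypothetical stand-ins for `_is_azure_openai_url()` and `_max_tokens_param()`, the gpt-5.x check is reduced to a name prefix, and the real upgrade gate has further exclusions (copilot-acp, explicit api_mode) that are left out here.

```python
from urllib.parse import urlsplit


def _host(base_url: str) -> str:
    # Tolerate URLs written without a scheme so urlsplit can find the host
    url = base_url if "://" in base_url else "https://" + base_url
    return (urlsplit(url).hostname or "").lower()


def is_azure_openai_url(base_url: str) -> bool:
    """True for openai.azure.com and any subdomain of it (illustrative)."""
    host = _host(base_url)
    return host == "openai.azure.com" or host.endswith(".openai.azure.com")


def max_tokens_param(base_url: str, limit: int) -> dict[str, int]:
    """Pick the token-limit parameter name from the endpoint host."""
    if _host(base_url) == "api.openai.com" or is_azure_openai_url(base_url):
        return {"max_completion_tokens": limit}
    return {"max_tokens": limit}


def pick_api_mode(base_url: str, model: str) -> str:
    """Simplified gate: gpt-5.x upgrades to Responses except on Azure."""
    wants_responses = model.lower().startswith("gpt-5")
    if wants_responses and not is_azure_openai_url(base_url):
        return "codex_responses"
    return "chat_completions"
```

Matching on the parsed hostname rather than a substring avoids treating a lookalike host such as `openai.azure.com.evil.example` as Azure.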
Salvage of PR #10086 — rewritten against current main where the
codex_responses upgrade gate gained copilot-acp / explicit-api_mode
exclusions.
The azure-foundry wizard now probes the endpoint before asking the user
to pick anything by hand:
1. URL path sniff — endpoints ending in /anthropic are Azure Foundry
Claude routes and skip to anthropic_messages.
2. GET <base>/models probe — if the endpoint returns an OpenAI-shaped
model list, we switch to chat_completions and prefill the picker
with the returned deployment/model IDs.
3. Anthropic Messages probe — fallback for endpoints that don't expose
/models but do speak the Anthropic Messages shape.
4. Manual fallback — private endpoints / custom routes still work;
the user picks API mode + types a deployment name.
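
The four-step probe order above might look roughly like this. It is a sketch only: `detect_api_mode` and its two callable parameters are hypothetical, the network calls are injected so the ordering can be exercised offline, and the "OpenAI-shaped" check assumes the usual `{"data": [{"id": ...}]}` list response.

```python
from typing import Callable, Optional


def detect_api_mode(
    base_url: str,
    fetch_models: Callable[[str], Optional[dict]],
    probe_anthropic: Callable[[str], bool],
) -> tuple[Optional[str], list[str]]:
    """Return (api_mode, model_ids); api_mode is None for manual fallback."""
    base = base_url.rstrip("/")

    # 1. URL path sniff: no HTTP call needed
    if base.endswith("/anthropic"):
        return "anthropic_messages", []

    # 2. GET <base>/models: accept only an OpenAI-shaped model list
    payload = fetch_models(base + "/models")
    if isinstance(payload, dict) and isinstance(payload.get("data"), list):
        ids = [
            item["id"]
            for item in payload["data"]
            if isinstance(item, dict) and isinstance(item.get("id"), str)
        ]
        return "chat_completions", ids

    # 3. Anthropic Messages probe for endpoints without /models
    if probe_anthropic(base):
        return "anthropic_messages", []

    # 4. Nothing matched: the caller asks the user to choose by hand
    return None, []
```

Injecting `fetch_models` and `probe_anthropic` keeps the decision logic separate from HTTP error handling, so a failed request can simply return `None` or `False` and fall through to the next step.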
Context length for the selected model is resolved through the existing
agent.model_metadata.get_model_context_length chain (models.dev,
provider metadata, hardcoded family fallbacks) and stored in
model.context_length when a non-default value is found.
Also refactors runtime_provider so Azure Foundry resolution is reused
between the explicit-credentials path and the default top-level path —
previously the /v1 strip for Anthropic-style Azure only ran when the
caller passed explicit_* args, which meant config-driven sessions
hit a double-/v1 URL.
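
To illustrate the double-/v1 problem: the Anthropic client appends `/v1/messages` to its base URL, so a configured base URL that already ends in `/v1` needs that suffix removed first. A minimal sketch, with a hypothetical helper name and a placeholder host:

```python
def strip_trailing_v1(base_url: str) -> str:
    """Drop one trailing '/v1' path segment (and trailing slashes), if present."""
    trimmed = base_url.rstrip("/")
    if trimmed.endswith("/v1"):
        return trimmed[: -len("/v1")]
    return trimmed


# Placeholder host, for illustration only
configured = "https://example.invalid/anthropic/v1"
base_for_sdk = strip_trailing_v1(configured)

# What the final request path would look like with and without the strip
without_strip = configured + "/v1/messages"
with_strip = base_for_sdk + "/v1/messages"
```

Running the same helper on both the explicit-credentials path and the config-driven path is what keeps the two from diverging.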
New module hermes_cli/azure_detect.py with 19 unit tests covering:
- path sniff, model ID extraction, probe fallbacks
- HTTP error handling (URLError, HTTPError)
- context-length lookup passthrough
- DEFAULT_FALLBACK_CONTEXT rejection
New runtime tests cover:
- OpenAI-style Azure Foundry
- Anthropic-style Azure Foundry with /v1 stripping
- Missing base_url / API key raising AuthError
Rationale: Microsoft confirms there's no pure-API-key endpoint to list
Azure deployments (that requires ARM management auth). The v1 Azure
OpenAI endpoint does expose /models with the resource's available
model catalog, which is good enough for picker prefill in the common
case. Users on private/gated endpoints fall through to manual entry.
- New website/docs/guides/azure-foundry.md covering both OpenAI-style and Anthropic-style endpoints, auto-detection behaviour, gpt-5.x routing, /v1 stripping, api-version query forwarding, and the provider: anthropic + Azure URL alternative setup.
- environment-variables.md picks up AZURE_FOUNDRY_API_KEY, AZURE_FOUNDRY_BASE_URL, AZURE_ANTHROPIC_KEY.
- cli-commands.md includes azure-foundry in the provider choices list.
- configuration.md lists azure-foundry among auxiliary-task providers.
- sidebars.ts wires the new guide into the Guides section.
- scripts/release.py AUTHOR_MAP entries for TechPrototyper, HangGlidersRule (noreply), and pein892 so the contributor-attribution CI check does not reject the salvage.
This was referenced Apr 26, 2026
turbo998 added a commit to turbo998/hermes-agent that referenced this pull request May 9, 2026
Closes parity gap between Azure Foundry (NousResearch#15845) and AWS Bedrock (NousResearch#10549). Adds the operational polish layer Bedrock established as the bar:
- Azure AI Content Safety client + guardrails block (mirrors Bedrock shape)
- hermes doctor Azure Foundry section
- hermes auth Azure Foundry status row
- Error-classifier patterns: ResponsibleAIPolicyViolation, content_filter, Azure 429 retry-after, DeploymentNotFound
- /usage pricing for 7 Azure Foundry models the wizard prefills
- pyproject [azure] optional extra (provider works without it)
- Docs: Content Safety guardrails section in azure-foundry.md
- ~70 new tests across 6 files
Signed-off-by: Chen Qi <turbo998@users.noreply.github.com>
Summary
Azure AI Foundry (OpenAI-style and Anthropic-style endpoints) now works end-to-end. The setup wizard probes your endpoint, auto-detects the transport, prefills a picker with deployed model IDs, and resolves the context length automatically.
Salvages 4 stale Azure PRs onto current `main` with contributor authorship preserved. Closes: #9029 (TechPrototyper), #4599 (HangGlidersRule), #10086 (akhater), #8766 (pein892).
Changes
- `hermes_cli/azure_detect.py` (new): path sniff + `/models` probe + Anthropic Messages fallback, wraps existing `get_model_context_length` for context resolution.
- `hermes_cli/main.py::_model_flow_azure_foundry`: rewritten wizard: URL → key → auto-probe → model picker.
- `hermes_cli/runtime_provider.py`: new `_resolve_azure_foundry_runtime` helper reused from both explicit-args and top-level paths, fixes `/v1` strip regression when only config.yaml is set.
- `run_agent.py`: `_is_azure_openai_url()` helper; gpt-5.x on `openai.azure.com` stays on `chat_completions` (no Responses API), uses `max_completion_tokens`, also applied to fallback-provider api_mode picker.
- `agent/anthropic_adapter.py`: Azure endpoints: `api-version=2025-04-15` passed via `default_query` (prevents malformed `/anthropic?api-version=.../v1/messages`). Also fixed a NameError from #4599 (fix: Azure AI Foundry / Azure Anthropic endpoint compatibility) where `_is_azure_endpoint` was referenced but never defined.
- `agent/auxiliary_client.py`, `run_agent.py`: `_extract_url_query_params()` so SDK URL-joining doesn't drop `?api-version=...` from custom base URLs (benefits all custom endpoints, not just Azure).
- `azure-foundry` added to `PROVIDER_REGISTRY`, `CANONICAL_PROVIDERS`, `HERMES_OVERLAYS`, `_PROVIDER_MODELS`, `OPTIONAL_ENV_VARS` (`AZURE_FOUNDRY_API_KEY`, `AZURE_FOUNDRY_BASE_URL`).
- `website/docs/guides/azure-foundry.md`, env var reference, CLI `--provider` choices, auxiliary provider list, sidebar entry.
- `scripts/release.py` AUTHOR_MAP entries for TechPrototyper, HangGlidersRule (noreply form), pein892.
Auto-detection
- URL path ends in `/anthropic` → `anthropic_messages`, no HTTP call
- `GET <base>/models` → 200 + OpenAI shape → `chat_completions`, picker prefilled with deployment IDs
- Anthropic Messages probe succeeds → `anthropic_messages`
Azure has no pure-API-key deployment-listing endpoint (Microsoft confirms: enumeration requires ARM management auth). Azure OpenAI v1 endpoints do return the resource's available-model catalog via `GET /models`, which is good enough for picker prefill in the common case.
Validation
- `tests/hermes_cli/test_azure_detect.py` (new)
- `tests/hermes_cli/test_runtime_provider_resolution.py::TestAzureFoundryResolution` (new)
- `tests/run_agent/test_run_agent.py::TestAzureOpenAIRouting` (new)
- `tests/run_agent/test_run_agent.py::TestMaxTokensParam`
- `tests/hermes_cli/test_runtime_provider_resolution.py` (full)
- … `-k anthropic`)
- … `-k auxiliary`)
Live E2E against a local HTTP server returning an OpenAI-shaped `/models` response: `azure_detect.detect()` correctly identified `chat_completions`, extracted 3 deployment IDs, and returned `None` on bad-key (wizard falls back to manual). Runtime resolver stripped `/v1` from Anthropic-style URLs end-to-end with a real `config.yaml` + env var.
Contributors salvaged
Authorship preserved per-commit. Will merge with `--rebase` so each contributor's commit shows their name in git history.
What this does NOT include
#6616 (`[SYSTEM:` → `[IMPORTANT:` to dodge Azure content filter) is intentionally not salvaged. That's an Azure content-filter config issue for the user to resolve, not something to work around globally by neutering a prompt tag every other user relies on.