Summary
The API server platform (gateway/platforms/api_server.py) always uses the global model.default — there is no way to run the API server on a different (e.g. cheaper/faster) model than the rest of the gateway. _resolve_gateway_model() ignores any per-platform configuration.
This is a feature gap rather than a crash: operators who want, say, the HTTP API server on Sonnet while CLI/Discord stay on Opus have no supported knob.
Current behaviour
gateway/platforms/api_server.py::APIServerAdapter._create_agent resolves the model with:
model = _resolve_gateway_model()
_resolve_gateway_model(config=None) (in gateway/run.py) only ever reads model.default / model.model. There is no platform dimension, so every gateway platform that constructs a temporary agent shares one model.
Proposed fix
Add an opt-in platform parameter to _resolve_gateway_model(). When supplied and platform_models.<platform> exists in config.yaml, that model wins over model.default. Every existing call site omits the argument and is byte-for-byte unchanged — only api_server._create_agent opts in.
Config shape (additive, optional):
model:
  default: claude-opus-4-8
platform_models:
  api_server:
    default: claude-sonnet-4-6  # or a bare string: api_server: claude-sonnet-4-6
Note: provider/credentials still come from the global runtime config, so the override must name a model compatible with the active provider. (A future enhancement could thread a per-platform provider too.)
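The resolution order described above can be sketched as a standalone function. This is a minimal illustration, not the gateway code: resolve_model is a hypothetical name, it takes the parsed config directly instead of calling _load_gateway_config(), and the global fallback branch (model.default, then model.model, then an empty string) is an assumption based on the "Current behaviour" description.

```python
from __future__ import annotations


def resolve_model(cfg: dict, platform: str | None = None) -> str:
    """Hypothetical stand-in for _resolve_gateway_model(), for illustration only."""
    if platform:
        platform_models = cfg.get("platform_models")
        if isinstance(platform_models, dict):
            override = platform_models.get(platform)
            # Bare-string form: platform_models.api_server: some-model
            if isinstance(override, str) and override:
                return override
            # Mapping form: platform_models.api_server.default (or .model)
            if isinstance(override, dict):
                model = override.get("default") or override.get("model")
                if model:
                    return model

    # Assumed global fallback: model may be a bare string or a mapping.
    model_cfg = cfg.get("model", {})
    if isinstance(model_cfg, str):
        return model_cfg
    if isinstance(model_cfg, dict):
        return model_cfg.get("default") or model_cfg.get("model") or ""
    return ""


config = {
    "model": {"default": "claude-opus-4-8"},
    "platform_models": {"api_server": {"default": "claude-sonnet-4-6"}},
}

print(resolve_model(config, platform="api_server"))  # override wins
print(resolve_model(config))                         # no platform: global default
print(resolve_model(config, platform="discord"))     # no entry: global default
```

A missing, empty, or malformed platform_models entry falls through to the global branch, which is what keeps the parameter opt-in.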
Diff
diff --git a/gateway/platforms/api_server.py b/gateway/platforms/api_server.py
index a18630f85..92585e6bd 100644
--- a/gateway/platforms/api_server.py
+++ b/gateway/platforms/api_server.py
@@ -963,9 +963,9 @@ class APIServerAdapter(BasePlatformAdapter):
         runtime_kwargs = _resolve_runtime_agent_kwargs()
         reasoning_config = GatewayRunner._load_reasoning_config()
-        model = _resolve_gateway_model()
         user_config = _load_gateway_config()
+        model = _resolve_gateway_model(user_config, platform="api_server")
         enabled_toolsets = sorted(_get_platform_tools(user_config, "api_server"))
         max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
diff --git a/gateway/run.py b/gateway/run.py
index a2e41c609..f82a276a1 100644
--- a/gateway/run.py
+++ b/gateway/run.py
@@ -1443,14 +1443,39 @@ def _load_gateway_runtime_config() -> dict:
     return expanded if isinstance(expanded, dict) else {}


-def _resolve_gateway_model(config: dict | None = None) -> str:
+def _resolve_gateway_model(config: dict | None = None, platform: str | None = None) -> str:
     """Read model from config.yaml — single source of truth.

     Without this, temporary AIAgent instances (e.g. /compress) fall
     back to the hardcoded default which fails when the active provider is
     openai-codex.
+
+    Per-platform override (opt-in): when ``platform`` is supplied AND
+    ``platform_models.<platform>`` is set in config.yaml, that model wins
+    over the global ``model.default``. This lets a single platform (e.g.
+    the API server) run a cheaper/faster model without affecting any other
+    platform. Callers that omit ``platform`` — every existing call site —
+    are completely unaffected and resolve the global default as before.
+
+    The override value may be a bare model string, or a mapping with a
+    ``default`` (or ``model``) key. Any ``provider`` key in the mapping is
+    NOT consumed here — provider/credentials still come from the global
+    runtime config, so a platform override must name a model that works
+    with the active provider.
     """
     cfg = config if config is not None else _load_gateway_config()
+
+    if platform:
+        platform_models = cfg.get("platform_models")
+        if isinstance(platform_models, dict):
+            override = platform_models.get(platform)
+            if isinstance(override, str) and override:
+                return override
+            if isinstance(override, dict):
+                model = override.get("default") or override.get("model")
+                if model:
+                    return model
+
     model_cfg = cfg.get("model", {})
     if isinstance(model_cfg, str):
         return model_cfg
Tests
8 new regression tests in tests/test_empty_model_fallback.py::TestResolveGatewayModelPlatformOverride cover opt-in isolation (no platform= → global default), matching/non-matching platforms, bare-string and dict override shapes, and empty/missing/malformed platform_models. One existing monkeypatch in tests/gateway/test_api_server.py was widened from lambda: to lambda *a, **k: so that it accepts the new arguments.
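To illustrate why the monkeypatch had to be widened, here is a small self-contained sketch. The names old_stub, new_stub, and call_like_create_agent are hypothetical and only mimic the new call shape from the diff; the real test patches _resolve_gateway_model itself.

```python
# A zero-argument stub, like the original `lambda:` monkeypatch.
old_stub = lambda: "test-model"
# The widened stub accepts (and ignores) any positional/keyword arguments.
new_stub = lambda *a, **k: "test-model"


def call_like_create_agent(resolver):
    """Call the resolver the way _create_agent does after the change."""
    user_config = {}
    return resolver(user_config, platform="api_server")


try:
    call_like_create_agent(old_stub)
    old_stub_ok = True
except TypeError:
    # The zero-argument lambda rejects the config and platform arguments.
    old_stub_ok = False

print("old stub accepts new call:", old_stub_ok)
print("new stub returns:", call_like_create_agent(new_stub))
```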
Result with the fix: full tests/gateway/test_api_server.py, tests/gateway/test_api_server_toolset.py, and tests/test_empty_model_fallback.py pass (182 passed), and the other _resolve_gateway_model consumers (compress/fast/discord/session_info — 31 tests) are unaffected.
(Heads up: that test suite leaks file descriptors via aiohttp test apps and hits OSError: [Errno 24] Too many open files under a low ulimit -n. Running ulimit -n 4096 first makes it green; unrelated to this change but worth a separate look.)
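As a stopgap for the descriptor limit, the soft limit can also be raised from inside the test process on Unix-like systems. The sketch below uses the standard-library resource module; raise_nofile_limit is a hypothetical helper that is not part of this change, and it does nothing about the underlying leak.

```python
import resource


def raise_nofile_limit(target: int = 4096) -> tuple[int, int]:
    """Raise the soft RLIMIT_NOFILE toward `target`, never above the hard limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means "no limit"; otherwise the hard limit caps the soft one.
    ceiling = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < ceiling:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (ceiling, hard))
        except (ValueError, OSError):
            # Some platforms reject the request; keep the current limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


print(raise_nofile_limit(4096))
```

Calling it once before the aiohttp-based tests start (for example from a session-scoped fixture) would be one place to hook it in, under the assumption that the suite runs in a single process.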
Environment
- Hermes Agent, local checkout of main
- Python 3.11
- macOS 26.5