Summary
When fallback_model is configured with provider: custom and an explicit api_key in config.yaml, the API key is not reliably passed through to the actual HTTP request, resulting in a 400 INVALID_ARGUMENT: API key not valid error from the fallback provider.
Config
fallback_model:
provider: custom
model: gemini-flash-lite-latest
base_url: https://generativelanguage.googleapis.com/v1beta/openai
api_key: <google-api-key>
Observed behavior
🔄 Primary model failed — switching to fallback: gemini-flash-lite-latest via custom
⚠️ API call failed (attempt 1/3): BadRequestError [HTTP 400]
🔌 Provider: custom Model: gemini-flash-lite-latest
🌐 Endpoint: https://generativelanguage.googleapis.com/v1beta/openai/
📝 Error: HTTP 400: Error code: 400 - API key not valid. Please pass a valid API key.
Expected behavior
The API key specified in fallback_model.api_key should be passed as the Authorization: Bearer token to the custom endpoint.
Investigation
Traced through the full code path:
_load_fallback_model (gateway/run.py:1344) reads config.yaml via yaml.safe_load — api_key is present in the returned dict. ✓
_try_activate_fallback (run_agent.py:6316) reads fb.get("api_key") and assigns to fb_api_key_hint. ✓ in isolation.
resolve_provider_client("custom", ..., explicit_api_key=fb_api_key_hint) (auxiliary_client.py:1546) is supposed to build the OpenAI client with api_key=custom_key. ✓ in isolation.
- Direct Python simulation of the full chain (load .env → resolve_provider_client → chat.completions.create) succeeds every time.
- curl against
https://generativelanguage.googleapis.com/v1beta/openai/chat/completions with the same key and Bearer auth succeeds.
Despite every isolated simulation working, the live fallback path consistently sends an invalid/missing key. Gateway restarts between each attempt confirmed config was current.
Additional findings
provider: gemini (named provider) always returns HTTP 404: models.dev provides v1beta/models/ as the base URL; hermes uses OpenAI-compat transport and appends /chat/completions → v1beta/models/chat/completions which does not exist. The correct OpenAI-compat base URL is v1beta/openai.
api_key_env field in fallback_model is silently ignored: resolve_provider_client for provider: custom only checks explicit_api_key and then falls back to OPENAI_API_KEY — it never reads api_key_env from the fallback config dict.
Environment
- Hermes Agent (gateway service, systemd user unit)
- Primary model:
gpt-5.4-mini via openai-codex (OAuth)
- Fallback: Google AI Studio (
v1beta/openai OpenAI-compatible endpoint)
- Platform: Linux / WSL2
Suggested fixes
-
provider: gemini 404: Update models.dev entry or add a hermes overlay for gemini provider with base_url_override = "https://generativelanguage.googleapis.com/v1beta/openai" and api_key_env_var = "GOOGLE_API_KEY" so the named provider just works.
-
api_key not flowing through on provider: custom: Add debug logging in _try_activate_fallback to print fb_api_key_hint before passing to resolve_provider_client. The discrepancy between simulation (works) and live session (fails) suggests either a thread-safety issue with self._fallback_chain mutation, or that _client_kwargs is being reset between fallback activation and the first retry attempt.
-
api_key_env ignored: In resolve_provider_client for provider: custom, add handling for api_key_env analogous to how it works for auxiliary tasks — i.e., if explicit_api_key is empty, check if the fallback config has an api_key_env field and call os.getenv() on it.
Summary
When
fallback_modelis configured withprovider: customand an explicitapi_keyinconfig.yaml, the API key is not reliably passed through to the actual HTTP request, resulting in a400 INVALID_ARGUMENT: API key not validerror from the fallback provider.Config
Observed behavior
Expected behavior
The API key specified in
fallback_model.api_keyshould be passed as theAuthorization: Bearertoken to the custom endpoint.Investigation
Traced through the full code path:
_load_fallback_model(gateway/run.py:1344) readsconfig.yamlviayaml.safe_load—api_keyis present in the returned dict. ✓_try_activate_fallback(run_agent.py:6316) readsfb.get("api_key")and assigns tofb_api_key_hint. ✓ in isolation.resolve_provider_client("custom", ..., explicit_api_key=fb_api_key_hint)(auxiliary_client.py:1546) is supposed to build the OpenAI client withapi_key=custom_key. ✓ in isolation.https://generativelanguage.googleapis.com/v1beta/openai/chat/completionswith the same key and Bearer auth succeeds.Despite every isolated simulation working, the live fallback path consistently sends an invalid/missing key. Gateway restarts between each attempt confirmed config was current.
Additional findings
provider: gemini(named provider) always returns HTTP 404: models.dev providesv1beta/models/as the base URL; hermes uses OpenAI-compat transport and appends/chat/completions→v1beta/models/chat/completionswhich does not exist. The correct OpenAI-compat base URL isv1beta/openai.api_key_envfield infallback_modelis silently ignored:resolve_provider_clientforprovider: customonly checksexplicit_api_keyand then falls back toOPENAI_API_KEY— it never readsapi_key_envfrom the fallback config dict.Environment
gpt-5.4-miniviaopenai-codex(OAuth)v1beta/openaiOpenAI-compatible endpoint)Suggested fixes
provider: gemini404: Update models.dev entry or add a hermes overlay forgeminiprovider withbase_url_override = "https://generativelanguage.googleapis.com/v1beta/openai"andapi_key_env_var = "GOOGLE_API_KEY"so the named provider just works.api_keynot flowing through onprovider: custom: Add debug logging in_try_activate_fallbackto printfb_api_key_hintbefore passing toresolve_provider_client. The discrepancy between simulation (works) and live session (fails) suggests either a thread-safety issue withself._fallback_chainmutation, or that_client_kwargsis being reset between fallback activation and the first retry attempt.api_key_envignored: Inresolve_provider_clientforprovider: custom, add handling forapi_key_envanalogous to how it works for auxiliary tasks — i.e., ifexplicit_api_keyis empty, check if the fallback config has anapi_key_envfield and callos.getenv()on it.