[Bug] Provider/Model Automatic Switch After Gateway Restart: API Key Configuration Mismatch Causes Silent 404 Failure #19286

@veawho

Problem Description

While developing the Gemini Image Gen plugin for Hermes, the system automatically switched the active Provider/Model after a configuration update and Gateway restart.

Steps to Reproduce

  1. Complete development and validation of the Gemini Image Gen plugin
  2. Register the plugin and provide the Google API Key using natural language — Hermes confirms the key was written to .env and prompts for Gateway restart
  3. Execute systemctl restart hermes-gateway
  4. After the Gateway restarts, the system does more than load the new API Key: it unexpectedly switches the default Provider or Model

Error Log

ERROR: API call failed after 3 retries. HTTP 404: 404 page not found
provider=openrouter model=anthropic/claude-sonnet-4-6-20250514

Root Cause Analysis

  1. TaskRouter Auto-Routing: TaskRouter (task_router.py) automatically selects the best Provider/Model
  2. Wrong Routing Target: TaskRouter's configuration routed requests to OpenRouter's claude-sonnet-4-6-20250514 model
  3. Missing API Key: No OpenRouter API Key was configured, causing 404 failure
  4. Silent Failure: The 404 was captured by error handling, triggering fallback/retry instead of alerting the user

Root Cause

A Provider being "configured" and "available" are two different things:

  • TaskRouter believes openrouter is available → routes to it
  • openrouter has no API key → 404 silent failure
  • Agent continues with wrong configuration, triggering fallback/switch
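
The distinction above can be made explicit in code. This is a minimal sketch, assuming each provider's key lives in an environment variable named `<PROVIDER>_API_KEY` (a hypothetical convention). The names `ProviderState`, `env_var_for`, and `classify_provider` are illustrative and not part of Hermes.

```python
from enum import Enum


class ProviderState(Enum):
    NOT_CONFIGURED = "not_configured"  # absent from the routing table
    CONFIGURED = "configured"          # listed for routing, but no usable key
    AVAILABLE = "available"            # listed for routing and has a key


def env_var_for(provider):
    # Assumed naming convention: "minimax-cn" -> "MINIMAX_CN_API_KEY"
    return provider.upper().replace("-", "_") + "_API_KEY"


def classify_provider(provider, routing_table, env):
    """Return the state of `provider` given a routing table and an env mapping."""
    if provider not in routing_table:
        return ProviderState.NOT_CONFIGURED
    if not env.get(env_var_for(provider), "").strip():
        return ProviderState.CONFIGURED
    return ProviderState.AVAILABLE
```

Under these assumptions, a router would only send requests to providers in the `AVAILABLE` state; the failure described in this issue corresponds to routing to one that is merely `CONFIGURED`.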

Current Fix (Implemented)

1. agent/auxiliary_client.py — Disabled OpenRouter fallback

python
("openrouter", try_openrouter), # DISABLED

2. agent/task_router.py — Replaced all Provider lists

  • "openrouter" → "minimax-cn"
  • openrouter, → minimax-cn,

Prevention Recommendations

1. Gateway Startup: Validate Provider Availability

python
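import logging
import os

logger = logging.getLogger(__name__)

for provider in ['openrouter', 'gemini', 'minimax-cn', 'anthropic']:
    # Env var names cannot contain '-', so 'minimax-cn' maps to MINIMAX_CN_API_KEY
    key = os.environ.get(f"{provider.upper().replace('-', '_')}_API_KEY")
    if not key:
        logger.warning(f"Provider '{provider}' has no API key; skipping from routing")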

2. TaskRouter: Filter Providers Without Valid Keys

python
def _get_available_providers():
    # Only route to providers whose API key is present and passes validation
    return [p for p in ALL_PROVIDERS if _has_valid_key(p)]

3. Validate Key Immediately After Writing to .env

bash
hermes doctor --test-key google

4. Improve Prompt Messages

After writing to .env, suggest key validation before prompting restart.
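
Recommendations 3 and 4 could be combined into a single post-write step. The sketch below only checks that the key actually landed in the `.env` text and picks the next message accordingly. The function names and message wording are hypothetical, and the parser handles only simple `KEY=value` lines rather than the full dotenv syntax.

```python
def parse_env_text(text):
    """Parse simple KEY=value lines; skips blanks and comments, drops `export `."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Remove surrounding single or double quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def post_write_message(env_text, var_name):
    """Choose what to tell the user after a key was written to .env."""
    if not parse_env_text(env_text).get(var_name):
        return f"{var_name} is missing or empty in .env; not prompting for restart."
    return (
        f"{var_name} was written to .env. Validate it first "
        "(for example with the proposed `hermes doctor --test-key`), "
        "then restart the Gateway."
    )
```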

5. Distinguish 404 from Missing Key vs. Resource Not Found

Surface "API key missing" errors to the user immediately instead of triggering retry/fallback.
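
One way to realize this is to check for the key before any request is sent, and to raise an error type that the retry loop does not swallow. The sketch below is hypothetical throughout: `MissingApiKeyError`, `TransientApiError`, `call_with_retries`, and the `send` callable are illustrative names, not Hermes APIs.

```python
class MissingApiKeyError(RuntimeError):
    """Configuration problem: retrying or falling back cannot fix it."""


class TransientApiError(RuntimeError):
    """Stand-in for errors that are worth retrying (timeouts, 5xx responses)."""


def call_with_retries(send, provider, api_key, retries=3):
    # Pre-flight check: fail loudly before any HTTP request is made.
    if not api_key:
        raise MissingApiKeyError(
            f"Provider '{provider}' has no API key; set one or remove it from routing"
        )
    last_error = None
    for _ in range(retries):
        try:
            return send(provider, api_key)
        except TransientApiError as exc:
            last_error = exc  # retry only errors known to be transient
    raise RuntimeError(f"API call failed after {retries} retries") from last_error
```

Because `MissingApiKeyError` is raised outside the retry loop, it propagates to the caller on the first attempt and can be shown to the user directly.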


Labels: bug, provider, configuration

Labels

  • P2: Medium, degraded but workaround exists
  • area/config: Config system, migrations, profiles
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • provider/openrouter: OpenRouter aggregator
  • sweeper:implemented-on-main: Sweeper, behavior already present on current main
  • type/bug: Something isn't working