Skip to content

fix(config): preserve optional custom_providers fields on v11→v12 migration#17844

Open
zicochaos wants to merge 1 commit into
NousResearch:mainfrom
zicochaos:fix/v11-v12-preserve-custom-provider-fields
Open

fix(config): preserve optional custom_providers fields on v11→v12 migration#17844
zicochaos wants to merge 1 commit into
NousResearch:mainfrom
zicochaos:fix/v11-v12-preserve-custom-provider-fields

Conversation

@zicochaos

Copy link
Copy Markdown
Contributor

Summary

The v11→v12 migration in migrate_config() builds a new providers.<key> dict by copying only name / api / api_key / default_model / transport from the legacy custom_providers entry. Every other field that _normalize_custom_provider_entry recognises (the runtime read-side normaliser in this same module) is silently dropped on upgrade:

  • models — per-model dict (most painful loss; see below)
  • context_length — provider-level default context window
  • key_env / api_key_env — env var name for api_key
  • rate_limit_delay
  • request_timeout_seconds
  • stale_timeout_seconds

Motivation — the per-model context_length loss

A config like:

custom_providers:
  - name: my-provider
    base_url: https://example.com/v1
    api_key: ...
    api_mode: codex_responses
    model: gpt-5.4
    models:
      gpt-5.4: {context_length: 1050000}
      gpt-5.4-mini: {context_length: 400000}

migrates cleanly into providers.my-provider with the right api, transport, api_key, default_model — but loses the models: map entirely. Runtime then falls back to the 128K default for every model on this provider, silently truncating long-context conversations to 1/8 the configured window. There's no warning, no log message — the user only notices when the agent starts forgetting things mid-conversation or when the gateway emits the per-model without declared context_length warnings introduced for context-length validation.

The other dropped fields each surface their own breakage:

  • key_env: api keys sourced from env vars stop being applied; runtime receives no key for that provider.
  • context_length: provider-level default context window override is lost.
  • rate_limit_delay / *_timeout_seconds: rate-limiting and timeout overrides on slow custom endpoints are lost; some self-hosted setups rely on these for stability.

Fix

Extend the new_entry construction to copy every field listed in _normalize_custom_provider_entry's _KNOWN_KEYS, with appropriate type/value validation matching the runtime normaliser:

  • models: dict (preserved as-is) or list (preserved as-is — runtime normaliser converts list → dict on read)
  • context_length, request_timeout_seconds, stale_timeout_seconds: positive int only
  • rate_limit_delay: int or float, ≥ 0
  • key_env / api_key_env: non-empty string; normalised to canonical key_env on the way out (the alias is documented in website/docs/guides/azure-foundry.md)

Empty / zero / wrong-type values are skipped — they would either confuse the runtime normaliser or display as user-meaningful overrides when they are not.

Test plan

  • tests/hermes_cli/test_config.py::TestCustomProviderCompatibility::test_v11_upgrade_preserves_optional_fields — full-shape round-trip
  • tests/hermes_cli/test_config.py::TestCustomProviderCompatibility::test_v11_upgrade_normalises_api_key_env_aliasapi_key_envkey_env canonicalisation
  • tests/hermes_cli/test_config.py::TestCustomProviderCompatibility::test_v11_upgrade_skips_empty_or_invalid_optional_fields — zero-value / wrong-type / empty values must not be copied
  • Existing test_v11_upgrade_moves_custom_providers_into_providers still passes — exact-dict assertion unchanged
$ pytest tests/hermes_cli/test_config.py::TestCustomProviderCompatibility -q
======================== 8 passed in 2.33s ========================

…ration

The v11→v12 migration in migrate_config() builds a new providers.<key>
dict by copying only name / api / api_key / default_model / transport
from the legacy custom_providers entry. Every other field that
_normalize_custom_provider_entry recognises (the runtime read-side
normaliser in this same module) — models, context_length, key_env /
api_key_env, rate_limit_delay, request_timeout_seconds,
stale_timeout_seconds — is silently dropped on upgrade.

The most painful loss is per-model context_length overrides under
`models.<m>.context_length`: a config like

  custom_providers:
    - name: my-provider
      base_url: https://example.com/v1
      model: gpt-5.4
      models:
        gpt-5.4: {context_length: 1050000}

would migrate cleanly into providers.my-provider with the right
api/transport/default_model — but lose the gpt-5.4 → 1050000 entry
without warning. Runtime then falls back to the 128K default for that
model, silently truncating long-context conversations.

Other dropped fields each surface their own breakage:
- key_env: api keys sourced from env vars stop being applied; runtime
  receives no key for that provider.
- context_length: provider-level default context window override is lost.
- rate_limit_delay / *_timeout_seconds: rate-limiting and timeout
  overrides on slow custom endpoints are lost; some self-hosted setups
  rely on these for stability.

Fix: extend the new_entry construction to copy every field
_normalize_custom_provider_entry's _KNOWN_KEYS lists, with appropriate
type / value validation. Normalise api_key_env → key_env on the way
out (the alias is documented in website/docs/guides/azure-foundry.md
but the migration produced configs that needed both names to be
inspected at read time).

Adds three tests under TestCustomProviderCompatibility:
- test_v11_upgrade_preserves_optional_fields: full-shape round-trip
- test_v11_upgrade_normalises_api_key_env_alias: alias canonicalisation
- test_v11_upgrade_skips_empty_or_invalid_optional_fields: zero-value /
  wrong-type / empty values must not be copied
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/cli CLI entry point, hermes_cli/, setup wizard area/config Config system, migrations, profiles labels Apr 30, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related to merged #9016 which did an earlier v11-v12 custom_providers restoration. This PR catches additional fields (models, context_length, key_env, timeouts) that #9016 missed.

1 similar comment
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related to merged #9016 which did an earlier v11-v12 custom_providers restoration. This PR catches additional fields (models, context_length, key_env, timeouts) that #9016 missed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/config Config system, migrations, profiles comp/cli CLI entry point, hermes_cli/, setup wizard P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants