feat: custom_providers should support per-provider max_tokens override #28782

@pty819

Problem

Currently, model.max_tokens in config.yaml is a global setting. When set, it applies to every provider, including those in the fallback chain. There is no way to specify a per-provider max_tokens override, whereas context_length already supports per-provider overrides via custom_providers[].models.<model>.context_length.

This is problematic when:

  • A custom provider (e.g. Ark DeepSeek) needs an explicit max_tokens because auto-detection doesn't work
  • Fallback providers (e.g. MiniMax, NVIDIA) should NOT inherit that same max_tokens value

Current workaround

Putting max_tokens under model: makes it global: every provider, including the fallbacks, sends max_tokens=131072 in every API call. The only way to avoid this today is to leave max_tokens unset entirely and accept whatever default each provider chooses.
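
The effect can be illustrated with a small Python sketch. The function and provider identifiers here are hypothetical and are not taken from the Hermes codebase; the sketch only models a single global value being copied into every request in the chain.

```python
from typing import Optional


def build_request_params(provider: str, global_max_tokens: Optional[int]) -> dict:
    """Hypothetical model of today's behavior: one global value, no per-provider input."""
    params = {"provider": provider}
    if global_max_tokens is not None:
        # The same value is attached regardless of which provider handles the call.
        params["max_tokens"] = global_max_tokens
    return params


# Primary provider followed by its fallbacks (illustrative names only).
chain = ["ark-deepseek", "minimax", "nvidia"]

# With model.max_tokens set, every provider in the chain receives 131072.
with_global = [build_request_params(p, 131072) for p in chain]

# With it unset, no provider receives an explicit max_tokens at all.
without_global = [build_request_params(p, None) for p in chain]
```

Neither outcome gives the primary provider an explicit limit while leaving the fallbacks on their own defaults, which is the gap this issue describes.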

Proposed solution

Add max_tokens support to custom_providers[].models.<model>.max_tokens, following the exact same pattern as the existing context_length override:

custom_providers:
  - name: My Provider
    base_url: https://...
    api_key: ...
    model: my-model
    models:
      my-model:
        context_length: 1000000
        max_tokens: 131072       # new field, per-provider
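
A lookup over this config shape might look like the following sketch. The signature is an assumption: the existing get_custom_provider_context_length() is not shown in this issue, so matching the provider by its name field and taking the parsed config as a plain dict are illustrative choices, not the project's actual API.

```python
from typing import Any, Optional


def get_custom_provider_max_tokens(
    config: dict, provider_name: str, model: str
) -> Optional[int]:
    """Return custom_providers[].models.<model>.max_tokens, or None if unset.

    Hypothetical signature; matching by provider name is an assumption.
    """
    for entry in config.get("custom_providers") or []:
        if not isinstance(entry, dict) or entry.get("name") != provider_name:
            continue
        models: Any = entry.get("models")
        if not isinstance(models, dict):
            continue
        model_cfg = models.get(model)
        if not isinstance(model_cfg, dict):
            continue
        value = model_cfg.get("max_tokens")
        # Accept only positive integers; bool is excluded because it subclasses int.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None


# The YAML above, as it would look once parsed into a dict.
example_config = {
    "custom_providers": [
        {
            "name": "My Provider",
            "model": "my-model",
            "models": {
                "my-model": {"context_length": 1000000, "max_tokens": 131072}
            },
        }
    ]
}

primary = get_custom_provider_max_tokens(example_config, "My Provider", "my-model")
fallback = get_custom_provider_max_tokens(example_config, "Other Provider", "my-model")
```

Here primary resolves to the configured value, while fallback is None, so a provider without its own entry would not inherit the override.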

Implementation scope

  1. hermes_cli/config.py: Add a get_custom_provider_max_tokens() function, parallel to get_custom_provider_context_length().
  2. agent/agent_init.py: After the existing model.max_tokens fallback (around line 1166), add a second fallback that checks custom_providers for a per-provider max_tokens when agent.max_tokens is still None.
  3. hermes_cli/main.py: Optionally update _save_custom_provider to save max_tokens into models.<model>.max_tokens.
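
The fallback order described in step 2 can be sketched as below. This is a minimal illustration under stated assumptions, not the code in agent/agent_init.py: resolve_max_tokens and its parameter names are hypothetical, and the three inputs stand in for the value already on the agent, the global model.max_tokens, and the per-provider value.

```python
from typing import Optional


def resolve_max_tokens(
    agent_value: Optional[int],
    global_value: Optional[int],
    per_provider_value: Optional[int],
) -> Optional[int]:
    """Return the first value that is set, in the order described in step 2.

    Hypothetical helper: value already on the agent, then model.max_tokens,
    then custom_providers[].models.<model>.max_tokens.
    """
    for candidate in (agent_value, global_value, per_provider_value):
        if candidate is not None:
            return candidate
    # Nothing configured: leave max_tokens unset so the provider default applies.
    return None


only_per_provider = resolve_max_tokens(None, None, 131072)
global_and_per_provider = resolve_max_tokens(None, 4096, 131072)
nothing_set = resolve_max_tokens(None, None, None)
```

With this ordering, a global model.max_tokens still takes precedence over the per-provider value, so users who want different limits per provider would leave the global setting unset. Whether the per-provider value should instead win is a design choice the issue leaves open.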

Priority

Medium. This is not a bug (everything works without it), but a missing feature that causes real confusion: users who set model.max_tokens expecting it to affect only their primary provider may inadvertently send the same value in their fallback API calls.

Labels

  • P3: Low (cosmetic, nice to have)
  • area/config: Config system, migrations, profiles
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • comp/cli: CLI entry point, hermes_cli/, setup wizard
  • type/feature: New feature or request
