
feat: per-provider max_tokens via custom_providers models.<model>.max_tokens #28786

Open
pty819 wants to merge 1 commit into NousResearch:main from pty819:feat/per-provider-max-tokens

@pty819 pty819 commented May 19, 2026

Contributor

Summary

Adds per-provider max_tokens support to custom_providers, mirroring the existing context_length pattern. This allows users to set a provider-scoped output-token cap without affecting fallback providers.

Problem

model.max_tokens in config.yaml is global: it applies to all providers, including fallbacks. There is no way to scope max_tokens to a specific provider, unlike context_length, which already supports per-provider overrides.

Changes

hermes_cli/config.py

  • Adds get_custom_provider_max_tokens() — mirrors get_custom_provider_context_length() exactly. Matches by base_url + model name, returns models.<model>.max_tokens if present and valid.
  • Adds "max_tokens" to _KNOWN_KEYS in _normalize_custom_provider_entry so the top-level key is not flagged as unknown (defensive, for users who had it at top level).
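
The lookup described above might look roughly like the following. This is a minimal sketch, not the code in this PR: the signature (passing the custom_providers list in explicitly), the trailing-slash normalization of base_url, and the reading of "valid" as "a positive integer" are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any, Optional


def get_custom_provider_max_tokens(
    custom_providers: list[dict[str, Any]],  # assumed: parsed custom_providers list
    base_url: str,
    model: str,
) -> Optional[int]:
    """Return models.<model>.max_tokens for the matching provider, else None."""
    # Assumption: base URLs are compared ignoring a trailing slash.
    wanted_url = base_url.rstrip("/")
    for entry in custom_providers:
        if not isinstance(entry, dict):
            continue
        if str(entry.get("base_url", "")).rstrip("/") != wanted_url:
            continue
        models = entry.get("models")
        if not isinstance(models, dict):
            continue
        model_cfg = models.get(model)
        if not isinstance(model_cfg, dict):
            continue
        value = model_cfg.get("max_tokens")
        # Assumption: "valid" means a positive int; bool is excluded
        # explicitly because it is a subclass of int.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None
```

With an entry shaped like the one in the Usage section, a call with the matching base_url and model name would return the configured cap, and any non-matching provider or model would return None.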

agent/agent_init.py

  • After the global model.max_tokens fallback (and before context_length resolution), adds a second fallback that checks custom_providers for a per-provider max_tokens when agent.max_tokens is still None.
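
The resulting precedence can be sketched as a small standalone function. The name resolve_max_tokens and its parameters are hypothetical and do not appear in this PR; the sketch only illustrates the order described above, where the per-provider value is consulted only when neither an explicit value nor the global model.max_tokens is set.

```python
from typing import Callable, Optional


def resolve_max_tokens(
    explicit: Optional[int],                       # value already set on the agent, if any
    global_max_tokens: Optional[int],              # model.max_tokens from config.yaml
    provider_lookup: Callable[[], Optional[int]],  # e.g. a per-provider lookup
) -> Optional[int]:
    """Hypothetical helper: pick the first max_tokens value that is set."""
    if explicit is not None:
        return explicit
    if global_max_tokens is not None:
        # The global setting still wins over the per-provider value.
        return global_max_tokens
    # Second fallback: per-provider max_tokens from custom_providers.
    return provider_lookup()
```

One consequence of this order is that a user who wants provider-scoped caps would leave the global model.max_tokens unset, since a global value takes precedence over the per-provider one.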

Usage

custom_providers:
  - name: My Provider
    base_url: https://example.com/v1
    api_key: ...
    model: my-model
    api_mode: chat_completions
    models:
      my-model:
        context_length: 1000000
        max_tokens: 131072       # ← new per-provider field

Related

Closes #28782

…model>.max_tokens

Adds a new lookup function get_custom_provider_max_tokens() parallel to
the existing get_custom_provider_context_length(). When model.max_tokens
is unset globally, the agent init path now falls back to checking
custom_providers for a per-provider max_tokens override, matched by
base_url + model name.

This allows users to set a provider-scoped output-token cap without
affecting fallback providers in the chain.

Closes NousResearch#28782
@alt-glitch added labels on May 19, 2026: type/feature (New feature or request), P3 (Low: cosmetic, nice to have), area/config (Config system, migrations, profiles), comp/agent (Core agent loop, run_agent.py, prompt builder), comp/cli (CLI entry point, hermes_cli/, setup wizard)
Development

Successfully merging this pull request may close these issues.

feat: custom_providers should support per-provider max_tokens override
