feat: custom_providers should support per-provider max_tokens override

## Problem

Currently, `model.max_tokens` in `config.yaml` is a **global** setting. When set, it applies to **all providers**, including every provider in the fallback chain. There is no way to specify a per-provider `max_tokens` override, even though `context_length` already supports one via `custom_providers[].models.<model>.context_length`.

This is problematic when:
- A custom provider (e.g. Ark DeepSeek) needs an explicit `max_tokens` because auto-detection doesn't work
- Fallback providers (e.g. MiniMax, NVIDIA) should NOT inherit that same `max_tokens` value

## Current workaround

Putting `max_tokens` under `model:` makes it global: every provider, including fallbacks, sends the same value (for example `max_tokens=131072`) in every API call. The only way to avoid this today is to leave `max_tokens` unset entirely and accept whatever default each provider chooses.

## Proposed solution

Add `max_tokens` support to `custom_providers[].models.<model>.max_tokens`, following the exact same pattern as the existing `context_length` override:

```yaml
custom_providers:
  - name: My Provider
    base_url: https://...
    api_key: ...
    model: my-model
    models:
      my-model:
        context_length: 1000000
        max_tokens: 131072       # new field, per-provider
```
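A minimal sketch of how the new field could be read from the parsed config. The signature, and the choice to match a provider by `base_url`, are assumptions made for illustration. The real `get_custom_provider_context_length()` may identify providers differently, and the new helper should mirror whatever it does.

```python
from typing import Any, Optional


def get_custom_provider_max_tokens(
    config: dict[str, Any], base_url: str, model: str
) -> Optional[int]:
    """Return custom_providers[].models.<model>.max_tokens, or None if unset.

    Hypothetical signature: matching on base_url is an assumption, not
    necessarily how the existing context_length helper works.
    """
    for provider in config.get("custom_providers") or []:
        if not isinstance(provider, dict):
            continue
        # Compare URLs without a trailing slash so both spellings match.
        if (provider.get("base_url") or "").rstrip("/") != base_url.rstrip("/"):
            continue
        models = provider.get("models")
        if not isinstance(models, dict):
            continue
        entry = models.get(model)
        if not isinstance(entry, dict):
            continue
        value = entry.get("max_tokens")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None
```

Returning `None` for a missing or invalid value lets the caller fall through to the provider's own default instead of sending a bad `max_tokens`.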

### Implementation scope

1. **`hermes_cli/config.py`** — Add `get_custom_provider_max_tokens()` function parallel to `get_custom_provider_context_length()`.
2. **`agent/agent_init.py`** — After the existing `model.max_tokens` fallback (around line 1166), add a second fallback that checks `custom_providers` for a per-provider `max_tokens` when `agent.max_tokens` is still `None`.
3. **`hermes_cli/main.py`** — Optionally update `_save_custom_provider` to save `max_tokens` into `models.<model>.max_tokens`.
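
The resolution order described in step 2 could look roughly like the sketch below. `resolve_max_tokens` and its parameters are hypothetical names used only for illustration. They are not existing code in `agent/agent_init.py`.

```python
from typing import Callable, Optional


def resolve_max_tokens(
    explicit: Optional[int],
    global_value: Optional[int],
    per_provider_lookup: Callable[[], Optional[int]],
) -> Optional[int]:
    """Pick max_tokens in the order proposed in step 2 (hypothetical helper).

    1. A value already set on the agent wins.
    2. Otherwise the global model.max_tokens is used.
    3. Only if both are None is the per-provider override consulted.
    """
    if explicit is not None:
        return explicit
    if global_value is not None:
        return global_value
    # The lookup is passed as a callable so it runs only when needed.
    return per_provider_lookup()
```

With this ordering, a configured `model.max_tokens` still applies to every provider. The per-provider value only takes effect when the global setting is left unset, which is the configuration this issue is meant to enable.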

### Priority

Medium. Not a bug (everything works without it), but a missing feature that causes real confusion: users who set `model.max_tokens` expecting it to affect only their primary provider may inadvertently send the same value on their fallback API calls.
