feat(config): support per-model max_tokens in custom_providers config

## Problem

When using models with large output limits (e.g., DeepSeek V4 with 384K max_output_tokens), there is currently no way to configure `max_tokens` per-provider or per-model in `config.yaml`. 

The `custom_providers` section already supports a per-model `context_length`, but `max_tokens` is not read from config at all. `AIAgent.__init__` accepts `max_tokens` only as a direct parameter, and neither `cli.py` nor `gateway/run.py` passes a config-derived `max_tokens` value through.

This forces users who want to take advantage of large output limits (DeepSeek V4 384K, Gemini 2M, etc.) to either:
1. Accept the API server's default (which may be far lower than the model's capability)
2. Hardcode a global `max_tokens` that doesn't work across providers

## Technical Details

A resolution chain already exists for `context_length`:
- `config.yaml` -> `model.<provider_name>.models.<model_name>.context_length`
- `run_agent.py` reads from `custom_providers` config in `_config_context_length()`

But no equivalent exists for `max_tokens`:
- `run_agent.py:804` -- `self.max_tokens` only comes from `__init__` parameter
- `run_agent.py:1366-1431` -- custom_providers loop only reads `context_length`, not `max_tokens`
- `run_agent.py:6644-6645` -- API call only sends `max_tokens` if `self.max_tokens is not None`
- `cli.py:2795-2922` -- no `max_tokens` is passed to `AIAgent`
- `gateway/run.py:960` -- likewise, no `max_tokens` is passed
- Only `batch_runner.py:329` has `config.get("max_tokens")`, but it reads the root-level config, not the model-level config
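
To make the consequence concrete, here is a minimal sketch of the behavior described in the list above. It is not the actual code in `run_agent.py`; the function name `build_request_kwargs` and its signature are hypothetical and used only for illustration.

```python
from typing import Any, Dict, List, Optional


def build_request_kwargs(
    model: str,
    messages: List[Dict[str, Any]],
    max_tokens: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the request-building step described above."""
    kwargs: Dict[str, Any] = {"model": model, "messages": messages}
    # max_tokens is only included when a value was supplied explicitly;
    # otherwise the key is omitted and the API server applies its own default.
    if max_tokens is not None:
        kwargs["max_tokens"] = max_tokens
    return kwargs
```

Because neither CLI nor gateway entry point supplies a value, a flow like this would always take the "omitted" branch, regardless of what the model config says.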

## Proposed Solution

Add `max_tokens` support to the `custom_providers` model config, similar to how `context_length` works:

```yaml
custom_providers:
  deepseek-v4:
    base_url: https://api.deepseek.com/v1
    models:
      deepseek-chat:
        context_length: 1000000
        max_tokens: 384000
```

In addition, or as an alternative, add a top-level `default_max_tokens` config key that applies to models without an explicit `max_tokens` entry.
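
One possible shape for the lookup is sketched below. This is an assumption-laden sketch, not the project's implementation: the helper names `resolve_max_tokens` and `_as_positive_int` are hypothetical, the config is assumed to be an already-parsed dict shaped like the YAML example above, and the precedence (explicit parameter, then per-model value, then `default_max_tokens`) is a suggested ordering rather than something the codebase defines today.

```python
from typing import Any, Mapping, Optional


def _as_positive_int(value: Any) -> Optional[int]:
    """Return value as a positive int, or None if it is missing or invalid."""
    if value is None or isinstance(value, bool):
        return None
    try:
        parsed = int(value)
    except (TypeError, ValueError):
        return None
    return parsed if parsed > 0 else None


def _as_mapping(value: Any) -> Mapping[str, Any]:
    """Treat anything that is not a mapping as an empty section."""
    return value if isinstance(value, Mapping) else {}


def resolve_max_tokens(
    config: Mapping[str, Any],
    provider_name: str,
    model_name: str,
    explicit: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical resolver; assumed precedence: explicit > per-model > default."""
    if explicit is not None:
        return explicit
    # Walk custom_providers.<provider>.models.<model>, tolerating missing keys.
    providers = _as_mapping(config.get("custom_providers"))
    provider = _as_mapping(providers.get(provider_name))
    models = _as_mapping(provider.get("models"))
    model_cfg = _as_mapping(models.get(model_name))
    per_model = _as_positive_int(model_cfg.get("max_tokens"))
    if per_model is not None:
        return per_model
    # Fall back to the proposed top-level default; None means "let the API decide".
    return _as_positive_int(config.get("default_max_tokens"))
```

Returning `None` when nothing is configured keeps today's behavior intact, since the API call only sends `max_tokens` when the value is not `None`.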

## Additional Context

- DeepSeek V4 (released 2026-04-24) supports up to 384K output tokens
- Related issue: #9489 (retry-based max_tokens boost, different scope)
- This is purely a config-reader enhancement -- no API behavior changes needed

