Feature Description
Add an explicit auxiliary.compression.context_length config option.
Example:
auxiliary:
compression:
provider: custom
model: gpt-5.4
base_url: https://example-proxy/v1
context_length: 500000
Motivation
Today, compression runtime context length is inferred indirectly:
- from
model.context_length
- or from
custom_providers[].models[*].context_length
- or from provider metadata probing
- or from fallback defaults
That works, but it is awkward when the compression model is treated as a first-class runtime of its own.
This is especially useful for custom providers where /models may not expose a context_length field, or may expose model IDs that do not exactly match the configured runtime.
The recent bug fix in #8786 makes override handling more consistent, but an explicit field would still improve clarity and reduce ambiguity.
Proposed Solution
Support a direct override for the compression auxiliary runtime:
auxiliary:
compression:
context_length: 500000
Suggested precedence:
auxiliary.compression.context_length
- explicit model/runtime override (
model.context_length or matching custom_providers[].models[*].context_length)
- provider metadata probing
- default fallback
Benefits
- makes compression runtime configuration explicit
- avoids relying on provider metadata for custom endpoints
- improves discoverability for users configuring large-context compression models
- decouples compression runtime context from the main runtime when they differ
Related
Feature Description
Add an explicit
auxiliary.compression.context_lengthconfig option.Example:
Motivation
Today, compression runtime context length is inferred indirectly:
model.context_lengthcustom_providers[].models[*].context_lengthThat works, but it is awkward when the compression model is treated as a first-class runtime of its own.
This is especially useful for custom providers where
/modelsmay not expose acontext_lengthfield, or may expose model IDs that do not exactly match the configured runtime.The recent bug fix in #8786 makes override handling more consistent, but an explicit field would still improve clarity and reduce ambiguity.
Proposed Solution
Support a direct override for the compression auxiliary runtime:
Suggested precedence:
auxiliary.compression.context_lengthmodel.context_lengthor matchingcustom_providers[].models[*].context_length)Benefits
Related