Unsafe context length assumption #132

The code appears to fall back to a 128k-token context length whenever the model's metadata does not specify one:

>         return metadata[model].get("context_length", 128000)

This leads to complete failures, such as:

⚠️ Error code: 400 - {'error': {'code': 400, 'message': 'request (67374 tokens) exceeds the available context size (65536 tokens), try increasing it', 'type': 'exceed_context_size_error', 'n_prompt_tokens': 67374, 'n_ctx': 65536}}

This cannot be recovered from (and recovery should not be attempted), even by reducing the compaction threshold and restarting the gateway, because the context already exceeds what the model can process, so the model cannot compact it either.

128k is a typical model _**maximum**_ (from a year or so ago). It shouldn't be assumed as a default length. In absolute terms, the safer default would be something like 4096, **_but_**, more reasonably, the README.md should probably specify a minimum supported context length, based on the size of the prompts sent to the model from the internal prompt templates. That value should then also be used as the default, so that the minimum supported and the minimum assumed are the same number, exactly as documented.
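As a rough sketch of that suggestion, the fallback could be the documented minimum rather than 128000. The constant name, its value, and the function name below are hypothetical. The real number would have to come from measuring the project's prompt templates, and the `metadata.get(model, {})` lookup is an assumption about how unrecognised models might be handled.

```python
# Hypothetical placeholder: the real value should be derived from the size of
# the internal prompt templates and stated in README.md.
MIN_SUPPORTED_CONTEXT_LENGTH = 16_384


def get_context_length(metadata: dict, model: str) -> int:
    """Return the known context length, else the documented minimum."""
    # Fall back to the minimum the project claims to support, never to a
    # typical model maximum.
    return metadata.get(model, {}).get(
        "context_length", MIN_SUPPORTED_CONTEXT_LENGTH
    )
```

With this shape, a model served with a 65536-token window but missing from the metadata would be treated as having the documented minimum, so compaction would trigger early instead of never.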

Fundamentally, the problem is that the codebase tries to guess this using heuristics and lookup tables for a few well-known models, instead of requiring the information when the model is not recognised.
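A minimal sketch of the alternative, assuming an explicit user-supplied setting exists: use the configured value if given, use the lookup table only for recognised models, and otherwise fail loudly at startup instead of guessing. The names `resolve_context_length`, `configured`, and `UnknownContextLengthError` are hypothetical and not taken from the codebase.

```python
from typing import Optional


class UnknownContextLengthError(ValueError):
    """Raised when a model's context length is neither configured nor known."""


def resolve_context_length(
    metadata: dict, model: str, configured: Optional[int] = None
) -> int:
    """Resolve the context length without guessing."""
    # 1. An explicit user setting always wins (assumed config option).
    if configured is not None:
        if configured <= 0:
            raise ValueError("context length must be a positive integer")
        return configured

    # 2. Use the lookup table only when the model is actually recognised.
    known = metadata.get(model, {}).get("context_length")
    if known is not None:
        return known

    # 3. Otherwise refuse to guess and ask for the information.
    raise UnknownContextLengthError(
        f"context length for model {model!r} is unknown; "
        "please set it explicitly in the configuration"
    )
```

Failing at configuration time moves the error from an unrecoverable mid-session 400 response to a startup message that the user can act on.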
