Unsafe context length assumption #132

@lee-b

Description

The code appears to default to a 128k-token context length:

    return metadata[model].get("context_length", 128000)

This leads to complete failures, such as:

⚠️ Error code: 400 - {'error': {'code': 400, 'message': 'request (67374 tokens) exceeds the available context size (65536 tokens), try increasing it', 'type': 'exceed_context_size_error', 'n_prompt_tokens': 67374, 'n_ctx': 65536}}

This cannot be recovered from (and the code should not try to), even by lowering the compaction threshold and restarting the gateway. The context already exceeds what the model can process, so the model cannot compact it.
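
The arithmetic behind this can be sketched as follows. It assumes, hypothetically, that the gateway starts compacting once the prompt size crosses `threshold × context_length`; the function name and threshold values are illustrative only, while the token counts are taken from the error above.

```python
def compaction_trigger(assumed_ctx: int, threshold: float) -> int:
    """Token count at which a hypothetical gateway would start compacting."""
    return int(assumed_ctx * threshold)

ASSUMED_CTX = 128_000   # fallback from the quoted lookup
SERVER_CTX = 65_536     # n_ctx reported in the error
PROMPT_TOKENS = 67_374  # n_prompt_tokens reported in the error

# Largest threshold that still triggers compaction before the server limit.
max_safe_threshold = SERVER_CTX / ASSUMED_CTX
print(f"max safe threshold: {max_safe_threshold:.3f}")

for threshold in (0.9, 0.75, 0.5):
    trigger = compaction_trigger(ASSUMED_CTX, threshold)
    # True only if compaction would start before the server rejects the request.
    print(threshold, trigger, trigger <= SERVER_CTX)

# Once the prompt is already past the server limit, no threshold helps:
# compacting means sending those same tokens to the model.
print("already over limit:", PROMPT_TOKENS > SERVER_CTX)
```

Lowering the threshold under a wrong 128k assumption can at best prevent the overflow in a fresh session; it cannot shrink a context that is already over the server's limit.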

128k is a typical maximum for models from a year or so ago; it shouldn't be assumed as the default length. In absolute terms, a safer default would be something like 4096. More reasonably, README.md should specify a minimum supported context length, based on the size of the prompts that the internal prompt templates send to the model. That minimum should then also be the default value, because it is the minimum supported and therefore the minimum assumed, exactly as documented.
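
One way to derive such a minimum and reuse it as the default is sketched here, under loud assumptions: the template texts are placeholders, token counts use a crude 4-characters-per-token estimate instead of a real tokenizer, the reserve sizes are arbitrary, and every name is hypothetical rather than taken from the codebase.

```python
import math

# Hypothetical stand-ins for the project's internal prompt templates.
PROMPT_TEMPLATES = {
    "system": "You are an assistant running inside a gateway. " * 40,
    "compaction": "Summarise the conversation so far, keeping key facts. " * 10,
}

def estimate_tokens(text: str) -> int:
    # Crude ~4 characters-per-token estimate; a real check should use the
    # served model's own tokenizer.
    return math.ceil(len(text) / 4)

def minimum_supported_context(templates: dict[str, str],
                              conversation_reserve: int,
                              output_reserve: int) -> int:
    # Assumption: all templates may appear in one request, plus room for
    # conversation history and the model's reply.
    template_tokens = sum(estimate_tokens(t) for t in templates.values())
    return template_tokens + conversation_reserve + output_reserve

# The documented minimum doubles as the default, so the code never assumes
# more context than the README promises.
MIN_SUPPORTED_CONTEXT = minimum_supported_context(PROMPT_TEMPLATES, 2048, 1024)
DEFAULT_CONTEXT_LENGTH = MIN_SUPPORTED_CONTEXT

def get_context_length(metadata: dict, model: str) -> int:
    # Same shape as the quoted lookup, but falling back to the minimum.
    return metadata.get(model, {}).get("context_length", DEFAULT_CONTEXT_LENGTH)
```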

Fundamentally, the problem is that the codebase tries to guess the context length using heuristics and lookup tables for a few well-known models, instead of requiring it to be specified when the model is not recognised.
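
A minimal sketch of the "require it" alternative, assuming a hypothetical explicit `configured` setting and the same `metadata` shape as the quoted lookup; `resolve_context_length` and `ContextLengthUnknownError` are illustrative names, not existing code.

```python
from typing import Optional

class ContextLengthUnknownError(ValueError):
    """Raised when no context length is known for the configured model."""

def resolve_context_length(model: str,
                           metadata: dict,
                           configured: Optional[int] = None) -> int:
    # 1. An explicit setting wins: it reflects how the model is actually
    #    served, which may be less than the model's advertised maximum.
    if configured is not None:
        if configured <= 0:
            raise ValueError(f"context length must be positive, got {configured}")
        return configured
    # 2. Use the lookup table only when it holds a real value.
    known = metadata.get(model, {}).get("context_length")
    if known is not None:
        return known
    # 3. Otherwise refuse to guess.
    raise ContextLengthUnknownError(
        f"Unknown context length for model {model!r}; "
        "set it explicitly in the configuration."
    )
```

Letting the explicit setting override the table matters because a lookup table can only describe a model's maximum, while the server may be configured for less, as with the 65536-token server in the error above.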

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
