initialise model with max_model_len #159

@nivibilla

Description

Similar to how vLLM has the 'max_model_len' option when starting the server, can we have that here too?

This would help when trying to host models on smaller GPUs. For example, with vLLM, Mistral 7B with a 32k context doesn't fit on a single 24GB GPU, whereas with an 8k context it does.
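For reference, here is a minimal sketch of the vLLM behaviour this request points to: capping the context window at load time so the KV cache fits on a smaller GPU. The model name and the 8k limit are illustrative values, not part of the original request.

```python
# Sketch of vLLM's existing option that this issue asks to mirror.
# max_model_len overrides the model's native context length (e.g. 32k -> 8k),
# which shrinks the KV cache enough to fit on a single 24GB GPU.
from vllm import LLM

llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.1",  # illustrative model choice
    max_model_len=8192,          # cap context at 8k instead of the native 32k
    gpu_memory_utilization=0.90, # leave a little headroom on the card
)
```

The request is for an equivalent parameter to be accepted when initialising the model in this project.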
