Static cache + torch.compile: better documentation for prefill static sequence length #29151

### Feature request

When using `torch.compile`, the prefill step is recompiled for every new sequence length, which is slow. It would be nice to compile only for a fixed set of sequence lengths (say `1, 2, 4, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, etc`) and, at runtime, pad each input up to the nearest compiled length.
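A minimal sketch of the bucketing idea in plain Python, for illustration only. `PREFILL_BUCKETS`, `select_bucket`, and `pad_to_bucket` are hypothetical names, not part of the transformers API. The bucket list is copied from the example lengths in this request and cut off at 4096. Left padding with an attention mask is an assumption, chosen because it is the usual convention for decoder-only generation. The padded sequence is what would be passed to the compiled prefill, so the compiled function would only ever see one shape per bucket.

```python
import bisect

# Hypothetical bucket sizes, copied from the example lengths in the request.
PREFILL_BUCKETS = [1, 2, 4, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]


def select_bucket(seq_len, buckets=PREFILL_BUCKETS):
    """Return the smallest bucket length >= seq_len (hypothetical helper)."""
    if seq_len < 1:
        raise ValueError("sequence length must be at least 1")
    # bisect_left finds the first bucket that is >= seq_len in the sorted list.
    i = bisect.bisect_left(buckets, seq_len)
    if i == len(buckets):
        raise ValueError(f"sequence length {seq_len} exceeds largest bucket {buckets[-1]}")
    return buckets[i]


def pad_to_bucket(input_ids, pad_token_id, buckets=PREFILL_BUCKETS):
    """Left-pad token ids up to their bucket length (assumed convention).

    Returns the padded ids and an attention mask with 0 for padding
    positions and 1 for real tokens.
    """
    target = select_bucket(len(input_ids), buckets)
    n_pad = target - len(input_ids)
    padded = [pad_token_id] * n_pad + list(input_ids)
    attention_mask = [0] * n_pad + [1] * len(input_ids)
    return padded, attention_mask


if __name__ == "__main__":
    ids, mask = pad_to_bucket([101, 2023, 2003, 1037, 3231], pad_token_id=0)
    print(len(ids), sum(mask))
```

The trade-off is fewer compilations (one per bucket instead of one per distinct length) in exchange for some wasted compute on padding tokens. With the bucket list above, a 5-token prompt is padded to 16 because there is no bucket of 8.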

### Motivation

Compilation with `torch.compile` is prohibitively slow, even with https://github.com/huggingface/transformers/pull/29114.

For people using transformers + static cache + `torch.compile`, running `generate` on a new sequence length should be FAST.

### Your contribution

None for now
