Feature request
When using torch.compile, the prefill is recompiled for every new sequence length, which is slow. It would be nice to compile only for a fixed set of sequence lengths (say 1, 2, 4, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, etc.), chosen on the fly from the input length, and to pad each input up to the nearest compiled length.
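To make the idea concrete, here is a minimal sketch of the bucketing step in plain Python. The names `BUCKETS`, `bucket_length` and `pad_to_bucket` are hypothetical and not part of transformers; the bucket list is copied from the lengths above, and left-padding with a pad id of 0 is an assumption (left-padding is common for decoder-only generation).

```python
from bisect import bisect_left

# Hypothetical bucket sizes, copied from the lengths listed in this request.
BUCKETS = [1, 2, 4, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]


def bucket_length(seq_len, buckets=BUCKETS):
    """Return the smallest bucket that is >= seq_len."""
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")
    idx = bisect_left(buckets, seq_len)
    if idx == len(buckets):
        raise ValueError(f"seq_len {seq_len} exceeds the largest bucket")
    return buckets[idx]


def pad_to_bucket(input_ids, pad_token_id=0, buckets=BUCKETS):
    """Left-pad a list of token ids to its bucket length.

    Returns (padded_ids, attention_mask); the mask is 0 on padding
    positions and 1 on real tokens. pad_token_id=0 is an assumption.
    """
    target = bucket_length(len(input_ids), buckets)
    n_pad = target - len(input_ids)
    padded_ids = [pad_token_id] * n_pad + list(input_ids)
    attention_mask = [0] * n_pad + [1] * len(input_ids)
    return padded_ids, attention_mask
```

With this, a prompt of 5 tokens and a prompt of 13 tokens would both be padded to 16, so the prefill would only ever see the input shapes in `BUCKETS`. Whether that is enough to avoid recompilation in practice depends on how the static cache and attention mask are handled, which this sketch does not cover.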
Motivation
torch.compile compilation is prohibitively slow, even with #29114.
If people want to use transformers with a static cache and torch.compile, running generate on new sequence lengths should be fast.
Your contribution
None for now