🐛 Bug
The ragged paged attention test fails on v5e because the kernel exceeds the VMEM limit.
To Reproduce
On a v5e host, run:
```shell
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
pip install 'torch_xla[tpu] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev-cp310-cp310-linux_x86_64.whl' -f https://storage.googleapis.com/libtpu-releases/index.html -f https://storage.googleapis.com/libtpu-wheels/index.html
git clone https://github.com/pytorch/xla.git
cd xla/
python test/test_ragged_paged_attention_kernel.py
```