Only write to HBM at the last iteration. #8393
The TPU CI failure seems unrelated to this PR: I ran the test
The TPU CI failure should be resolved if you rebase; I disabled that test for now.
1c17a71 to d52c6f2
Thanks Jack for the info!
The TPU test failure is very strange. On my TPU v4, The failing test succeeded on my v5e VM, though, which uses an older version of torch and torch_xla:
It seems the error is due to OOM, despite the confusing error message, even with
Test plan: root@t1v-n-f3643994-w-0:/workspaces/persist# python pytorch/xla/test/test_tpu_paged_attention_kernel.py 2>&1 | tee ~/out.txt