Extend paged attention to support query_len>1#8328
Conversation
    page_indices,  # [batch_size, pages_per_sequence]
    num_kv_pages_per_compute_block,
    num_queries_per_compute_block,
    use_kernel=True,
hey @WoosukKwon, this is the integration point between vLLM and torch_xla. I'm wondering if vLLM can toggle this use_kernel flag, perhaps via a configuration flag. I want to use the non-kernel version as a perf baseline. Do you know if that's possible?
For dynamo, it's similar. The integration point is at def multi_queries_paged_attention_xla( in the same file.
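A minimal sketch of one way to expose that switch, assuming vLLM reads an environment variable at its call site. The import path and the tensor arguments (q, k_pages, v_pages, lengths) are assumptions for illustration; only the parameters shown in the diff above come from this PR.

import os

from torch_xla.experimental.custom_kernel import multi_queries_paged_attention

# Hypothetical flag: flip once via the environment instead of code changes.
_USE_KERNEL = os.environ.get("VLLM_USE_PAGED_ATTN_KERNEL", "1") == "1"

def paged_attention(q, k_pages, v_pages, lengths, page_indices,
                    num_kv_pages_per_compute_block,
                    num_queries_per_compute_block):
    return multi_queries_paged_attention(
        q, k_pages, v_pages, lengths,
        page_indices,  # [batch_size, pages_per_sequence]
        num_kv_pages_per_compute_block,
        num_queries_per_compute_block,
        use_kernel=_USE_KERNEL,  # False routes through the non-kernel path
    )

Running with VLLM_USE_PAGED_ATTN_KERNEL=0 would then exercise the non-kernel reference implementation as the baseline.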
    q_index = q_blk_idx * num_queries_per_compute_block
    kv_index = kv_blk_idx * kv_seq_len_per_kv_compute_blk
    kv_len = lengths_ref[b]
    row_ids = (kv_len - query_len) + q_index + jax.lax.broadcasted_iota(
Here, we assume the input query corresponds to the last query_len positions of the input kv. For example, if the input q_len is 8 and kv_len is 24, we assume the query corresponds to the kv at indices [16, 24), and apply the causal mask accordingly.
@WoosukKwon please let us know if this assumption is valid or not for the use cases in vLLM.
Yes that's the desired behavior. Thanks for checking it out with me!
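For reference, a standalone sketch of the indexing discussed above, using the same broadcasted_iota pattern as the kernel. The helper name and block sizes are illustrative, not part of the kernel code.

import jax
import jax.numpy as jnp

# Sketch: build the causal mask for one compute block, assuming the queries
# occupy the last `query_len` positions of the kv sequence.
def causal_mask_block(kv_len, query_len, q_index, kv_index,
                      q_blk_size, kv_blk_size):
    # Global kv position of each query row: query i sits at
    # kv_len - query_len + i.
    row_ids = (kv_len - query_len) + q_index + jax.lax.broadcasted_iota(
        jnp.int32, (q_blk_size, kv_blk_size), 0)
    # Global kv position of each key/value column in this block.
    col_ids = kv_index + jax.lax.broadcasted_iota(
        jnp.int32, (q_blk_size, kv_blk_size), 1)
    # Causal rule: a query attends only to kv positions at or before its own.
    return row_ids >= col_ids

# With query_len=8 and kv_len=24, the queries map to kv indices [16, 24),
# so query 0 attends to kv[0..16] and query 7 attends to kv[0..23].
mask = causal_mask_block(kv_len=24, query_len=8, q_index=0, kv_index=0,
                         q_blk_size=8, kv_blk_size=24)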
This PR extends the existing paged attention kernel to support query_len > 1. It also upgrades the underlying flash attention from v1 to v2.
Test plan:
cc: @miladm