[Bugfix] fix ragged attention kernel auto-tuning table key #9497
Merged: yaochengji merged 7 commits into master on Jul 23, 2025
Conversation
Force-pushed 53077e8 to 3b99bc8
yaochengji (Collaborator, Author): @bythew3i could you take a look?
bythew3i (Contributor) approved these changes on Jul 22, 2025:
Thanks Chengji.
The padding is intended. I think the bug is in our autotuning rather than here: we should pad num_heads before running autotune to make the table smaller.
To unblock the model, this PR LGTM, but we really need to fix the autotune script.
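For context, a minimal sketch of the autotune-side fix suggested above, assuming a hypothetical autotune script; `_next_power_of_2`, `autotune_configs`, and the shape tuples are illustrative names, not the actual vLLM identifiers:

```python
# Illustrative only: pad num_heads before autotuning so the tuned table
# has fewer, canonical keys. Names here are hypothetical, not vLLM's API.

def _next_power_of_2(n: int) -> int:
    return 1 if n <= 0 else 1 << (n - 1).bit_length()

def autotune_configs(model_shapes):
    """Yield deduplicated (q_heads, kv_heads, max_model_len) keys to tune,
    with q_heads padded up front so lookups can pad the same way."""
    seen = set()
    for q_heads, kv_heads, max_model_len in model_shapes:
        key = (_next_power_of_2(q_heads), kv_heads, max_model_len)
        if key not in seen:  # shapes that pad to the same key are tuned once
            seen.add(key)
            yield key
```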
yaochengji (Collaborator, Author): Thanks @bythew3i, will change it back once the auto-tuning table is corrected.
vanbasten23 reviewed on Jul 22, 2025
Force-pushed 5ef3bc6 to 0bbf02d
Fix the bug where q_heads, kv_heads, and max_model_len are accidentally padded to the next power of 2 when forming the auto-tuning table key.
Without this fix, models like Qwen2.5-7B (q_heads=28, kv_heads=4) cannot hit the attention table.
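For illustration, a minimal sketch of what the lookup-side fix amounts to, under the assumption that the tuned table is a dict keyed by (q_heads, kv_heads, max_model_len); `TUNED_BLOCK_SIZES`, `get_tuned_block_sizes`, and the table entry are hypothetical, not the actual identifiers or values in the kernel:

```python
# Hypothetical reconstruction of the bug and fix; table contents and
# function names are invented for illustration.

def _next_power_of_2(n: int) -> int:
    return 1 if n <= 0 else 1 << (n - 1).bit_length()

# Table generated by the autotune script from the raw model shapes.
TUNED_BLOCK_SIZES = {
    (28, 4, 32768): (128, 256),  # e.g. Qwen2.5-7B (entry values invented)
}

def get_tuned_block_sizes(q_heads, kv_heads, max_model_len):
    # Buggy key: padding turned Qwen2.5-7B's (28, 4, 32768) into
    # (32, 4, 32768), which never matches the table:
    # key = (_next_power_of_2(q_heads), _next_power_of_2(kv_heads),
    #        _next_power_of_2(max_model_len))
    key = (q_heads, kv_heads, max_model_len)  # fixed: use raw values
    return TUNED_BLOCK_SIZES.get(key)
```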