[ragged-paged-attn] Unify kv strided load to one.#8929

Merged
yaochengji merged 3 commits into pytorch:master from bythew3i:ragged-attn-v2
Apr 3, 2025

Conversation


@bythew3i bythew3i commented Apr 3, 2025

I expected Mosaic to canonicalize two identical strided loads into one, but it did not. So I manually unified the kv strided loads into a single load, which also makes the code cleaner.
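The idea can be illustrated with a small NumPy sketch (the layout and names here are assumptions for illustration, not the actual kernel's): instead of issuing two separate strided loads, one for K and one for V, the kernel loads the full KV slab once and splits it afterwards.

```python
import numpy as np

# Hypothetical KV page layout (an assumption for this sketch):
# [num_pages, 2, page_size, head_dim], where index 0 on the second
# axis holds K and index 1 holds V.
num_pages, page_size, head_dim = 4, 8, 16
kv_pages = np.arange(
    num_pages * 2 * page_size * head_dim, dtype=np.float32
).reshape(num_pages, 2, page_size, head_dim)

def two_strided_loads(kv_pages):
    # Before: two separate strided loads over the same region;
    # each one strides past the other's slices.
    k = kv_pages[:, 0]
    v = kv_pages[:, 1]
    return k, v

def one_strided_load(kv_pages):
    # After: a single strided load of the whole KV slab, then a
    # cheap split into K and V views (no second pass over memory).
    kv = kv_pages[:, :]
    return kv[:, 0], kv[:, 1]

k1, v1 = two_strided_loads(kv_pages)
k2, v2 = one_strided_load(kv_pages)
assert np.array_equal(k1, k2) and np.array_equal(v1, v2)
```

Both paths produce the same K and V; the unified version just touches the KV region with one load instead of two, which is where the speedup comes from when the compiler does not merge the loads itself.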

Perf gain: 20%

  • With the same settings (128 kv_page_per_blk and 32 q_per_blk), we see a 20% speedup in both the kernel benchmark and the vLLM benchmark: throughput improved from 6.15 to 7.49 reqs/second on TPU v6e-1.

Tested:

python test/test_pallas.py -v -k PallasTest.test_ragged_paged_attention_wrapper


@yaochengji yaochengji left a comment


LGTM, thanks!

@yaochengji yaochengji merged commit c044c69 into pytorch:master Apr 3, 2025
23 checks passed
