[ragged-paged-attn] Unify kv strided load to one.#8929

Merged
yaochengji merged 3 commits into pytorch:master from bythew3i:ragged-attn-v2
Apr 3, 2025

Conversation


@bythew3i bythew3i commented Apr 3, 2025

I expected Mosaic to canonicalize two identical strided loads into one, but it did not. So I manually unified the kv strided loads into a single load, which also makes the code cleaner.
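The idea can be illustrated with a small NumPy sketch (the layout and names here are assumptions for illustration, not the actual kernel's): instead of issuing two separate strided loads, one for K and one for V, the kernel loads the full KV slab once and splits it afterwards.

```python
import numpy as np

# Hypothetical KV page layout (an assumption for this sketch):
# [num_pages, 2, page_size, head_dim], where index 0 on the second
# axis holds K and index 1 holds V.
num_pages, page_size, head_dim = 4, 8, 16
kv_pages = np.arange(
    num_pages * 2 * page_size * head_dim, dtype=np.float32
).reshape(num_pages, 2, page_size, head_dim)

def two_strided_loads(kv_pages):
    # Before: two separate strided loads over the same region;
    # each one strides past the other's slices.
    k = kv_pages[:, 0]
    v = kv_pages[:, 1]
    return k, v

def one_strided_load(kv_pages):
    # After: a single strided load of the whole KV slab, then a
    # cheap split into K and V views (no second pass over memory).
    kv = kv_pages[:, :]
    return kv[:, 0], kv[:, 1]

k1, v1 = two_strided_loads(kv_pages)
k2, v2 = one_strided_load(kv_pages)
assert np.array_equal(k1, k2) and np.array_equal(v1, v2)
```

Both paths produce the same K and V; the unified version just touches the KV region with one load instead of two, which is where the speedup comes from when the compiler does not merge the loads itself.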

Perf gain: 20%

  • With the same settings (128 kv_page_per_blk and 32 q_per_blk), we see a 20% speedup in both the kernel benchmark and the vLLM benchmark: throughput improved from 6.15 to 7.49 reqs/second on TPU v6e-1.

Tested:

python test/test_pallas.py -v -k PallasTest.test_ragged_paged_attention_wrapper


@yaochengji yaochengji left a comment


LGTM, thanks!

@yaochengji yaochengji merged commit c044c69 into pytorch:master Apr 3, 2025
23 checks passed
