Update FA to latest upstream#28

Merged
Fridge003 merged 18 commits into sgl-kernel from upstream on Jan 10, 2026
Conversation

@Fridge003
Collaborator

No description provided.

tridao and others added 18 commits December 31, 2025 18:15
Previously we signaled per warp group, but that made the code more complicated
for a tiny bit of perf gain.
* improved block sparsity computation

* refactor blocksparsity computation for tvm-ffi

* refactor mask mod definitions and tests

* refactor of block sparsity and mask mod application; eventually allow varlen

* remove fastdivmods from compute block sparsity

* remove unnecessary imports

* revert to 1-phase block sparsity computation

* update bwd kernels to use new AttentionMaskCls api

* fix linter error

* use q_stage=1 for split kv

* determine q_stage via seqlen_q for sm100

* repurpose softmax1 warps for cp.async load

* address comments

* [Cute] Add missing COMPUTE_CAPABILITY definition in test_score_mod.py

The paged KV cache tests (test_score_mod_with_paged_kvcache and
test_score_mod_with_paged_kvcache_aux_tensors) check COMPUTE_CAPABILITY
to skip tests on SM90 since paged KV cache is only supported on SM100.
However, the variable was never defined, causing a NameError.

This adds the same definition used in test_mask_mod.py:
COMPUTE_CAPABILITY = torch.cuda.get_device_capability()[0]
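A minimal sketch of the guard this commit adds. The helper `get_device_capability` below is a stand-in for `torch.cuda.get_device_capability()` (no GPU is assumed here), and `should_skip_paged_kvcache_test` is a hypothetical name for the skip condition described in the commit message, not the actual test code:

```python
def get_device_capability():
    # Stand-in for torch.cuda.get_device_capability(), which returns a
    # (major, minor) tuple, e.g. (9, 0) for SM90 or (10, 0) for SM100.
    return (9, 0)

# The definition the commit adds, mirroring test_mask_mod.py: keep only
# the major compute capability.
COMPUTE_CAPABILITY = get_device_capability()[0]

def should_skip_paged_kvcache_test(compute_capability):
    # Paged KV cache is only supported on SM100, so the paged KV cache
    # tests are skipped on SM90 and earlier.
    return compute_capability < 10
```

Without the `COMPUTE_CAPABILITY` assignment, the skip check itself raised `NameError` before the test could even decide whether to skip.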

* [Cute] Fix missing seqlen_info parameter in mask_mod call

The mask_mod call in apply_mask_sm100_transposed was missing the
seqlen_info parameter. All mask functions expect the signature:
(batch, head, m_idx, n_idx, seqlen_info, aux_tensors)

The other two mask_mod calls in the same file correctly pass all 6
arguments, but this one only passed 5, causing:
TypeError: cute_ima_mask() missing 1 required positional argument: 'aux_tensors'

This fixes test_mask_mod.py::test_mask_mod_ima_partial_block.
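The calling convention at issue can be sketched as below. The 6-parameter signature is taken from the commit message; `causal_mask` and its trivial body are hypothetical illustrations, not the kernel's mask functions:

```python
def causal_mask(batch, head, m_idx, n_idx, seqlen_info, aux_tensors):
    # A mask_mod decides whether query position m_idx may attend to key
    # position n_idx. All mask functions take these six arguments.
    return n_idx <= m_idx

# Correct call site: all six arguments are passed.
ok = causal_mask(0, 0, 4, 2, None, None)

# The buggy call site passed only five, so Python reports the last
# parameter as missing, matching the TypeError quoted above.
try:
    causal_mask(0, 0, 4, 2, None)
    err = ""
except TypeError as e:
    err = str(e)  # mentions the missing 'aux_tensors' argument
```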
* varlen bwd with rounded padded offsets

* fix mha

* change offset mode to round down multiple

* enable varlen bwd tests

* enable deterministic mode

* fix deadlock and switch mha to no postprocess

* reenable tests

* fix lint error

* use head swizzle/spt for deterministic, update tests

* change padding offset based on arch

* rebase and update interface, tests

* add arch dispatch for padded offset q to postprocess

* address comments

* remove tile sizes from seqlen info class vars
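The "round down multiple" offset mode mentioned in the varlen bwd commits can be sketched as follows; the function name, the tile width of 128, and the example offsets are assumptions for illustration, not the kernel's actual constants:

```python
def round_down_multiple(x: int, m: int) -> int:
    # Round x down to the nearest multiple of m, e.g. aligning a varlen
    # sequence offset to a tile boundary before the bwd postprocess.
    return (x // m) * m

# Aligning hypothetical per-sequence offsets to a 128-wide tile:
offsets = [0, 130, 300]
padded = [round_down_multiple(o, 128) for o in offsets]  # [0, 128, 256]
```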
Fridge003 merged commit f866ec3 into sgl-kernel on Jan 10, 2026
3 of 4 checks passed
Fridge003 deleted the upstream branch on January 10, 2026 at 09:12


9 participants