Update FA to latest upstream#28
Merged
Fridge003 merged 18 commits intosgl-kernelfrom Jan 10, 2026
Merged
Conversation
Previous we signal per warp group, but that makes the code more complicated for a tiny bit of perf gain.
* improved block sparsity computation * refactor blocksparsity computation for tvm-ffi * refactor mask mod definitions and tests * refactor of block sparsity and mask mod application; eventually allow varlen * remove fastdivmods from compute block sparsity * remove unnecessary imports * revert to 1-phase block sparsity computation * update bwd kernels to use new AttentionMaskCls api * fix linter error
* use q_stage=1 for split kv * determine q_stage via seqlen_q for sm100 * repurpose softmax1 warps for cp.async load * address comments
* [Cute] Add missing COMPUTE_CAPABILITY definition in test_score_mod.py The paged KV cache tests (test_score_mod_with_paged_kvcache and test_score_mod_with_paged_kvcache_aux_tensors) check COMPUTE_CAPABILITY to skip tests on SM90 since paged KV cache is only supported on SM100. However, the variable was never defined, causing a NameError. This adds the same definition used in test_mask_mod.py: COMPUTE_CAPABILITY = torch.cuda.get_device_capability()[0] * [Cute] Fix missing seqlen_info parameter in mask_mod call The mask_mod call in apply_mask_sm100_transposed was missing the seqlen_info parameter. All mask functions expect the signature: (batch, head, m_idx, n_idx, seqlen_info, aux_tensors) The other two mask_mod calls in the same file correctly pass all 6 arguments, but this one only passed 5, causing: TypeError: cute_ima_mask() missing 1 required positional argument: 'aux_tensors' This fixes test_mask_mod.py::test_mask_mod_ima_partial_block.
* varlen bwd with rounded padded offsets * fix mha * change offset mode to round down multiple * enable varlen bwd tests * enable deterministic mode * fix deadlock and switch mha to no postprocess * reenable tests * fix lint error * use head swizzle/spt for deterministic, update tests * change padding offset based on arch * rebase and update interface, tests * add arch dispatch for padded offset q to postprocess * address comments * remove tile sizes from seqlen info class vars
This reverts commit 3f01129.
Qiaolin-Yu
approved these changes
Jan 10, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.