Support output signals for overlapping for cutedsl gemm by fzyzcjy · Pull Request #1677 · flashinfer-ai/flashinfer

fzyzcjy · 2025-09-14T10:36:23Z

📌 Description

sglang main now contains the prerequisite code, so I can extract the related parts into sglang, flashinfer, and deepep, as shown here.

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

I have installed pre-commit by running pip install pre-commit (or used your preferred method).
I have installed the hooks with pre-commit install.
I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

Tests have been added or updated as needed.
All tests are passing (unittest, etc.).

Reviewer Notes


gemini-code-assist

Summary of Changes

Hello @fzyzcjy, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the cutedsl gemm implementation by introducing a robust output signaling mechanism. This feature allows the kernel to communicate the completion status of individual batched operations (referred to as "experts") through a dedicated dst_signals array. By integrating new low-level DSL operations for atomic updates and byte manipulation, and carefully adjusting the kernel's synchronization logic, the change provides a fine-grained way to manage dependencies in overlapping computations. This is particularly beneficial for optimizing performance in scenarios requiring asynchronous processing of multiple tasks, such as in Mixture-of-Experts architectures.
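As a rough host-side illustration of the signaling idea described above (not the kernel code from this PR), the following Python sketch models each expert's entry in a `dst_signals`-like array as a counter that worker threads increment once per finished tile. The names `SignalBoard`, `finish_tile`, `wait_for`, and `run_demo` are hypothetical. The assumption that a signal counts completed tiles is made only for illustration; the summary does not spell out what the final signal value encodes.

```python
import threading


class SignalBoard:
    """Host-side model of a per-expert completion counter array (hypothetical)."""

    def __init__(self, num_experts: int) -> None:
        self._signals = [0] * num_experts
        self._cond = threading.Condition()

    def finish_tile(self, expert: int) -> None:
        # Stands in for an atomic add with release semantics: in a real kernel
        # the increment must become visible only after the tile's output is written.
        with self._cond:
            self._signals[expert] += 1
            self._cond.notify_all()

    def wait_for(self, expert: int, expected: int, timeout: float = 5.0) -> bool:
        # A consumer blocks until the expert has signalled `expected` tiles.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._signals[expert] >= expected, timeout=timeout
            )

    def snapshot(self) -> list:
        with self._cond:
            return list(self._signals)


def run_demo(tiles_per_expert: list) -> list:
    """Run one worker thread per tile and return the final counters."""
    board = SignalBoard(len(tiles_per_expert))
    workers = [
        threading.Thread(target=board.finish_tile, args=(expert,))
        for expert, tiles in enumerate(tiles_per_expert)
        for _ in range(tiles)
    ]
    for worker in workers:
        worker.start()
    # Downstream work for one expert can start as soon as its own counter is
    # full, without waiting for the other experts to finish.
    ready = [board.wait_for(e, n) for e, n in enumerate(tiles_per_expert)]
    for worker in workers:
        worker.join()
    assert all(ready)
    return board.snapshot()
```

Calling `run_demo([3, 0, 2])` models three experts with 3, 0, and 2 tiles. The point of the per-expert counter is that a consumer depends only on the expert it needs, which is what makes overlapping possible.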

Highlights

Atomic Signal Mechanism: Implemented a new mechanism to signal completion of work for individual batches (experts) using an optional cute.Pointer to dst_signals, enabling fine-grained synchronization.
New DSL Operations: Introduced with_byte, read_byte for Uint64 manipulation and atomic_add_release_global for atomic increments in global memory, leveraging LLVM inline assembly.
Scheduler Integration: Modified MaskedSchedulerParams and MaskedScheduler to accept and manage dst_signals, updating dsm_pending_packed and dsm_counter to track signal states for each batch.
Kernel Synchronization Logic: Adjusted the kernel's c_pipeline.producer_acquire() and producer_tail() calls to conditionally wait for writes (read=False) when dst_signals are enabled, ensuring proper synchronization before signaling.
Python API Extension: Extended the grouped_gemm_nt_masked function and related internal functions to accept an optional dst_signals tensor, allowing users to enable and utilize this signaling feature.
Comprehensive Testing: Added new test cases to test_cute_dsl_blockscaled_gemm.py to verify the correctness of the dst_signals functionality, including assertions on the final signal values.
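To make the byte-manipulation highlight more concrete, here is a minimal pure-Python sketch of what `read_byte` and `with_byte` style helpers on a 64-bit value could look like. Only the two names come from the highlights above. The signatures, the byte order (index 0 as the least significant byte), and the use of plain Python integers instead of DSL `Uint64` values are assumptions made for illustration, and this is not the implementation from the PR.

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits, like a Uint64


def read_byte(value: int, index: int) -> int:
    """Return byte `index` of a 64-bit value (assumed: 0 = least significant)."""
    if not 0 <= index < 8:
        raise ValueError("index must be in [0, 8)")
    return (value >> (8 * index)) & 0xFF


def with_byte(value: int, index: int, byte: int) -> int:
    """Return `value` with byte `index` replaced by `byte`; other bytes unchanged."""
    if not 0 <= index < 8:
        raise ValueError("index must be in [0, 8)")
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in [0, 255]")
    shift = 8 * index
    cleared = value & ~(0xFF << shift) & MASK64
    return cleared | (byte << shift)


# Pack eight small per-slot counters into one 64-bit word.
packed = 0
packed = with_byte(packed, 2, 5)              # slot 2 now holds 5
packed = with_byte(packed, 0, read_byte(packed, 0) + 1)  # bump slot 0
```

Packing several small counters into one word lets a scheduler keep per-batch state in a single register-sized value, which may be why a field such as `dsm_pending_packed` carries "packed" in its name, though that reading is a guess.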

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces support for output signals in the cutedsl GEMM kernel, aimed at enabling better overlapping of computations. The changes are comprehensive, affecting the scheduler, the kernel implementation, and the public-facing API. New helper functions for byte manipulation and atomic operations are added. The core logic modification resides in the kernel's epilogue, which now includes conditional synchronization and atomic signaling. The associated tests have been updated to validate this new functionality. My review focuses on enhancing the maintainability of the complex new kernel logic by suggesting refactoring to reduce code duplication and simplify conditional structures. Overall, the changes appear logically sound and the new feature is well-tested.

fzyzcjy added 30 commits August 25, 2025 18:05

more

5163ae0

more

7ed7467

more

f92b44c

more

5f6bf6f

more

772b380

more

ee5d37e

more

1a0c75c

wait signals

b058ef9

expect value

e29a7a5

pass args

99a464a

args

e909d04

src signals

afca11c

dst signal

1bb916a

type

a8e6de2

more args

ebe1160

more args

47fa767

types

7912f51

more

125e19e

more

882e120

more

28ca057

more

c704556

more

732f21e

more

8f6215c

more

d26dee8

more

5667aa8

extract

af69797

lane_id

7d2130f

more

80d611b

more

ca238ae

more

f2e80d0

fzyzcjy added 23 commits August 27, 2025 16:14

more

06e3d7b

more

95be221

check tile

d35c8f7

Revert "check tile"

0ffbe3e

This reverts commit d35c8f7.

more

02a4159

more

8f051d3

more

a51fec7

more

75c998a

more

779c1cf

more

44576ff

rm timeout check

f97abca

more

14e06a7

wait signal once per warp

501fdab

disable exit after first wait

75545c0

Revert "wait signal once per warp"

c17c34e

This reverts commit 501fdab.

hack: allow 500 missing tokens

b872c4a

Revert "hack: allow 500 missing tokens"

500bb86

This reverts commit b872c4a.

Merge branch 'main-upstream' into feat/expert_based_overlap

ea1a9b8

# Conflicts: # flashinfer/cute_dsl/blockscaled_gemm.py

rm src_signals

d161419

simp code

d1d0157

fmt

9cca919

tests

e4769ad

ci

c6c06b5

gemini-code-assist Bot reviewed Sep 14, 2025


fzyzcjy mentioned this pull request Sep 14, 2025

Support input and output signals for cutedsl gemm #1569

Draft

5 tasks

gemini-code-assist Bot reviewed Sep 14, 2025


Comment thread flashinfer/cute_dsl/blockscaled_gemm.py

Comment thread flashinfer/cute_dsl/blockscaled_gemm.py

ci

8b23cc3

ch-wan mentioned this pull request Sep 15, 2025

Support single batch overlap sgl-project/sglang#10422

Merged

4 tasks

zhyncs approved these changes Sep 15, 2025


zhyncs merged commit 79fe3cd into flashinfer-ai:main Sep 15, 2025
2 checks passed

zhyncs merged 90 commits into flashinfer-ai:main from fzyzcjy:feat/expert_based_overlap
