Fix chunked prefix cache for nvfp4#10180
Merged
zhyncs merged 19 commits into sgl-project:main on Sep 12, 2025
2de5c0a to d944356
Fridge003
reviewed
Sep 9, 2025
wenscarl
commented
Sep 9, 2025
Collaborator
Author
wenscarl
left a comment
Work around cutlass kernel for chunked prefix
Revert "[NVIDIA] disable chunked prefix cache when dp and blackwell is used (sgl-project#9861)" This reverts commit 90dfe3d.
80c7b3e to 5074cbd
zhyncs
reviewed
Sep 9, 2025
Collaborator
@wenscarl Will the accuracy of dpsk-r1 nvfp4 be back to normal after changing the flashinfer kernel to the fa2 version?
Collaborator
Author
4 tasks
yzh119
reviewed
Sep 10, 2025
Collaborator
Can we temporarily route the logic based on quantization? The FP8 model works well with the cutlass backend, and with FA2 the perf would drop to 1/2.
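A minimal sketch of what quantization-based routing could look like, assuming the nvfp4 path takes the FA2 kernel as a workaround while everything else (including FP8) stays on cutlass. The function name, the backend labels, and the substring check on the quantization string are all hypothetical and are not taken from the sglang or flashinfer code:

```python
from typing import Optional

# Hypothetical backend labels; the real identifiers may differ.
CUTLASS = "cutlass"
FA2 = "fa2"


def pick_prefill_backend(quantization: Optional[str]) -> str:
    """Choose a prefill kernel backend for the chunked prefix cache path.

    Assumption drawn from the discussion: only FP4-quantized models need
    the FA2 workaround; FP8 and unquantized models keep cutlass so they
    do not pay the FA2 performance cost.
    """
    quant = (quantization or "").lower()
    if "fp4" in quant:
        return FA2
    return CUTLASS


if __name__ == "__main__":
    for q in ("nvfp4", "fp8", None):
        print(q, "->", pick_prefill_backend(q))
```

Keeping the decision in one small function would make the workaround easy to remove once the cutlass kernel handles the nvfp4 case.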
704817d to 694443d
Fridge003
reviewed
Sep 10, 2025
e1c1ee3 to 959d4a8
elfiegg
reviewed
Sep 10, 2025
Collaborator
elfiegg
left a comment
Thanks for the quick fix!
elfiegg
reviewed
Sep 10, 2025
elfiegg
reviewed
Sep 10, 2025
Collaborator
This PR already includes the changes from #10178
Fridge003
reviewed
Sep 11, 2025
Collaborator
@wenscarl Please fix lint with
Collaborator
Looking forward to this
5 tasks
Co-authored by @elfiegg
Motivation
To address issue 9806
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist