Enable native ModelOpt quantization support (2/3)#9991
Merged
Edwardf0t1 merged 24 commits into sgl-project:main on Oct 11, 2025
Collaborator
hi @Edwardf0t1 can you help fix the conflicts? thanks
Collaborator, Author
@zhyncs Just rebased and resolved the conflicts. Could you or @Qiaolin-Yu help review the PR? Thanks.
Contributor
I think we should add example code in this PR to demonstrate how to use
jingyu-ml reviewed Sep 15, 2025
Collaborator, Author
The usage is covered in unit tests:
Qiaolin-Yu reviewed Sep 24, 2025
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
lpc0220 pushed a commit to lpc0220/sglang that referenced this pull request on Oct 29, 2025
This is the second PR in a three-part series to enable native ModelOpt quantization in SGLang. It includes changes from the first PR (#7149) and will be rebased once the first PR is merged.
Motivation
We aim to enhance SGLang's quantization capabilities, making ModelOpt integration more robust and user-friendly while providing checkpoint persistence for better performance in production environments.
Modifications
- `_setup_modelopt_quantization()` changes and added calibration functionality.
- New `modelopt_checkpoint_restore_path` and `modelopt_checkpoint_save_path` parameters on both `ModelConfig` and `ServerArgs`. These allow users to save and restore quantized checkpoints, avoiding re-quantization on subsequent runs.
- `test_modelopt_loader.py` to verify the ModelOpt functionality.

The 3rd PR is also ready for review: #10154
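The restore-or-quantize flow that the two checkpoint parameters enable can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the field names `modelopt_checkpoint_restore_path` and `modelopt_checkpoint_save_path` come from the description above, while `QuantCheckpointConfig`, `setup_quantized_model`, and the `quantize`, `restore`, and `save` callables are hypothetical stand-ins for whatever the loader actually calls.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Optional, Tuple


@dataclass
class QuantCheckpointConfig:
    # Field names mirror the parameters named in this PR;
    # the class itself is a hypothetical stand-in for ModelConfig/ServerArgs.
    modelopt_checkpoint_restore_path: Optional[str] = None
    modelopt_checkpoint_save_path: Optional[str] = None


def setup_quantized_model(
    model: Any,
    config: QuantCheckpointConfig,
    quantize: Callable[[Any], Any],       # hypothetical: calibrate + quantize
    restore: Callable[[Any, str], Any],   # hypothetical: load a saved checkpoint
    save: Callable[[Any, str], None],     # hypothetical: persist a checkpoint
) -> Tuple[Any, str]:
    """Return the quantized model and how it was obtained."""
    restore_path = config.modelopt_checkpoint_restore_path
    # Prefer an existing checkpoint so calibration is skipped on later runs.
    if restore_path and Path(restore_path).exists():
        return restore(model, restore_path), "restored"

    # No usable checkpoint: quantize now, then optionally persist the result.
    quantized = quantize(model)
    if config.modelopt_checkpoint_save_path:
        save(quantized, config.modelopt_checkpoint_save_path)
    return quantized, "quantized"
```

Pointing both paths at the same location means the first run quantizes and saves, and later runs restore. Whether the actual loader falls back to quantization when the restore path is missing is an assumption of this sketch.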
Accuracy Tests
Benchmarking and Profiling
Checklist