Enable native ModelOpt quantization support (3/3) #10154
Merged
merrymercy merged 54 commits into sgl-project:main on Oct 22, 2025
@zhyncs @Qiaolin-Yu Please help review this PR, or find someone who can, when you get a chance. Thank you!
Qiaolin-Yu reviewed Sep 24, 2025
Edwardf0t1 commented Sep 26, 2025
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
merrymercy approved these changes Oct 20, 2025
…tionality, add ModelOpt fields to for checkpoint and export paths Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Looks good
xjpang pushed a commit to xjpang/sglang that referenced this pull request Oct 22, 2025
Kangyan-Zhou added a commit to Kangyan-Zhou/sglang that referenced this pull request Apr 20, 2026
`modelopt_quant` and `modelopt_export_path` were removed from ModelConfig.__init__ in sgl-project#10154 (replaced by unified `quantization` flag and LoadConfig.modelopt_export_path), but the test was never updated. It stayed latent because the class is skipped when nvidia-modelopt isn't installed; sgl-project#23119 added the dep to the CI image yesterday, which exposed the failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kangyan-Zhou added a commit to Kangyan-Zhou/sglang that referenced this pull request Apr 20, 2026
`modelopt_quant` and `modelopt_export_path` were removed from ModelConfig.__init__ in sgl-project#10154 (replaced by unified `quantization` flag and LoadConfig.modelopt_export_path), but the test was never updated. It stayed latent because the class is skipped when nvidia-modelopt isn't installed; sgl-project#23119 added the dep to the CI image, which exposed the failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This is the third PR in a three-part series to enable native ModelOpt quantization in SGLang. It includes changes from the first PR (#7149) and second PR (#9991) and will be rebased once the first two PRs are merged.
Motivation
We aim to enhance SGLang's quantization capabilities, making ModelOpt integration more robust and user-friendly while providing checkpoint persistence for better performance in production environments.
Modifications
- `modelopt_export_path` parameter to `_setup_modelopt_quantization()` in `ModelOptModelLoader`.
- `_export_modelopt_checkpoint()` method using modelopt's unified hf export API.
- `modelopt_export_path` parameter in `ModelConfig` and added `--modelopt-export-path` command-line argument in `ServerArgs`.
- `quantize-and-serve` mode for quantize + export + deployment with a single command.
Accuracy Tests
Production Workflow:
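A minimal sketch of how the quantize + export + deployment flow listed under Modifications might look at the argument-parsing level, assuming a standalone `argparse` parser. Only the `--modelopt-export-path` argument and the unified `quantization` flag are named in this thread. `LoadSettings`, `build_parser`, `parse_settings`, `should_export`, the `--model-path` argument, the `modelopt_fp8` value, and the example paths are hypothetical illustrations, not SGLang's actual `ServerArgs` or `LoadConfig` code.

```python
import argparse
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LoadSettings:
    """Hypothetical stand-in for a loader config (not SGLang's LoadConfig)."""

    model_path: str
    quantization: Optional[str] = None
    modelopt_export_path: Optional[str] = None


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser; argument names other than
    --modelopt-export-path are assumptions made for this sketch."""
    parser = argparse.ArgumentParser(description="Illustrative server arguments")
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--quantization", default=None)
    parser.add_argument("--modelopt-export-path", default=None)
    return parser


def parse_settings(argv: List[str]) -> LoadSettings:
    """Parse a list of CLI tokens into the hypothetical settings object."""
    args = build_parser().parse_args(argv)
    return LoadSettings(
        model_path=args.model_path,
        quantization=args.quantization,
        modelopt_export_path=args.modelopt_export_path,
    )


def should_export(settings: LoadSettings) -> bool:
    """Export only when a quantization method and an export path are both set."""
    return (
        settings.quantization is not None
        and settings.modelopt_export_path is not None
    )


if __name__ == "__main__":
    settings = parse_settings(
        [
            "--model-path", "/models/example",          # placeholder path
            "--quantization", "modelopt_fp8",           # assumed value
            "--modelopt-export-path", "/tmp/exported",  # placeholder path
        ]
    )
    print(settings, should_export(settings))
```

In this sketch the export step is gated on both settings being present, so a plain serving run without an export path never writes a checkpoint. Whether the real loader applies the same rule is not stated in the thread.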
Benchmarking and Profiling
Checklist
Summary by CodeRabbit
New Features
Documentation
Tests
Chores