Enable native ModelOpt quantization support (2/3) by Edwardf0t1 · Pull Request #9991 · sgl-project/sglang

Edwardf0t1 · 2025-09-04T00:52:21Z

This is the second PR in a three-part series to enable native ModelOpt quantization in SGLang. It includes changes from the first PR (#7149) and will be rebased once the first PR is merged.

Motivation

We aim to enhance SGLang's quantization capabilities, making ModelOpt integration more robust and user-friendly while providing checkpoint persistence for better performance in production environments.

Modifications

Created _setup_modelopt_quantization() and added calibration functionality.
Added modelopt_checkpoint_restore_path and modelopt_checkpoint_save_path parameters to both ModelConfig and ServerArgs. These let users save and restore quantized checkpoints, avoiding re-quantization on subsequent runs.
Improved error handling during the ModelOpt quantization process.
Added more unit tests in test_modelopt_loader.py to verify the ModelOpt functionality.
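
The save/restore flow described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: only the two field names (modelopt_checkpoint_restore_path, modelopt_checkpoint_save_path) come from the description. The QuantPaths class, setup_quantization(), and the toy_* helpers are hypothetical stand-ins, and the fallback to quantizing when the restore path does not exist is an assumption about the intended behavior.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QuantPaths:
    """Hypothetical holder; only the field names mirror the PR's new parameters."""

    modelopt_checkpoint_restore_path: Optional[str] = None
    modelopt_checkpoint_save_path: Optional[str] = None


def setup_quantization(
    model: dict,
    paths: QuantPaths,
    quantize_fn: Callable[[dict], None],
    restore_fn: Callable[[dict, str], None],
    save_fn: Callable[[dict, str], None],
) -> str:
    """Prepare the model and report whether it was "restored" or "quantized"."""
    restore_path = paths.modelopt_checkpoint_restore_path
    if restore_path and os.path.exists(restore_path):
        # A saved checkpoint exists: reuse it and skip calibration entirely.
        restore_fn(model, restore_path)
        return "restored"

    # Assumption: with no usable checkpoint, fall back to quantizing from scratch.
    try:
        quantize_fn(model)
    except Exception as exc:
        # Surface a clear error instead of a bare failure deep in calibration.
        raise RuntimeError(f"ModelOpt quantization failed: {exc}") from exc

    save_path = paths.modelopt_checkpoint_save_path
    if save_path:
        # Persist the result so later runs can take the restore branch.
        save_fn(model, save_path)
    return "quantized"


# Toy stand-ins for the real quantize / save / restore calls (all hypothetical).
def toy_quantize(model: dict) -> None:
    model["quantized"] = True


def toy_save(model: dict, path: str) -> None:
    with open(path, "w") as f:
        f.write("1" if model.get("quantized") else "0")


def toy_restore(model: dict, path: str) -> None:
    with open(path) as f:
        model["quantized"] = f.read() == "1"


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "modelopt_ckpt.txt")
    first = setup_quantization(
        {}, QuantPaths(None, ckpt), toy_quantize, toy_restore, toy_save
    )
    second = setup_quantization(
        {}, QuantPaths(ckpt, None), toy_quantize, toy_restore, toy_save
    )
    print(first, second)
```

The first call quantizes and writes the checkpoint; the second call finds the file and takes the restore branch, which is the re-quantization saving the description refers to. For the actual usage, see the unit tests in test_modelopt_loader.py.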

The third PR is also ready for review: #10154

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

zhyncs · 2025-09-08T04:41:17Z

hi @Edwardf0t1 can you help fix the conflicts? thanks

Edwardf0t1 · 2025-09-12T23:36:34Z

> hi @Edwardf0t1 can you help fix the conflicts? thanks

@zhyncs Just rebased and resolved the conflicts. Could you or @Qiaolin-Yu help review the PR? Thanks.

jingyu-ml · 2025-09-15T23:43:18Z

I think we should add example code in this PR to demonstrate how to use modelopt_checkpoint_restore_path or other new functions, so users can understand without needing deep context.

Edwardf0t1 · 2025-09-18T07:20:09Z

> I think we should add example code in this PR to demonstrate how to use modelopt_checkpoint_restore_path or other new functions, so users can understand without needing deep context.

The usage is covered in unit tests: test/srt/model_loader/test_modelopt_loader.py

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Edwardf0t1 mentioned this pull request Sep 4, 2025

Enable native ModelOpt quantization support (2/3) Edwardf0t1/sglang#1

Open

4 tasks

Edwardf0t1 self-assigned this Sep 6, 2025

Edwardf0t1 marked this pull request as ready for review September 6, 2025 00:00

Edwardf0t1 requested review from BBuf, HaiShaw, Ying1123, ch-wan, hnyls2002, ispobock, kushanam, merrymercy and zhyncs as code owners September 6, 2025 00:00

Edwardf0t1 mentioned this pull request Sep 6, 2025

Enable native ModelOpt quantization support (1/3) #7149

Merged

6 tasks

zhyncs assigned Qiaolin-Yu Sep 6, 2025

zhyncs added the high priority label Sep 6, 2025

Edwardf0t1 mentioned this pull request Sep 8, 2025

Enable native ModelOpt quantization support (3/3) #10154

Merged

4 tasks

Edwardf0t1 force-pushed the zhiyu/modelopt-sglang-api-2 branch from 2674259 to aed7dd2 Compare September 12, 2025 23:32

Edwardf0t1 requested review from CatherineSue and slin1237 as code owners September 12, 2025 23:32

jingyu-ml reviewed Sep 15, 2025


Edwardf0t1 force-pushed the zhiyu/modelopt-sglang-api-2 branch from f074579 to c13b457 Compare September 18, 2025 06:21

Edwardf0t1 added the run-ci label Sep 18, 2025

Qiaolin-Yu reviewed Sep 24, 2025


Comment thread python/sglang/srt/server_args.py Outdated

Comment thread test/srt/model_loader/test_modelopt_loader.py Outdated

Comment thread test/srt/model_loader/test_modelopt_loader.py Outdated

Edwardf0t1 force-pushed the zhiyu/modelopt-sglang-api-2 branch 2 times, most recently from c118561 to e75fbf3 Compare September 29, 2025 23:56

Edwardf0t1 requested a review from JustinTong0323 as a code owner September 29, 2025 23:56

Edwardf0t1 added 24 commits October 11, 2025 03:13

resolve conflict

550b869

resolve conflict

cfd2ba1

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

resolve conflict

784d305

resolve conflict

3ea7d0a

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

resolve conflict

65e1ab2

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

resolve conflict

8608a6e

add a unit test for ModelOptModelLoader

14f0ec7

resolve conflict

d8dbcd6

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

relocate unit test and add to run_suite

e5990bd

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

resolve conflict

7aaffb9

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

resolve conflict

d2cce43

resolve conflict

d2297e5

resolve conflict

6833853

cleanup

b8a99d7

resolve conflict

a835d6c

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

add TODOs to make batch size and calibration samples configurable

ad26b01

improve logging with rank0_log

288fdc9

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

improve logging

62edce3

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

minor

0c8c302

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

resolve conflict

52c244b

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

resolve conflict

4e5441b

fix ci test

48cdee3

fix ci

e1a66ab

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

minor

9bc99e7

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

Edwardf0t1 force-pushed the zhiyu/modelopt-sglang-api-2 branch from 40fefb3 to 9bc99e7 Compare October 11, 2025 03:13

Edwardf0t1 enabled auto-merge (squash) October 11, 2025 03:14

Edwardf0t1 merged commit 129d299 into sgl-project:main Oct 11, 2025
84 of 98 checks passed

lpc0220 pushed a commit to lpc0220/sglang that referenced this pull request Oct 29, 2025

Enable native ModelOpt quantization support (2/3) (sgl-project#9991)

a2b43b9

Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>

coderabbitai Bot mentioned this pull request Mar 29, 2026

Add Agent Deployment skill for model serving NVIDIA/Model-Optimizer#1133

Merged

Edwardf0t1 merged 24 commits into sgl-project:main from Edwardf0t1:zhiyu/modelopt-sglang-api-2
