Enable native ModelOpt quantization support (3/3)#10154

Merged
merrymercy merged 54 commits into sgl-project:main from Edwardf0t1:zhiyu/modelopt-sglang-api-3
Oct 22, 2025

Conversation

@Edwardf0t1 Edwardf0t1 commented Sep 8, 2025

This is the third PR in a three-part series to enable native ModelOpt quantization in SGLang. It includes changes from the first PR (#7149) and second PR (#9991) and will be rebased once the first two PRs are merged.

Motivation

We aim to enhance SGLang's quantization capabilities, making ModelOpt integration more robust and user-friendly while providing checkpoint persistence for better performance in production environments.

Modifications

  • Integrated ModelOpt quantized-model export functionality.
  • Added a modelopt_export_path parameter to _setup_modelopt_quantization() in ModelOptModelLoader.
  • Implemented an _export_modelopt_checkpoint() method using ModelOpt's unified HF export API.
  • Added a modelopt_export_path parameter to ModelConfig and a --modelopt-export-path command-line argument to ServerArgs.
  • Export happens automatically after quantization (or when restoring from a checkpoint).
  • Added unit tests for the export functionality.
  • Unified the quantization flags across the quantize + export and deployment phases.
  • Added an example script that runs ModelOpt quantize + export + deployment.
  • TODO: Enable a quantize-and-serve mode that performs quantize + export + deployment with a single command.
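The flag wiring described above can be sketched with stdlib argparse. This is a hypothetical illustration, not the PR's actual ServerArgs code; the flag names `--modelopt-export-path` and the `modelopt_fp8` choice come from the PR, while `build_parser` and the help strings are made up here:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of how an export flag like the PR's
    # --modelopt-export-path might be wired into the server arguments.
    parser = argparse.ArgumentParser(description="ModelOpt quantize/export sketch")
    parser.add_argument("--model-path", required=True,
                        help="HF model id or local checkpoint directory")
    parser.add_argument("--quantization", choices=["modelopt", "modelopt_fp8"],
                        default="modelopt",
                        help="quantization backend (choices taken from the PR)")
    parser.add_argument("--modelopt-export-path", default=None,
                        help="if set, export the quantized checkpoint here "
                             "after quantization completes")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--model-path", "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
         "--quantization", "modelopt_fp8",
         "--modelopt-export-path", "./quantized_tinyllama_fp8"]
    )
    print(args.modelopt_export_path)  # ./quantized_tinyllama_fp8
```

The key design point from the bullets is that the export path is optional: when it is unset, quantization runs in memory only; when set, the loader exports the checkpoint after quantization finishes.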

Accuracy Tests

Production Workflow:

# Step 1: Quantize + Export
python examples/usage/modelopt_quantize_and_export.py quantize \
    --model-path TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
    --export-dir ./quantized_tinyllama_fp8 \
    --quantization-method modelopt_fp8

# Step 2: Deploy
python -m sglang.launch_server \
    --model-path ./quantized_tinyllama_fp8 \
    --quantization modelopt
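Once the server from Step 2 is running, it can be smoke-tested through its OpenAI-compatible chat endpoint. A minimal sketch of the request payload follows; the port 30000 default, the `/v1/chat/completions` path, and the model name are assumptions about the local setup, not part of this PR:

```python
import json

# Hypothetical smoke-test payload for the deployed quantized model.
# URL and model name are assumptions about the local setup.
url = "http://localhost:30000/v1/chat/completions"
payload = {
    "model": "quantized_tinyllama_fp8",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 32,
    "temperature": 0.0,
}
body = json.dumps(payload)
print(body)
# To actually send it: requests.post(url, json=payload, timeout=30)
```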

Benchmarking and Profiling

Checklist

Summary by CodeRabbit

  • New Features

    • Added NVIDIA ModelOpt quantization support (FP8/FP4 auto-detection), export to Hugging Face format, and serving of exported models.
    • Introduced CLI options to export after quantization and to quantize-and-serve.
    • Added quantization choice: modelopt_fp8.
    • Included an example script demonstrating quantize, export, and deploy.
  • Documentation

    • New guide “Using NVIDIA ModelOpt” covering installation, workflow, Python usage, deployment, and advanced features; reference updated.
  • Tests

    • Expanded coverage for ModelOpt workflows and additional model/attention components.
  • Chores

    • Added optional dependency group for ModelOpt.
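The FP8/FP4 auto-detection mentioned above could, in principle, be driven by the quantization config that ModelOpt's HF export writes alongside the checkpoint. A minimal sketch, assuming an `hf_quant_config.json` file with a `quant_algo` field (the exact file name and schema are assumptions here, and the function name is hypothetical):

```python
import json
import os
from typing import Optional


def detect_modelopt_quant_format(checkpoint_dir: str) -> Optional[str]:
    """Guess the ModelOpt quantization format of an exported checkpoint.

    Assumes the export writes hf_quant_config.json with a nested
    quantization.quant_algo field; returns None when no config is found.
    """
    cfg_path = os.path.join(checkpoint_dir, "hf_quant_config.json")
    if not os.path.exists(cfg_path):
        return None
    with open(cfg_path) as f:
        cfg = json.load(f)
    algo = cfg.get("quantization", {}).get("quant_algo", "")
    if "FP8" in algo:
        return "modelopt_fp8"
    if "FP4" in algo:
        return "modelopt_fp4"
    return None
```

With detection like this, a user can pass the generic `--quantization modelopt` flag at deploy time and let the loader pick FP8 vs FP4 from the exported checkpoint itself.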

@Edwardf0t1
Collaborator Author

@zhyncs @Qiaolin-Yu Please help or find someone review this PR as well when you get a chance. Thank you!

@Qiaolin-Yu self-assigned this on Sep 13, 2025
@Edwardf0t1 force-pushed the zhiyu/modelopt-sglang-api-3 branch from 19fcedb to 95fc54b on September 13, 2025
@Edwardf0t1 force-pushed the zhiyu/modelopt-sglang-api-3 branch from 95fc54b to d25e5d1 on September 23, 2025
Review threads:
  • test/srt/test_modelopt_loader.py
  • python/sglang/srt/configs/model_config.py
  • examples/usage/modelopt_quantize_and_export.py
  • python/sglang/srt/model_loader/loader.py (outdated)
  • python/sglang/srt/model_loader/loader.py (outdated)
  • python/pyproject.toml
@Edwardf0t1 force-pushed the zhiyu/modelopt-sglang-api-3 branch 2 times, most recently from c5181b3 to 15dd13e on September 30, 2025
@b8zhong added the run-ci label on Oct 6, 2025
@Edwardf0t1 force-pushed the zhiyu/modelopt-sglang-api-3 branch from 15dd13e to 9c2eaac on October 8, 2025
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
@Edwardf0t1 force-pushed the zhiyu/modelopt-sglang-api-3 branch from 9b8dc42 to 3cafa90 on October 18, 2025
Review thread: python/sglang/srt/configs/model_config.py (outdated)
…tionality, add ModelOpt fields for checkpoint and export paths

@Edwardf0t1 enabled auto-merge (squash) on October 21, 2025
@FlamingoPg
Collaborator

Looks good

@merrymercy merrymercy merged commit 80b2b32 into sgl-project:main Oct 22, 2025
69 of 72 checks passed
xjpang pushed a commit to xjpang/sglang that referenced this pull request Oct 22, 2025
Kangyan-Zhou added a commit to Kangyan-Zhou/sglang that referenced this pull request Apr 20, 2026
`modelopt_quant` and `modelopt_export_path` were removed from
ModelConfig.__init__ in sgl-project#10154 (replaced by unified `quantization`
flag and LoadConfig.modelopt_export_path), but the test was never
updated. It stayed latent because the class is skipped when
nvidia-modelopt isn't installed; sgl-project#23119 added the dep to the CI
image yesterday, which exposed the failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7 participants