fix CompressedTensorsW8A8Int8 min_capability #13914
Conversation
Summary of Changes (Gemini Code Assist): This pull request expands the compatibility of the CompressedTensorsW8A8Int8 quantization method to a wider range of NVIDIA GPUs. By lowering the minimum required compute capability, the change allows this quantization scheme to be used on Ampere-architecture GPUs (SM80) in addition to the previously supported Lovelace architecture (SM89) and newer.
Code Review
This pull request correctly lowers the minimum compute capability for CompressedTensorsW8A8Int8 from 8.9 (Lovelace) to 8.0 (Ampere), as the underlying CUDA kernel supports the Ampere architecture. This change enables the feature on a broader range of GPUs. My review includes a suggestion to update a code comment that became misleading after this change.
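For context, a minimum-capability gate of this kind typically encodes the GPU's compute capability as `major * 10 + minor` and compares it against the scheme's threshold. A minimal sketch, assuming that convention — the names `get_device_capability`, `is_supported`, and the `Sketch` class below are illustrative, not the project's actual API; on real hardware the capability would come from something like `torch.cuda.get_device_capability()`:

```python
def get_device_capability():
    # Stand-in for querying the GPU; returns (major, minor).
    # (8, 0) corresponds to Ampere (e.g. A100), (8, 9) to Lovelace (e.g. L40).
    return (8, 0)

class CompressedTensorsW8A8Int8Sketch:
    @classmethod
    def get_min_capability(cls) -> int:
        # ampere and up (was 89, i.e. Lovelace and up, before this PR)
        return 80

def is_supported(scheme) -> bool:
    # Encode capability as major*10 + minor and compare to the threshold.
    major, minor = get_device_capability()
    return major * 10 + minor >= scheme.get_min_capability()

print(is_supported(CompressedTensorsW8A8Int8Sketch))  # True, since 80 >= 80
```

With the old threshold of 89, the same SM80 device would have been rejected (80 < 89), which is exactly what this PR fixes.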
```diff
   # lovelace and up
-  return 89
+  return 80
```
The comment on line 38 is now misleading. Compute capability 8.0 corresponds to the Ampere architecture, while Lovelace is 8.9 and higher. To avoid confusion, please update the comment to reflect the new minimum requirement.
```diff
-  # lovelace and up
-  return 89
+  # ampere and up
   return 80
```
Fix the comment at the same time.
done
Others LGTM
@AniZpZ @Edwardf0t1 @BBuf @ch-wan I would appreciate it if you could review this PR. It's a small modification.
@FlamingoPg I would appreciate it if you could review this PR. It's a small modification.
Co-authored-by: Fan Yin <1106310035@qq.com> Co-authored-by: Peng Zhang <aniz1905@gmail.com>
Motivation
SM80 (Ampere) GPUs support CompressedTensorsW8A8Int8, but the minimum compute capability was previously set to 89 (Lovelace), which blocked them.
Modifications
Change CompressedTensorsW8A8Int8.get_min_capability() to return 80 instead of 89.
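The change itself is a one-liner; a sketch of the resulting method, with the class context and `@classmethod` signature assumed rather than copied from the repository:

```python
class CompressedTensorsW8A8Int8:
    # Sketch of the relevant method only; the real class contains more.
    @classmethod
    def get_min_capability(cls) -> int:
        # ampere and up (compute capability 8.0); previously 89 (lovelace and up)
        return 80
```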
Accuracy Tests
Benchmarking and Profiling
Checklist