[Docs] Add quantization docs#3410

Merged: zhyncs merged 27 commits into sgl-project:main from Edenzzzz:quantization_docs on Feb 9, 2025

Edenzzzz (Contributor) commented Feb 8, 2025

Motivation

Re-opens #3253 with reviews addressed.

Modifications

Checklist

  • Format your code according to the Code Formatting with Pre-Commit.
  • Add unit tests as outlined in the Running Unit Tests.
  • Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
  • Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
  • For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.

Edenzzzz (Contributor Author) commented Feb 8, 2025

cc @zhaochenyang20

```
--port 30000 --host 0.0.0.0
```

Our team is working on supporting more online quantization methods. We will soon support methods including but not limited to `["awq", "gptq", "marlin", "gptq_marlin", "awq_marlin", "bitsandbytes", "gguf"]`.

Contributor Author

I think this refers to online quantization? Loading offline AWQ weights is already supported.

zhaochenyang20 (Collaborator)

Thanks. I will give credit to you, James, and Fan.

zhaochenyang20 (Collaborator) left a comment

We should move it to `references` and add it to `index.rst`.

Edenzzzz (Contributor Author) commented Feb 9, 2025

@zhaochenyang20 Added

4 comment threads on docs/references/quantization.md (all outdated)
zhyncs merged commit 0af1d23 into sgl-project:main on Feb 9, 2025
Edenzzzz deleted the quantization_docs branch on February 9, 2025, 18:19

zhyncs (Collaborator) commented Feb 9, 2025

FYI: in the upcoming release, we will default to sgl-kernel's W8A8 Int8 and FP8 instead of vLLM's W8A8. We have achieved the best performance across sm80, sm89, and sm90.

zhaochenyang20 (Collaborator)

Great. Wait, we need to change this a bit

timethink pushed a commit to timethink/sglang that referenced this pull request Mar 9, 2025
Co-authored-by: yinfan98 <1106310035@qq.com>
0826joyce pushed a commit to 0826joyce/sglang-perf-opt that referenced this pull request May 19, 2026
Co-authored-by: yinfan98 <1106310035@qq.com>
4 participants