Split toml files for sglang package #10162
ZailiWang wants to merge 2 commits into sgl-project:main
Conversation
Summary of Changes
Hello @ZailiWang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request refactors the dependency management for the sglang package by introducing separate pyproject.toml files for different hardware platforms (CPU, HIP, XPU, HPU, NPU). This change aims to simplify the build process for per-device wheels and improve clarity by isolating platform-specific dependencies. It also involves updating relevant build scripts and documentation to align with the new structure.
Highlights
- Modularized Dependency Management: The sglang package's pyproject.toml files have been split into device-specific configurations (e.g., pyproject_cpu.toml, pyproject_hip.toml) to streamline per-device wheel building.
- Simplified Default Configuration: The main pyproject.toml now focuses on CUDA-only builds, with other device-specific dependencies moved to their respective .toml files.
- Flattened Dependencies: Hierarchical [project.optional-dependencies] have been removed, with each .toml file now containing a single, flat dependencies list (excluding test-related packages).
- Documentation and Dockerfile Updates: Installation commands in documentation and Dockerfiles have been updated to reflect the new .toml structure, specifically for CPU, with plans for other devices.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request effectively refactors the project's packaging configuration by splitting the pyproject.toml file into device-specific versions. This is a great step towards simplifying the build process for different hardware backends like CUDA, CPU, HIP, etc. The corresponding updates to the Dockerfile and documentation are consistent with these changes. My review includes a critical fix for the CPU installation documentation and several suggestions to improve the maintainability of the new pyproject files by sorting dependencies.
# Install SGLang dependent libs, and build SGLang main package
pip install --upgrade pip setuptools
conda install -y libsqlite==3.48.0 gperftools tbb libnuma numactl
The installation instructions for bare metal are missing the step to install torch. Since torch is not a dependency in pyproject_cpu.toml, pip install . will not install it, and the setup will be incomplete. It's best to install torch after the conda packages and before installing sglang.
conda install -y libsqlite==3.48.0 gperftools tbb libnuma numactl
pip install torch torchvision torchaudio
  cp pyproject_cpu.toml pyproject.toml && \
  pip install . && \
  pip install torch==${VER_TORCH} torchvision==${VER_TORCHVISION} triton==${VER_TRITON} --force-reinstall && \
- cd sgl-kernel && \
+ cd ../sgl-kernel && \
  cp pyproject_cpu.toml pyproject.toml && \
- pip install -v .
+ pip install .
The installation process can be slightly optimized by combining the pip install commands. This reduces the number of layers in the Docker image and can make builds slightly faster.
cp pyproject_cpu.toml pyproject.toml && \
pip install . torch==${VER_TORCH} torchvision==${VER_TORCHVISION} triton==${VER_TRITON} --force-reinstall && \
cd ../sgl-kernel && \
cp pyproject_cpu.toml pyproject.toml && \
pip install .
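The copy-then-install pattern above generalizes to all of the new per-device files. A minimal sketch of the selection step (a hypothetical helper, not part of the PR; the device names come from the PR's file list):

```python
# Hypothetical helper: map a build target to its pyproject file.
# CUDA keeps the default pyproject.toml; the other devices use the
# split files introduced in this PR.
NON_CUDA_DEVICES = {"cpu", "hip", "xpu", "hpu", "npu"}


def pyproject_for(device: str) -> str:
    if device == "cuda":
        return "pyproject.toml"
    if device in NON_CUDA_DEVICES:
        return f"pyproject_{device}.toml"
    raise ValueError(f"unknown build device: {device}")
```

A build script would then cp the returned file over pyproject.toml before running pip install ., as the Dockerfile does for CPU.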
"aiohttp",
"anthropic>=0.20.0",
"blobfile==3.0.0",
"build",
"compressed-tensors",
"datasets",
"decord",
"einops",
"fastapi",
"hf_transfer",
"huggingface_hub",
"interegular",
"IPython",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
"msgspec",
"ninja",
"numpy",
"openai==1.99.1",
"openai-harmony==0.0.4",
"orjson",
"outlines==0.1.11",
"packaging",
"partial_json_parser",
"petit_kernel==0.0.2",
"pillow",
"prometheus-client>=0.20.0",
"psutil",
"pybase64",
"pydantic",
"pynvml",
"python-multipart",
"pyzmq>=25.1.2",
"requests",
"scipy",
"sentencepiece",
"setproctitle",
"soundfile==0.13.1",
"timm==1.0.16",
"tiktoken",
"torch",
"torchao==0.9.0",
"tqdm",
"transformers==4.56.0",
"uvicorn",
"uvloop",
"wave-lang==3.7.0",
"xgrammar==0.1.23",
For better maintainability and to avoid potential issues like duplicate entries, it's a good practice to keep the list of dependencies sorted alphabetically.
"aiohttp",
"anthropic>=0.20.0",
"blobfile==3.0.0",
"build",
"compressed-tensors",
"datasets",
"decord",
"einops",
"fastapi",
"hf_transfer",
"huggingface_hub",
"interegular",
"IPython",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
"msgspec",
"ninja",
"numpy",
"openai==1.99.1",
"openai-harmony==0.0.4",
"orjson",
"outlines==0.1.11",
"packaging",
"partial_json_parser",
"petit_kernel==0.0.2",
"pillow",
"prometheus-client>=0.20.0",
"psutil",
"pybase64",
"pydantic",
"pynvml",
"python-multipart",
"pyzmq>=25.1.2",
"requests",
"scipy",
"sentencepiece",
"setproctitle",
"soundfile==0.13.1",
"tiktoken",
"timm==1.0.16",
"torch",
"torchao==0.9.0",
"tqdm",
"transformers==4.56.0",
"uvicorn",
"uvloop",
"wave-lang==3.7.0",
"xgrammar==0.1.23"
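The sorted-order rule the review asks for is easy to enforce automatically. A hedged sketch of such a check (not part of the PR; it compares case-insensitively and ignores version specifiers):

```python
import re


def deps_sorted(deps: list[str]) -> bool:
    """True when the bare package names are in case-insensitive
    alphabetical order with no duplicates."""
    # Strip specifiers/extras: "llguidance>=0.7.11,<0.8.0" -> "llguidance"
    names = [re.split(r"[<>=!\[ ]", d, maxsplit=1)[0].lower() for d in deps]
    return names == sorted(names) and len(names) == len(set(names))
```

For instance, the list in the diff fails this check because "timm==1.0.16" precedes "tiktoken".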
"aiohttp",
"anthropic>=0.20.0",
"blobfile==3.0.0",
"build",
"compressed-tensors",
"datasets",
"decord",
"einops",
"fastapi",
"hf_transfer",
"huggingface_hub",
"interegular",
"IPython",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
"msgspec",
"ninja",
"numpy",
"openai==1.99.1",
"openai-harmony==0.0.4",
"orjson",
"outlines==0.1.11",
"packaging",
"partial_json_parser",
"pillow",
"prometheus-client>=0.20.0",
"psutil",
"pybase64",
"pydantic",
"pynvml",
"python-multipart",
"pyzmq>=25.1.2",
"requests",
"scipy",
"sentencepiece",
"setproctitle",
"soundfile==0.13.1",
"timm==1.0.16",
"tiktoken",
"torchao==0.9.0",
"tqdm",
"transformers==4.56.0",
"uvicorn",
"uvloop",
"xgrammar==0.1.23",
For better maintainability and to avoid potential issues like duplicate entries, it's a good practice to keep the list of dependencies sorted alphabetically.
"aiohttp",
"anthropic>=0.20.0",
"blobfile==3.0.0",
"build",
"compressed-tensors",
"datasets",
"decord",
"einops",
"fastapi",
"hf_transfer",
"huggingface_hub",
"interegular",
"IPython",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
"msgspec",
"ninja",
"numpy",
"openai==1.99.1",
"openai-harmony==0.0.4",
"orjson",
"outlines==0.1.11",
"packaging",
"partial_json_parser",
"pillow",
"prometheus-client>=0.20.0",
"psutil",
"pybase64",
"pydantic",
"pynvml",
"python-multipart",
"pyzmq>=25.1.2",
"requests",
"scipy",
"sentencepiece",
"setproctitle",
"soundfile==0.13.1",
"tiktoken",
"timm==1.0.16",
"torchao==0.9.0",
"tqdm",
"transformers==4.56.0",
"uvicorn",
"uvloop",
"xgrammar==0.1.23"
"aiohttp",
"anthropic>=0.20.0",
"blobfile==3.0.0",
"build",
"compressed-tensors",
"datasets",
"decord",
"einops",
"fastapi",
"hf_transfer",
"huggingface_hub",
"interegular",
"IPython",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
"msgspec",
"ninja",
"numpy",
"openai==1.99.1",
"openai-harmony==0.0.4",
"orjson",
"outlines==0.1.11",
"packaging",
"partial_json_parser",
"pillow",
"prometheus-client>=0.20.0",
"psutil",
"pybase64",
"pydantic",
"pynvml",
"python-multipart",
"pyzmq>=25.1.2",
"requests",
"scipy",
"sentencepiece",
"setproctitle",
"soundfile==0.13.1",
"timm==1.0.16",
"tiktoken",
"torchao==0.9.0",
"tqdm",
"transformers==4.56.0",
"uvicorn",
"uvloop",
"xgrammar==0.1.23",
]
For better maintainability and to avoid potential issues like duplicate entries, it's a good practice to keep the list of dependencies sorted alphabetically.
"aiohttp",
"anthropic>=0.20.0",
"blobfile==3.0.0",
"build",
"compressed-tensors",
"datasets",
"decord",
"einops",
"fastapi",
"hf_transfer",
"huggingface_hub",
"interegular",
"IPython",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
"msgspec",
"ninja",
"numpy",
"openai==1.99.1",
"openai-harmony==0.0.4",
"orjson",
"outlines==0.1.11",
"packaging",
"partial_json_parser",
"pillow",
"prometheus-client>=0.20.0",
"psutil",
"pybase64",
"pydantic",
"pynvml",
"python-multipart",
"pyzmq>=25.1.2",
"requests",
"scipy",
"sentencepiece",
"setproctitle",
"soundfile==0.13.1",
"tiktoken",
"timm==1.0.16",
"torchao==0.9.0",
"tqdm",
"transformers==4.56.0",
"uvicorn",
"uvloop",
"xgrammar==0.1.23"
It seems to be too complex, with too much duplicated code shared between different files. Do you have other methods to do this?
Hi @hnyls2002, the toml split was requested by @zhyncs. It indeed introduces duplications, but it seems this is inevitable if we need to split the package configs into device-specific ones. I searched official … Let's keep the content of the PR as-is until we get a conclusion on whether this is what we need, or we can find any better approaches.
Will take care of CPU/XPU backends only.
Motivation
Split the .toml files for the sglang package, to better facilitate the planned per-device sgl-whl building.

Modifications
- Kept the main pyproject.toml file for building whls only supporting CUDA; created pyproject_[cpu|hip|xpu|hpu|npu].toml files for other devices.
- Removed the hierarchical [project.optional-dependencies]. Only one dependencies list remains in each .toml file (except test-related packages).

Accuracy Tests
N/A
Benchmarking and Profiling
N/A
Checklist