
Split toml files for sglang package#10162

Closed
ZailiWang wants to merge 2 commits into sgl-project:main from ZailiWang:split-toml

Conversation

@ZailiWang (Contributor)

Motivation

Split the .toml files for the sglang package to facilitate the planned per-device wheel (sgl-whl) building.

Modifications

  • Simplified the default pyproject.toml file so it builds wheels supporting CUDA only, and created pyproject_[cpu|hip|xpu|hpu|npu].toml files for the other devices;
  • Removed the hierarchical [project.optional-dependencies] sections; each .toml file now carries a single dependencies list (aside from test-related packages);
  • Commands in the docs and Dockerfiles require updating. CPU is updated for now; updates for the other devices may be coordinated with their respective maintainers.
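For illustration, the per-device layout described above could be driven by a small helper that maps a device name to its config file; this is a hedged sketch, and the function name `pick_toml` is hypothetical, not part of this PR.

```shell
# Hypothetical sketch: map a device name to its pyproject file under the
# proposed split layout. CUDA keeps the default pyproject.toml; every
# other supported device gets its own pyproject_<device>.toml.
pick_toml() {
  case "$1" in
    cuda) echo "pyproject.toml" ;;
    cpu|hip|xpu|hpu|npu) echo "pyproject_$1.toml" ;;
    *) echo "unsupported device: $1" >&2; return 1 ;;
  esac
}

# A per-device build would then copy the file into place before installing,
# e.g.: cp "$(pick_toml cpu)" pyproject.toml && pip install .
pick_toml cpu
```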

Accuracy Tests

N/A

Benchmarking and Profiling

N/A

Checklist


@gemini-code-assist (Bot) left a comment


Summary of Changes

Hello @ZailiWang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the dependency management for the sglang package by introducing separate pyproject.toml files for different hardware platforms (CPU, HIP, XPU, HPU, NPU). This change aims to simplify the build process for per-device wheels and improve clarity by isolating platform-specific dependencies. It also involves updating relevant build scripts and documentation to align with the new structure.

Highlights

  • Modularized Dependency Management: The sglang package's pyproject.toml files have been split into device-specific configurations (e.g., pyproject_cpu.toml, pyproject_hip.toml) to streamline per-device wheel building.
  • Simplified Default Configuration: The main pyproject.toml now focuses on CUDA-only builds, with other device-specific dependencies moved to their respective .toml files.
  • Flattened Dependencies: Hierarchical [project.optional-dependencies] have been removed, with each .toml file now containing a single, flat dependencies list (excluding test-related packages).
  • Documentation and Dockerfile Updates: Installation commands in documentation and Dockerfiles have been updated to reflect the new .toml structure, specifically for CPU, with plans for other devices.
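As an illustration of the flattening described above, the fragment below contrasts the old extras-based layout with the new per-device file. This is a hedged sketch with abbreviated, illustrative dependency lists and an illustrative extra name, not the actual file contents.

```toml
# Before (sketch): device deps lived as extras in the single pyproject.toml,
# installed via e.g. `pip install "sglang[srt_cpu]"` (extra name illustrative):
#
#   [project.optional-dependencies]
#   srt_cpu = ["torch", "intel-openmp"]

# After (sketch): pyproject_cpu.toml carries one flat list and is copied
# over pyproject.toml before a plain `pip install .`.
[project]
name = "sglang"
dependencies = ["aiohttp", "fastapi", "torch"]
```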


@gemini-code-assist (Bot) left a comment


Code Review

This pull request effectively refactors the project's packaging configuration by splitting the pyproject.toml file into device-specific versions. This is a great step towards simplifying the build process for different hardware backends like CUDA, CPU, HIP, etc. The corresponding updates to the Dockerfile and documentation are consistent with these changes. My review includes a critical fix for the CPU installation documentation and several suggestions to improve the maintainability of the new pyproject files by sorting dependencies.


# Install SGLang dependent libs, and build SGLang main package
pip install --upgrade pip setuptools
conda install -y libsqlite==3.48.0 gperftools tbb libnuma numactl


Severity: high

The installation instructions for bare metal are missing the step to install torch. Since torch is not a dependency in pyproject_cpu.toml, pip install . will not install it, and the setup will be incomplete. It's best to install torch after the conda packages and before installing sglang.

Suggested change:
  conda install -y libsqlite==3.48.0 gperftools tbb libnuma numactl
+ pip install torch torchvision torchaudio

Comment thread docker/Dockerfile.xeon
Comment on lines +40 to +45
  cp pyproject_cpu.toml pyproject.toml && \
  pip install . && \
  pip install torch==${VER_TORCH} torchvision==${VER_TORCHVISION} triton==${VER_TRITON} --force-reinstall && \
- cd sgl-kernel && \
+ cd ../sgl-kernel && \
  cp pyproject_cpu.toml pyproject.toml && \
- pip install -v .
+ pip install .


Severity: medium

The installation process can be slightly optimized by combining the pip install commands. This reduces the number of layers in the Docker image and can make builds slightly faster.

	cp pyproject_cpu.toml pyproject.toml && \
    pip install . torch==${VER_TORCH} torchvision==${VER_TORCHVISION} triton==${VER_TRITON} --force-reinstall && \
    cd ../sgl-kernel && \
    cp pyproject_cpu.toml pyproject.toml && \
    pip install .

Comment thread python/pyproject_cpu.toml
Comment thread python/pyproject_hip.toml
Comment on lines +19 to +66
"aiohttp",
"anthropic>=0.20.0",
"blobfile==3.0.0",
"build",
"compressed-tensors",
"datasets",
"decord",
"einops",
"fastapi",
"hf_transfer",
"huggingface_hub",
"interegular",
"IPython",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
"msgspec",
"ninja",
"numpy",
"openai==1.99.1",
"openai-harmony==0.0.4",
"orjson",
"outlines==0.1.11",
"packaging",
"partial_json_parser",
"petit_kernel==0.0.2",
"pillow",
"prometheus-client>=0.20.0",
"psutil",
"pybase64",
"pydantic",
"pynvml",
"python-multipart",
"pyzmq>=25.1.2",
"requests",
"scipy",
"sentencepiece",
"setproctitle",
"soundfile==0.13.1",
"timm==1.0.16",
"tiktoken",
"torch",
"torchao==0.9.0",
"tqdm",
"transformers==4.56.0",
"uvicorn",
"uvloop",
"wave-lang==3.7.0",
"xgrammar==0.1.23",

Severity: medium

For better maintainability and to avoid potential issues like duplicate entries, it's a good practice to keep the list of dependencies sorted alphabetically.


Comment thread python/pyproject_hpu.toml
Comment on lines +19 to +63
"aiohttp",
"anthropic>=0.20.0",
"blobfile==3.0.0",
"build",
"compressed-tensors",
"datasets",
"decord",
"einops",
"fastapi",
"hf_transfer",
"huggingface_hub",
"interegular",
"IPython",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
"msgspec",
"ninja",
"numpy",
"openai==1.99.1",
"openai-harmony==0.0.4",
"orjson",
"outlines==0.1.11",
"packaging",
"partial_json_parser",
"pillow",
"prometheus-client>=0.20.0",
"psutil",
"pybase64",
"pydantic",
"pynvml",
"python-multipart",
"pyzmq>=25.1.2",
"requests",
"scipy",
"sentencepiece",
"setproctitle",
"soundfile==0.13.1",
"timm==1.0.16",
"tiktoken",
"torchao==0.9.0",
"tqdm",
"transformers==4.56.0",
"uvicorn",
"uvloop",
"xgrammar==0.1.23",

Severity: medium

For better maintainability and to avoid potential issues like duplicate entries, it's a good practice to keep the list of dependencies sorted alphabetically.


Comment thread python/pyproject_npu.toml
Comment on lines +18 to +63
"aiohttp",
"anthropic>=0.20.0",
"blobfile==3.0.0",
"build",
"compressed-tensors",
"datasets",
"decord",
"einops",
"fastapi",
"hf_transfer",
"huggingface_hub",
"interegular",
"IPython",
"llguidance>=0.7.11,<0.8.0",
"modelscope",
"msgspec",
"ninja",
"numpy",
"openai==1.99.1",
"openai-harmony==0.0.4",
"orjson",
"outlines==0.1.11",
"packaging",
"partial_json_parser",
"pillow",
"prometheus-client>=0.20.0",
"psutil",
"pybase64",
"pydantic",
"pynvml",
"python-multipart",
"pyzmq>=25.1.2",
"requests",
"scipy",
"sentencepiece",
"setproctitle",
"soundfile==0.13.1",
"timm==1.0.16",
"tiktoken",
"torchao==0.9.0",
"tqdm",
"transformers==4.56.0",
"uvicorn",
"uvloop",
"xgrammar==0.1.23",
]

Severity: medium

For better maintainability and to avoid potential issues like duplicate entries, it's a good practice to keep the list of dependencies sorted alphabetically.


Comment thread python/pyproject_xpu.toml
@hnyls2002 (Collaborator)

This seems too complex, and there is too much duplicated code shared between the different files. Do you have another way to do this?

@ZailiWang (Contributor, Author)

Hi @hnyls2002, the toml split was requested by @zhyncs. It does introduce duplication, but that seems unavoidable if we want to split the package configs into device-specific ones. I searched the official pip install tutorials but could not find anything about hierarchical config file support.

Let's keep the content of the PR as-is until we reach a conclusion on whether this is what we need, or until we find a better approach.
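One possible middle ground for the duplication concern, sketched here purely as an illustration (the names `BASE_DEPS`, `DEVICE_EXTRAS`, and `render_dependencies`, and the dependency subsets, are hypothetical and not from this PR), would be to generate the device-specific lists from one shared base at release time, so the shared entries live in a single place:

```python
# Hypothetical sketch: render a device-specific, sorted `dependencies` block
# from one shared base list, instead of maintaining near-identical toml files
# by hand. The lists below are illustrative subsets, not the real ones.
BASE_DEPS = ["aiohttp", "fastapi", "transformers==4.56.0"]
DEVICE_EXTRAS = {
    "cpu": [],
    "hip": ["petit_kernel==0.0.2", "wave-lang==3.7.0"],
}

def render_dependencies(device: str) -> str:
    """Return a sorted `dependencies = [...]` block for one device."""
    deps = sorted(BASE_DEPS + DEVICE_EXTRAS[device], key=str.lower)
    body = "\n".join(f'    "{d}",' for d in deps)
    return f"dependencies = [\n{body}\n]\n"

print(render_dependencies("hip"))
```

This would also make the reviewer's "keep the list sorted" suggestion automatic, since sorting happens in one place.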

@ZailiWang (Contributor, Author)

Will take care of the CPU/XPU backends only.

@ZailiWang ZailiWang closed this Sep 17, 2025
@ZailiWang ZailiWang mentioned this pull request Sep 22, 2025
@ZailiWang ZailiWang deleted the split-toml branch September 25, 2025 03:01
