
[Diffusion][NPU] Add support for Hunyuan3D#20352

Merged
sglang-npu-bot merged 4 commits into sgl-project:main from e-martirosian:hunyuan3d_npu_support
Mar 24, 2026
Conversation

@e-martirosian (Contributor) commented Mar 11, 2026

Motivation

This PR adds NPU support to the Hunyuan3D pipeline.

Modifications

  • Added support for building the custom rasterizer for CPU-only environments.
  • Rasterize on CPU when input tensors are on NPU.
  • Converted image_tensors to float32 (previously float64/double) in _run_delight, since double is not supported on NPU.
  • Fixed device propagation in MeshRender.
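
The fallback rule in the first two bullets can be sketched as a small helper (illustrative only; `select_rasterize_device` is not a function from this PR):

```python
def select_rasterize_device(input_device_type: str, cuda_available: bool) -> str:
    """Pick where the custom rasterizer kernel should run.

    There is no NPU build of the custom kernel, so inputs living on an
    NPU are rasterized on CPU and the results moved back afterwards.
    """
    if input_device_type == "npu":
        return "cpu"  # fall back: no custom kernel is built for NPU
    if input_device_type == "cuda" and cuda_available:
        return "cuda"  # the CUDA kernel was compiled into the extension
    return "cpu"  # the CPU-only build handles everything else
```

For example, inputs on "npu" rasterize on "cpu", matching the rasterize change in this PR.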

Accuracy Tests

GPU

Before:

gpu_before

After:

gpu_after

NPU

Before: the pipeline failed.

After:

npu_after

Benchmarking and Profiling

GPU

The performance difference is within the error margin.

Devices: 1 x Nvidia A10
Command: sglang generate --model-path tencent/Hunyuan3D-2 --image-path ./assets/demo.png

Before: 546.98 seconds
After: 545.57 seconds

NPU

Devices: one chip of Ascend A3
Command: sglang generate --model-path tencent/Hunyuan3D-2 --image-path ./assets/demo.png

Generated in 518.22 seconds

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions (Bot) added the diffusion (SGLang Diffusion) label Mar 11, 2026
@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces NPU (Neural Processing Unit) support for the Hunyuan3D pipeline by enhancing the custom rasterizer to handle different device types (CPU/CUDA) and ensuring proper tensor device placement. It also refines image tensor handling and renderer initialization for better compatibility and robustness.

Highlights

  • Conditional CUDA Kernel Loading: The custom rasterizer's CUDA kernel (rasterizer_gpu.cu) is now conditionally included in the build process only when torch.cuda.is_available() is true, improving build flexibility and reducing dependencies for CPU-only environments.
  • Enhanced Device Handling in Rasterizer: The rasterize function now explicitly manages tensor device placement. Input tensors are moved to the appropriate processing device (CPU or CUDA) and, if processed on a non-CUDA device, the results are moved back to the original device, ensuring correct operation across different hardware configurations.
  • Conditional Compilation for C++ Rasterizer: Conditional compilation directives (#ifdef __CUDACC__) have been added to the C++ rasterizer implementation (rasterizer.cpp) to correctly dispatch between CPU and GPU rasterization functions based on whether the code is being compiled with a CUDA compiler.
  • Macro Definitions for Non-CUDA Builds: The __host__ and __device__ macros are now conditionally defined in rasterizer.h when __CUDACC__ is not present, preventing compilation errors in environments without CUDA support.
  • Explicit Float Casting for Image Tensors: Image tensors in the hunyuan3d_paint stage are now explicitly cast to float type during conversion from NumPy arrays, ensuring consistent data types for subsequent operations.
  • Device Assignment for Mesh Renderer: The MeshRender constructor now accepts a device argument, allowing for explicit device assignment during renderer initialization, which is crucial for multi-device environments.
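
The first highlight (conditional CUDA kernel loading) boils down to choosing the extension's source list at build time. A minimal sketch, using the file names from the changelog; the helper `build_sources` itself is hypothetical:

```python
def build_sources(cuda_available: bool) -> list[str]:
    # The C++/CPU implementation is always compiled; the CUDA kernel is
    # appended only when a CUDA runtime is present, mirroring the
    # torch.cuda.is_available() check described above.
    sources = ["rasterizer.cpp"]
    if cuda_available:
        sources.append("rasterizer_gpu.cu")
    return sources
```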


Changelog
  • python/sglang/multimodal_gen/csrc/render/hunyuan3d_rasterizer/__init__.py
    • Updated _load_custom_rasterizer to conditionally include rasterizer_gpu.cu in sources based on torch.cuda.is_available().
    • Modified rasterize function to determine the processing device, move input tensors (pos, tri, clamp_depth) to that device, and move output tensors (findices, barycentric) back to the original device if not CUDA.
  • python/sglang/multimodal_gen/csrc/render/hunyuan3d_rasterizer/rasterizer.cpp
    • Wrapped the device-checking logic for calling rasterize_image_cpu or rasterize_image_gpu within an #ifdef __CUDACC__ block, falling back to rasterize_image_cpu if __CUDACC__ is not defined.
  • python/sglang/multimodal_gen/csrc/render/hunyuan3d_rasterizer/rasterizer.h
    • Added conditional definitions for __host__ and __device__ macros when __CUDACC__ is not defined, to ensure compatibility during compilation.
  • python/sglang/multimodal_gen/runtime/pipelines_core/stages/hunyuan3d_paint.py
    • Modified _run_delight to explicitly cast image_tensor to float type when converting NumPy arrays to PyTorch tensors.
    • Added device=self.device argument to the MeshRender constructor in _init_renderer.
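
The explicit float cast in _run_delight matters because NumPy promotes Python floats to float64 ("double"), which the NPU backend does not support. A NumPy-only illustration of the dtype issue (the actual code converts to a PyTorch tensor):

```python
import numpy as np

# An array built from Python floats defaults to float64 ("double"),
# which is not supported on NPU.
image = np.asarray([[0.25, 0.5], [0.75, 1.0]])
assert image.dtype == np.float64

# Casting to float32 before handing the data to the pipeline avoids
# the unsupported-dtype failure.
image32 = image.astype(np.float32)
assert image32.dtype == np.float32
```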
Activity
  • No specific activity (comments, reviews, or CI status) has been recorded for this pull request yet.

@gemini-code-assist (Bot) left a comment

Code Review

This pull request successfully adds support for non-CUDA devices (CPU) to the Hunyuan3D pipeline, utilizing conditional compilation and proper device placement for tensors. A comprehensive security audit found no significant security vulnerabilities, confirming that the changes primarily focus on device compatibility and type consistency without introducing new security risks. The implementation is well-executed, and no further improvements are suggested.

Comment thread python/sglang/multimodal_gen/csrc/render/hunyuan3d_rasterizer/__init__.py Outdated
@e-martirosian force-pushed the hunyuan3d_npu_support branch from 41d0a95 to a057758 on March 16, 2026 07:36
@e-martirosian force-pushed the hunyuan3d_npu_support branch from 8ac04c0 to 6507021 on March 16, 2026 09:10
Comment thread python/sglang/multimodal_gen/configs/pipeline_configs/base.py Outdated
@ssshinigami (Contributor)

Please add more description and generated results to show that it works, along with the generation latency.

@ping1jing2 changed the title from "[NPU] Add support for Hunyuan3D" to "[Diffusion][NPU] Add support for Hunyuan3D" on Mar 18, 2026
@ssshinigami left a comment

LGTM

@e-martirosian marked this pull request as ready for review March 19, 2026 14:03
"""Rasterize mesh to get face indices and barycentric coordinates."""
kernel = _load_custom_rasterizer()
device = "cpu" if pos.device.type == "npu" else pos.device.type
kernel = _load_custom_rasterizer(device == "cuda")
A Collaborator commented:

will it also work for other hardware backends such as AMD?

@e-martirosian (Contributor, Author) replied:

We only check for NPU and, in that case, run the custom kernel on CPU. This is intentional: we don't expect the fallback to work on other backends. Developers supporting other hardware can either implement this custom kernel for their backend or explicitly run this part on CPU; keeping the check narrow ensures they actually notice this code path as an optimization target rather than silently inheriting a slow fallback.

@ping1jing2 (Collaborator): /tag-and-rerun-ci

@ping1jing2 (Collaborator): /rerun-failed-ci

1 similar comment from @ping1jing2.

@sglang-npu-bot sglang-npu-bot merged commit 9f4d8ac into sgl-project:main Mar 24, 2026
108 of 122 checks passed
adityavaid pushed a commit to adityavaid/sglang that referenced this pull request Mar 24, 2026
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>
adityavaid pushed a commit to adityavaid/sglang that referenced this pull request Mar 24, 2026
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>
0-693 pushed a commit to 0-693/sglang that referenced this pull request Mar 25, 2026
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>
johnnycxm pushed a commit to johnnycxm/sglang that referenced this pull request Mar 25, 2026
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>
johnnycxm pushed a commit to johnnycxm/sglang that referenced this pull request Mar 25, 2026
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>

Labels

diffusion (SGLang Diffusion), run-ci


6 participants