[Diffusion][NPU] Add support for Hunyuan3D#20352
sglang-npu-bot merged 4 commits into sgl-project:main
Summary of Changes (Gemini Code Assist)
This pull request introduces NPU (Neural Processing Unit) support for the Hunyuan3D pipeline by enhancing the custom rasterizer to handle different device types (CPU/CUDA) and ensuring proper tensor device placement. It also refines image tensor handling and renderer initialization for better compatibility and robustness.
Code Review
This pull request successfully adds support for non-CUDA devices (CPU) to the Hunyuan3D pipeline, utilizing conditional compilation and proper device placement for tensors. A comprehensive security audit found no significant security vulnerabilities, confirming that the changes primarily focus on device compatibility and type consistency without introducing new security risks. The implementation is well-executed, and no further improvements are suggested.
41d0a95 to a057758
8ac04c0 to 6507021
Please add more description, along with generated results showing that it works and the generation latency.
    """Rasterize mesh to get face indices and barycentric coordinates."""
    kernel = _load_custom_rasterizer()
    device = "cpu" if pos.device.type == "npu" else pos.device.type
    kernel = _load_custom_rasterizer(device == "cuda")
Will it also work for other hardware backends, such as AMD?
We only check for NPU, in which case the custom kernel runs on CPU. This is intentional: we don’t expect it to work on other backends. Developers supporting other hardware can either implement this custom kernel for their backend or run this part on CPU. Otherwise, they might not even notice that this part needs optimizing.
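The dispatch described in this reply can be sketched in isolation. This is a minimal illustration and not the code merged in the PR: `select_rasterizer_backend` is a hypothetical helper name, and reading the boolean passed to `_load_custom_rasterizer` as "load the CUDA build" is an assumption based on the diff excerpt above.

```python
from typing import Tuple


def select_rasterizer_backend(device_type: str) -> Tuple[str, bool]:
    """Hypothetical helper: return (device the kernel runs on, use CUDA build?)."""
    # Only NPU is special-cased: its tensors are handled by the CPU kernel.
    kernel_device = "cpu" if device_type == "npu" else device_type
    # Assumption: the flag given to the loader selects the CUDA build.
    use_cuda = kernel_device == "cuda"
    return kernel_device, use_cuda


for dev in ("cuda", "cpu", "npu"):
    print(dev, "->", select_rasterizer_backend(dev))
```

Any other device type passes through unchanged, which matches the reply: only NPU is redirected to CPU, and other backends are left for their maintainers to handle explicitly.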
/tag-and-rerun-ci
/rerun-failed-ci
Co-authored-by: Elizaveta Martirosian <elizaveta.martirosian@gmail.com>
Motivation
This PR adds NPU support to the Hunyuan3D pipeline.
Modifications
- Run the custom rasterizer kernel on CPU in rasterize when input tensors are on NPU.
- Cast image_tensors to float32 (previously used double) in _run_delight; double is not supported on NPU.
- ... MeshRender.
Accuracy Tests
GPU
Before:
After:
NPU
Before: pipeline failed.
After:
Benchmarking and Profiling
GPU
The performance difference is within the error margin.
Devices: 1 x Nvidia A10
Command:
sglang generate --model-path tencent/Hunyuan3D-2 --image-path ./assets/demo.png
Before: 546.98 seconds
After: 545.57 seconds
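For reference, the size of the GPU difference can be worked out from the two timings reported here (546.98 s before, 545.57 s after). This is only arithmetic on those two numbers; the PR does not state how many runs were taken or how the error margin was estimated.

```python
before_s = 546.98  # Nvidia A10, before the change (from the PR description)
after_s = 545.57   # Nvidia A10, after the change (from the PR description)

delta_s = before_s - after_s   # absolute difference in seconds
relative = delta_s / before_s  # difference as a fraction of the baseline

print(f"delta: {delta_s:.2f} s ({relative:.2%})")
# delta: 1.41 s (0.26%)
```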
NPU
Devices: one chip of Ascend A3
Command:
sglang generate --model-path tencent/Hunyuan3D-2 --image-path ./assets/demo.png
Generated in 518.22 seconds