Use Nunchaku for 2-4× faster inference with <7GB VRAM usage #99

juntaosun · 2025-06-26T12:49:33Z

This is my first attempt at contributing a small PR to DreamO~ 😉

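【Features】
👉️ Adds nunchaku support: up to 2-4× faster inference and low VRAM usage (<~7GB) with three reference images.
👉️ It now runs on consumer GPUs, e.g. cards with more than 8GB of VRAM. Have fun, everyone! 🎉
👉️ Generating a 1024x1024 image takes only a few tens of seconds! (measured on an NVIDIA RTX 3080)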

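【Main changes】This PR makes the following changes and is compatible with both the v1 and the latest v1.1 models:
(1) dreamo_pipeline.py: adds load_dreamo_model_nunchaku for compatibility.
(2) dreamo_generator.py: new file that handles the core model-loading and quantization logic.
(3) app.py: cleaner web UI code, plus real-time display of inference progress.
(4) requirements.txt: upgrades the dependency to diffusers==0.32.2.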
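Since dreamo_generator.py centralizes loading and quantization, one way to picture that split is a table that maps each `--quant` value to a loader. The sketch below is hypothetical: the loader functions are stand-ins that only return a label, and none of these names come from the PR except the mode names and `load_dreamo_model_nunchaku`, which is referenced in a comment only.

```python
from typing import Callable, Dict


def load_default() -> str:
    """Hypothetical stand-in for loading the unquantized DreamO pipeline."""
    return "pipeline:default"


def load_int8() -> str:
    """Hypothetical stand-in for loading an int8-quantized pipeline."""
    return "pipeline:int8"


def load_nunchaku() -> str:
    """Hypothetical stand-in for a loader built on load_dreamo_model_nunchaku."""
    return "pipeline:nunchaku"


# One loader per --quant value; supporting a new mode means adding one entry here.
LOADERS: Dict[str, Callable[[], str]] = {
    "default": load_default,
    "int8": load_int8,
    "nunchaku": load_nunchaku,
}


def load_pipeline(quant: str) -> str:
    """Look up and run the loader for the requested quantization mode."""
    try:
        loader = LOADERS[quant]
    except KeyError:
        raise ValueError(
            f"unknown --quant value {quant!r}; expected one of {sorted(LOADERS)}"
        ) from None
    return loader()
```

Keeping the mapping in one place means app.py only needs to forward the flag value, which matches the goal of keeping the web UI code tidy.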

【VRAM usage】Comparison of how each quantization mode affects VRAM usage:

--quant	VRAM	Status
default	24GB	⚠️
int8	16GB	⚠️
nunchaku	6.5GB	✅
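The table can be read as a rough rule of thumb for choosing `--quant`. The helper below, `pick_quant`, is hypothetical and not part of the PR; its thresholds are simply the figures reported in the table, and actual usage may differ with resolution, the number of reference images, and offloading settings.

```python
from typing import Optional

# Reported VRAM per --quant mode in GB (24 / 16 / 6.5), ordered from the
# least to the most aggressive quantization.
QUANT_VRAM_GB = [
    ("default", 24.0),
    ("int8", 16.0),
    ("nunchaku", 6.5),
]


def pick_quant(available_gb: float) -> Optional[str]:
    """Return the least aggressive --quant mode whose reported VRAM fits, else None."""
    for mode, needed_gb in QUANT_VRAM_GB:
        if needed_gb <= available_gb:
            return mode
    return None
```

For example, `pick_quant(12)` returns `"nunchaku"` for a 12GB card, while `pick_quant(24)` returns `"default"`.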

app.py startup argument:

--quant: default, int8, or nunchaku

【Installation】For installing the latest version of Nunchaku, see:
https://github.com/mit-han-lab/nunchaku

【Usage】
Run app.py, open the web UI page, and use nunchaku for fast inference.

parser.add_argument('--quant', type=str, default='nunchaku', help='Quantize to use: default, int8, nunchaku')
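A self-contained sketch of how that flag could be parsed. The flag name, default, and help text are copied from the line above; the `choices` list is an added assumption (not in the PR) so that a typo is rejected at startup rather than when the model is loaded.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="DreamO web UI (sketch)")
    # Same flag and default as app.py; `choices` is an assumption added here.
    parser.add_argument(
        "--quant",
        type=str,
        default="nunchaku",
        choices=["default", "int8", "nunchaku"],
        help="Quantize to use: default, int8, nunchaku",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Selected quantization mode: {args.quant}")
```

With this, `python app.py --quant int8` selects int8, and omitting the flag falls back to nunchaku.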

a person playing guitar in the street, look at the viewer.

juntaosun · 2025-06-26T14:06:02Z

@ToTheBeginning I ran into some issues while debugging at first and spent an afternoon on it, but it's finally sorted out~ Please take a look.

ToTheBeginning · 2025-06-26T14:18:02Z

@juntaosun Could you measure how much the speed differs with and without apply_cache_on_pipe? When I ran it yesterday, this function threw an error.

juntaosun · 2025-06-26T15:08:09Z

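@ToTheBeginning If you hit an OOM error, try calling apply_cache_on_pipe first and pipe.enable_model_cpu_offload() afterwards, i.e. swap the order in which the two calls run. apply_cache_on_pipe is a caching technique similar to TeaCache.
Local test on a 3080: for 12-step inference the speedup is about 1.2×. (For inference with 30 or more steps, it can reach about 2×.)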
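The call order suggested above can be captured in a small setup helper. In this sketch `setup_pipeline` and `RecordingPipe` are hypothetical: the caching function is passed in as a parameter, and a stand-in pipeline records the order of calls so the example runs without a GPU, diffusers, or Nunchaku installed.

```python
from typing import Any, Callable, List


def setup_pipeline(pipe: Any, apply_cache: Callable[[Any], None]) -> None:
    """Enable caching first, then CPU offloading, in the order suggested above."""
    apply_cache(pipe)                # e.g. Nunchaku's apply_cache_on_pipe
    pipe.enable_model_cpu_offload()  # diffusers' model-level CPU offload


class RecordingPipe:
    """Stand-in pipeline that only records which setup calls happened, in order."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def enable_model_cpu_offload(self) -> None:
        self.calls.append("enable_model_cpu_offload")


def fake_apply_cache(pipe: RecordingPipe) -> None:
    pipe.calls.append("apply_cache_on_pipe")


pipe = RecordingPipe()
setup_pipeline(pipe, fake_apply_cache)
print(pipe.calls)  # ['apply_cache_on_pipe', 'enable_model_cpu_offload']
```

With the real libraries, you would pass a diffusers pipeline and Nunchaku's `apply_cache_on_pipe`, which Nunchaku's examples import from `nunchaku.caching.diffusers_adapters`. Check that path against the Nunchaku version you install.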

I suggest pulling the latest version of this PR to test it; apply_cache_on_pipe works fine on my machine.
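To reproduce a with/without-cache comparison like the one reported above (about 1.2× at 12 steps, about 2× at 30+ steps), a generic timing helper is enough. This is a sketch, not the measurement script used in this thread: `median_runtime` and `speedup` are hypothetical names, and the callables you pass in would each run one full generation.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 3, warmup: int = 1) -> float:
    """Median wall-clock seconds of fn(), measured after untimed warm-up runs."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn()  # warm-up: first runs often include one-off setup costs
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is; 1.2 means 1.2x."""
    return baseline_s / candidate_s
```

Usage would look like `speedup(median_runtime(run_without_cache), median_runtime(run_with_cache))`, where both run functions are your own. If a run function can return before queued GPU work has finished, end it with `torch.cuda.synchronize()` so the timing covers the whole generation.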

ToTheBeginning · 2025-06-26T16:05:39Z

Thanks again for your contribution. 👍

ToTheBeginning · 2025-06-26T16:06:49Z

@juntaosun I modified some of the code and it tests fine in my environment. Could you pull it and check whether it also works in yours? ～

Use Nunchaku for 2-4× faster inference with <7GB VRAM usage

8fa379f

juntaosun mentioned this pull request Jun 26, 2025

(Outdated) Use Nunchaku for 2-4× faster inference with <~7GB VRAM usage #97

Closed

Add comment header

3ad62b3

refactor

c7d5305

ToTheBeginning merged commit e4ff34e into bytedance:main Jun 26, 2025
