Add ernie image#13432
Conversation
yiyixuxu
left a comment
thanks for the PR!
I left some feedback
yiyixuxu
left a comment
thanks!
i left a few more comments
yiyixuxu
left a comment
thanks! left two small comments
let's merge this soon
@claude can you also do a review here? Please keep these 3 notes in mind during your review.
I'll analyze this and get back to you.
@bot /style
Style bot fixed some files and pushed the changes.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
can you run
# Initialize latents
if latents is None:
    latents = torch.randn(
I think this should probably use the diffusers `randn_tensor` helper. As written, it will fail when given a CPU generator, which is needed to get a consistent seed across different systems. ref
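To illustrate the concern above, here is a minimal torch-only sketch of the behavior the reviewer is asking for. It is not the diffusers implementation; the reviewer's suggestion is to call `randn_tensor` (found in `diffusers.utils.torch_utils`) rather than hand-roll this. The helper name `sample_latents` and the latent shape are assumptions made for illustration only.

```python
from typing import Optional, Tuple

import torch


def sample_latents(
    shape: Tuple[int, ...],
    generator: Optional[torch.Generator] = None,
    device: torch.device = torch.device("cpu"),
    dtype: torch.dtype = torch.float32,
) -> torch.Tensor:
    """Hypothetical helper: draw noise on the generator's device, then move it."""
    if generator is not None and generator.device.type != device.type:
        # torch.randn rejects a generator that lives on a different device than
        # the output, so sample where the generator lives (typically the CPU)
        # and transfer the result to the execution device afterwards.
        noise = torch.randn(shape, generator=generator, device=generator.device, dtype=dtype)
        return noise.to(device)
    return torch.randn(shape, generator=generator, device=device, dtype=dtype)


# Two CPU generators with the same seed yield identical latents.
gen_a = torch.Generator(device="cpu").manual_seed(0)
gen_b = torch.Generator(device="cpu").manual_seed(0)
latents_a = sample_latents((1, 4, 8, 8), generator=gen_a)
latents_b = sample_latents((1, 4, 8, 8), generator=gen_b)
```

Because the noise is always drawn on the generator's own device, the same CPU seed should produce the same starting latents regardless of which accelerator runs the denoising loop.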
    return text_bth, lens

@torch.no_grad()
def __call__(
Would it be possible to add support for `prompt_embeds` and `negative_prompt_embeds`, which would bypass the need to encode the prompt? Ref
diffusers/src/diffusers/pipelines/z_image/pipeline_z_image.py
Lines 309 to 310 in 251676d
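The request above can be sketched roughly as follows. This is a framework-free illustration of the bypass logic only, not the Z-Image or ERNIE-Image code: `resolve_prompt_embeds`, the `Embeds` placeholder type, and `fake_encoder` are all hypothetical names introduced here.

```python
from typing import Callable, List, Optional, Sequence, Union

Embeds = List[List[float]]  # placeholder type; a real pipeline would use tensors


def resolve_prompt_embeds(
    prompt: Optional[Union[str, Sequence[str]]],
    prompt_embeds: Optional[Embeds],
    encode_prompt: Callable[[List[str]], Embeds],
) -> Embeds:
    """Hypothetical helper: use precomputed embeddings if given, else encode."""
    if prompt is not None and prompt_embeds is not None:
        raise ValueError("Pass either `prompt` or `prompt_embeds`, not both.")
    if prompt is None and prompt_embeds is None:
        raise ValueError("Provide `prompt` or `prompt_embeds`.")
    if prompt_embeds is not None:
        # Precomputed embeddings skip the text encoder entirely.
        return prompt_embeds
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)
    return encode_prompt(prompts)


calls = []


def fake_encoder(prompts: List[str]) -> Embeds:
    # Stand-in for a text encoder: records each call and returns dummy vectors.
    calls.append(prompts)
    return [[float(len(p))] for p in prompts]


encoded = resolve_prompt_embeds("a cat", None, fake_encoder)
cached = resolve_prompt_embeds(None, encoded, fake_encoder)
```

The same pattern would apply to `negative_prompt_embeds`. The second call never touches the encoder, which is the point of the request: embeddings can be computed once and reused.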
def rope(pos: torch.Tensor, dim: int, theta: int) -> torch.Tensor:
    assert dim % 2 == 0
    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
Quick question: is float64 mandatory here?
I experimented with float32 and image generation succeeded. On some GPU backends, float64 is not well supported; that can cause silent numerical issues or cryptic runtime errors.
Could the developers consider changing this to float32 so as to support more GPU backends?
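One possible shape for such a change is sketched below, under stated assumptions. Only the first two lines of the function body mirror the snippet under review. The rest (the `omega` computation, the device check, and the name `rope_angles`) is a guess at a typical rotary-embedding frequency computation and is not taken from the PR. It falls back to float32 on device types assumed to lack float64 support and keeps float64 elsewhere.

```python
import torch


def rope_angles(pos: torch.Tensor, dim: int, theta: float) -> torch.Tensor:
    """Hypothetical variant: compute rotary angles without requiring float64."""
    assert dim % 2 == 0
    # Assumption: "mps" and "npu" device types do not handle float64 well,
    # so use float32 there and keep float64 on other devices.
    freqs_dtype = torch.float32 if pos.device.type in ("mps", "npu") else torch.float64
    scale = torch.arange(0, dim, 2, dtype=freqs_dtype, device=pos.device) / dim
    omega = 1.0 / (theta**scale)
    # Outer product of positions and inverse frequencies: shape (..., dim // 2).
    angles = pos.to(freqs_dtype)[..., None] * omega
    return angles.to(torch.float32)


pos = torch.arange(16)
angles = rope_angles(pos, dim=8, theta=10000.0)
```

Whether float32 is accurate enough at long sequence lengths would still need to be checked against the model's reference outputs; the sketch only shows that the dtype choice can be made device-dependent.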
I can't believe they've done this again. Given the number of times issues have been raised about float64 in a rope implementation, you'd think there would be an automatic check by now. It's not strictly necessary, and it breaks MPS and NPU compatibility.
At least someone else raised the issue this time.
https://github.com/huggingface/diffusers/pull/13464/changes
* Add ERNIE-Image
* Update doc
* Update doc
* Change from Custom-Attention to Diffusers Style Attention
* Change from Custom-Attention to Diffusers Style Attention
* Make compatible with SGLang
* Optimize the loading and offload strategy of the PE module
* Update doc files and config-related content
* Fix issues raised in official feedback
* Optimize code per official suggestions
* Update code
* update
* update
* Apply style fixes
* update
* update
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

What does this PR do?
We have introduced a new text-to-image model called ERNIE-Image, which will soon be open-sourced to the community. This PR includes the model architecture definition, the pipeline, as well as the related documentation and test files.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.