[Performance 6/6] Add --precision half option to avoid casting during inference #15820
AUTOMATIC1111 merged 3 commits into AUTOMATIC1111:dev
Conversation
Will force-fp16 mode conflict with the fp8 unet?
I'm not sure if this is related to using dynamic LoRA weights; I wonder if it's related to this.
Enabling
Found the offending line. In one place it is `h = x.type(self.dtype)`, while in the other it is:

```python
# h = x.type(self.dtype)
h = x
```
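The effect of that change can be illustrated with a minimal, hypothetical sketch (numpy standing in for torch; the function names are mine, not the webui's): when the input is already in the model's dtype, the per-call cast is pure overhead, and skipping it changes nothing about the result.

```python
import numpy as np

def forward_with_cast(x, weight):
    # analogous to the original line: h = x.type(self.dtype)
    h = x.astype(weight.dtype)
    return h @ weight

def forward_no_cast(x, weight):
    # analogous to the fix: h = x
    h = x
    return h @ weight

weight = np.ones((4, 4), dtype=np.float16)
x = np.ones((2, 4), dtype=np.float16)  # input already fp16

out_cast = forward_with_cast(x, weight)
out_nocast = forward_no_cast(x, weight)
assert out_cast.dtype == out_nocast.dtype == np.float16
assert (out_cast == out_nocast).all()
```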
I don't know if it's the appropriate place to put it, but setting
Something like this? This does fix the dtype mismatch error.
Thanks for digging out the solution! Verified that the solution works.
I'm still getting the following runtime error with both SDXL and SD15 models. It seems to be related to
Can you share which model you used? I'm not sure whether, when you load a full-precision model, the weights are cast to fp16 before inference. The models I tested are already half precision.
Sure, I tried a few:
Same error regardless of checkpoint. It probably has something to do with my environment, although I'm not sure what yet. Here's a bit more context:
I'll write back if I figure out the cause.
I've tested this on a 6700 XT and there is a performance improvement. However, I think this should not disallow setting
Another report of the fp8 issue.
Using an FP16 VAE, I got almost double the speed compared to `--no-half-vae`. Nice, an FP16 VAE is mandatory.
Description
According to lllyasviel/stable-diffusion-webui-forge#716 (comment), casting during inference is a major source of performance overhead. ComfyUI and Forge by default do fp16 inference without any casting, i.e. all tensors are fp16 before inference. The casting overhead is ~50ms/it.
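The "all tensors are fp16 before inference" approach can be sketched as follows (numpy standing in for torch; the names are illustrative, not the webui's): every weight is converted to half precision once up front, so the sampling loop itself never casts.

```python
import numpy as np

# hypothetical fp32 checkpoint weights
weights = {
    "proj": np.ones((4, 4), dtype=np.float32),
    "out": np.ones((4, 4), dtype=np.float32),
}

def cast_all_to_half(params):
    # one-time conversion before inference starts, so no per-step casting
    return {name: w.astype(np.float16) for name, w in params.items()}

half_weights = cast_all_to_half(weights)
assert all(w.dtype == np.float16 for w in half_weights.values())
```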
This PR adds an option `--precision half` to disable autocasting and use fp16 values for all tensors during inference.

Screenshots/videos:
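A rough, hypothetical sketch of how such a CLI flag could be parsed (the webui's actual option handling differs; the variable names here are illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--precision",
    choices=["full", "autocast", "half"],
    default="autocast",
    help="'half' disables autocasting and keeps all tensors in fp16",
)

args = parser.parse_args(["--precision", "half"])
force_fp16 = args.precision == "half"  # downstream code would skip autocast
```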
Checklist: