Add separate cuda stream for live preview VAE by drhead · Pull Request #2844 · lllyasviel/stable-diffusion-webui-forge

drhead · 2025-04-29T12:34:27Z

This adds a separate CUDA stream (if CUDA streams are enabled in the launch args) for the live preview VAE, so that live preview processing can run in parallel with the main processing. In particular, transferring the decoded image to the CPU no longer blocks the main processing stream. That should help, since it is by far the largest blocking call, although the benefit would be much larger if all blocking calls were removed from the main processing loop.

Internally, this should be handled very safely. If CUDA streams are disabled, nothing changes from before. If CUDA streams are enabled, we wait for the main stream to catch up before starting VAE processing, so we don't grab some intermediate garbage tensor (this only blocks the live preview thread and will never block the main thread). Since moving images to the CPU is already a blocking operation that forces a sync (albeit on the VAE stream and not the main one), we don't need to explicitly sync this stream at the end.
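To make the ordering concrete, here is a pure-Python toy model. It is an analogy, not CUDA: each "stream" is a single-worker thread pool, so work queued on it runs in order, and `submit()` returns immediately like an asynchronous kernel launch. `ToyStream`, `denoise_step` and `decode_preview` are hypothetical names. In PyTorch the same ordering could be expressed with `torch.cuda.Stream.wait_stream`, or by calling `synchronize()` on the main stream from the preview thread; this sketch does not claim which one the PR uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyStream:
    """Toy model of a GPU stream (hypothetical, not PyTorch or Forge code).

    Work queued on one stream runs in submission order on a single worker
    thread, and submit() returns immediately, like an async kernel launch.
    """

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._last = None  # Future for the most recently queued op

    def submit(self, fn, *args):
        self._last = self._pool.submit(fn, *args)
        return self._last

    def wait_stream(self, other):
        """Make later work on this stream wait for work already queued on `other`.

        The wait is itself queued on this stream's worker, so only this
        stream stalls; the caller and `other` keep running.
        """
        pending = other._last
        if pending is not None:
            self.submit(pending.result)

    def close(self):
        self._pool.shutdown(wait=True)


latents = {}  # step index -> finished latent (stand-in for GPU tensors)


def denoise_step(i):
    time.sleep(0.01)     # pretend the step's kernels take a while
    latents[i] = i * 10  # the step's output only exists once it finishes


def decode_preview(i):
    return latents[i]    # would raise KeyError if run before step i finished


main_stream, vae_stream = ToyStream(), ToyStream()
previews = []
for i in range(4):
    main_stream.submit(denoise_step, i)    # "launch" returns immediately
    vae_stream.wait_stream(main_stream)    # preview waits for step i; the loop does not
    previews.append(vae_stream.submit(decode_preview, i))

print([f.result() for f in previews])  # [0, 10, 20, 30]
main_stream.close()
vae_stream.close()
```

Dropping the `wait_stream` call lets `decode_preview` race ahead of the step it reads, which is the "intermediate garbage tensor" case; keeping it stalls only the preview worker while the main stream moves on to the next step.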

While doing this, I also made a couple of changes to the function that converts decoded images to PIL images. Converting float to uint8 should always round: the default behavior when casting float to int is to truncate, but rounding to the nearest integer is the standard in image processing, so the function now does that. I also changed it to use in-place operations where appropriate and to do the clamp and related operations on the CPU.
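The difference between truncating and rounding is easy to see on a few values. This is a minimal pure-Python sketch, not the PR's function; it assumes decoded pixel values lie in [-1, 1] and are mapped to [0, 255] via (x + 1) / 2 * 255, and `to_uint8` is a hypothetical name.

```python
def to_uint8(values, rounding=True):
    """Convert decoded pixel values (assumed to lie in [-1, 1]) to 0..255 ints."""
    out = []
    for v in values:
        v = min(max(v, -1.0), 1.0)   # clamp to the valid range first
        scaled = (v + 1.0) * 127.5   # same as (v + 1) / 2 * 255
        # int() truncates toward zero; round() goes to the nearest integer
        # (ties to even, the same tie rule torch.round uses).
        out.append(round(scaled) if rounding else int(scaled))
    return out


samples = [-1.0, 0.0, 0.999, 1.5]
print(to_uint8(samples))                  # [0, 128, 255, 255]
print(to_uint8(samples, rounding=False))  # [0, 127, 254, 255]
```

Truncation always moves a value down, so across many pixels it darkens the image by about half a level on average; rounding has no such systematic bias.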

This doesn't support XPU streams, though it could presumably be modified to do so easily. I would rather have someone who can actually test that implement it, though.

This is how the compute overlap looks in Nsight Systems. Default Stream 7 is the main stream and Stream 21 is the VAE stream. Live preview is set to update every step:

Zoomed-in view of one of the decoding sections:

The CUDA stream synchronize here is where the VAE thread waits for the main stream to finish a step before it starts decoding. As you can see, the main stream continues unimpeded.

If you look very closely at the kernel utilization at the very top, you will notice that utilization is at a full 100% during VAE processing, whereas there are some small dips during other sections. This is part of the performance benefit of running the live preview VAE this way: it can use SMs that the current kernel on the main processing stream isn't using.

Also notable: in this trace I have all of the blocking functions removed from the code, so you can see the CUDA Overhead section showing a bunch of "Command Buffer Full" events. That won't happen with this PR alone yet, but it is a step toward making it possible.


anon0730 · 2025-05-03T17:45:51Z

After the commit, there seems to be an issue with respecting setting_show_progress_every_n_steps.
For example, if you set it to update the preview every 10 steps, it first shows the preview at step 10 and then updates it every step after that, instead of at steps 20, 30, and so on.

I can't actually notice any difference in generation speed between updating the preview every step and updating it rarely (at least with Approx NN), even on my dated system, so keeping this setting at 1 makes it a non-issue for me. Still, it's worth mentioning in case it affects someone with specific hardware or settings.

drhead · 2025-05-03T18:35:43Z

Yeah, it seems a line got obliterated when I patched this; that's probably what caused it. I'll put it back.
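The reported symptom (first preview at step 10, then one every step) is what a step-gap check produces if it loses the line that records when the last preview was shown. A minimal model under that assumption follows; `PreviewGate` and its fields are hypothetical, and this is a reading of the symptom, not the code restored in #2853.

```python
class PreviewGate:
    """Hypothetical model of an "update the preview every N steps" check."""

    def __init__(self, every_n, remember_last=True):
        self.every_n = every_n
        self.remember_last = remember_last
        self.last_preview_step = 0

    def should_preview(self, step):
        if step - self.last_preview_step < self.every_n:
            return False
        if self.remember_last:
            # Without this bookkeeping the gap stays >= every_n after the
            # first preview, so every later step triggers one.
            self.last_preview_step = step
        return True


def preview_steps(every_n, total_steps, remember_last=True):
    gate = PreviewGate(every_n, remember_last)
    return [s for s in range(1, total_steps + 1) if gate.should_preview(s)]


print(preview_steps(10, 30))                       # [10, 20, 30]
print(preview_steps(10, 14, remember_last=False))  # [10, 11, 12, 13, 14]
```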


drhead added 2 commits April 29, 2025 07:49

Add separate cuda stream for live preview VAE (89afc3b)

Corrected rounding + minor optimization for VAE image function (2829ed2)

drhead requested a review from lllyasviel as a code owner April 29, 2025 12:34

catboxanon approved these changes May 1, 2025


catboxanon merged commit d357396 into lllyasviel:main May 1, 2025

spawner1145 added a commit to spawner1145/stable-diffusion-webui-forge that referenced this pull request May 2, 2025

Merge pull request #2 from lllyasviel/main

fa9ec8f

Add separate cuda stream for live preview VAE (lllyasviel#2844)

drhead mentioned this pull request May 3, 2025

re-add line to update sampling step in live preview step #2853

Merged

Haoming02 added a commit to Haoming02/sd-webui-forge-classic that referenced this pull request May 21, 2025

vae stream

9932192

lllyasviel#2844

drhead mentioned this pull request May 23, 2025

Remove blocking call from timestep embedding of Chroma Comfy-Org/ComfyUI#8255

Merged

lshqqytiger pushed a commit to lshqqytiger/stable-diffusion-webui-amdgpu-forge that referenced this pull request Jun 24, 2025

Add separate cuda stream for live preview VAE (lllyasviel#2844)

91b239c

catboxanon merged 2 commits into lllyasviel:main from drhead:main