[Performance] LDM optimization patches by drhead · Pull Request #15824 · AUTOMATIC1111/stable-diffusion-webui

drhead · 2024-05-17T16:16:17Z

Description

Change 1: Timestep Embedding Patch

Fixes a blocking op in the timestep embedding. It was creating a tensor on the CPU and then moving it to the GPU, which forced a sync on every step.
Combined with the other performance PRs (mine and HCL's), Torch's dispatch queue should be completely unblocked (until extensions with similar problems block it again). This should allow near-constant 100% GPU usage.
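A minimal sketch of the pattern described above, not the actual diff from this PR. The function names (`freqs_cpu_then_move`, `freqs_on_device`) and the simplified `timestep_embedding` are hypothetical and only loosely modeled on a sinusoidal timestep embedding. The only point is the difference between building the frequency tensor on the CPU and copying it over, versus passing `device=` to `torch.arange` so it is allocated where the timesteps already live.

```python
import math
import torch


def freqs_cpu_then_move(half: int, device: torch.device, max_period: int = 10000) -> torch.Tensor:
    # Built on the CPU, then copied to the target device with .to().
    # This is the pattern the description calls a blocking op.
    freqs = torch.exp(-math.log(max_period) * torch.arange(half, dtype=torch.float32) / half)
    return freqs.to(device=device)


def freqs_on_device(half: int, device: torch.device, max_period: int = 10000) -> torch.Tensor:
    # Allocated directly on the target device, so no host-to-device copy is needed.
    steps = torch.arange(half, dtype=torch.float32, device=device)
    return torch.exp(-math.log(max_period) * steps / half)


def timestep_embedding(timesteps: torch.Tensor, dim: int, make_freqs=freqs_on_device) -> torch.Tensor:
    # Simplified sinusoidal embedding (assumes an even dim): [cos | sin] halves.
    half = dim // 2
    freqs = make_freqs(half, timesteps.device)
    args = timesteps[:, None].float() * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)


# Falls back to the CPU when no GPU is present; the two variants then behave the same.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
timesteps = torch.tensor([0, 500, 999], device=device)

emb_old = timestep_embedding(timesteps, 8, freqs_cpu_then_move)
emb_new = timestep_embedding(timesteps, 8, freqs_on_device)
```

Both variants should produce the same values; the on-device version only changes where the intermediate tensor is created.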

Change 2: SpatialTransformer.forward einops removal

Changes the function to use native torch reshape/view/permute ops and removes the .contiguous() call.
Prevents 32 calls to aten::copy_ and void at::native::elementwise_kernel<128, 4, at::nati... per forward pass (SD 1.5). Speedup seems to be around 6-8 ms per forward pass, though my profiler is being a little inconsistent with the timing (512x512, batch 4, overclocked 3090).
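To illustrate why dropping `.contiguous()` avoids copies, here is a rough sketch, not the code from this PR. It assumes the einops pattern being replaced is of the form `b c h w -> b (h w) c` and back; the helper names `to_tokens` and `to_image` are hypothetical. The native ops return views of the same storage, while `.contiguous()` on the transposed result has to materialize a new tensor.

```python
import torch


def to_tokens(x: torch.Tensor) -> torch.Tensor:
    # Native equivalent of the assumed pattern 'b c h w -> b (h w) c'.
    # flatten + transpose only change shape/stride metadata; no data is copied.
    return x.flatten(2).transpose(1, 2)


def to_image(tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
    # Native equivalent of the assumed pattern 'b (h w) c -> b c h w'.
    b, _, c = tokens.shape
    return tokens.transpose(1, 2).reshape(b, c, h, w)


x = torch.arange(2 * 3 * 4 * 5, dtype=torch.float32).reshape(2, 3, 4, 5)

tokens = to_tokens(x)

# The view shares memory with the input tensor.
shares_storage = tokens.data_ptr() == x.data_ptr()

# Calling .contiguous() on the non-contiguous view allocates and fills new memory.
copy_made = tokens.contiguous().data_ptr() != x.data_ptr()

restored = to_image(tokens, 4, 5)
```

Whether skipping the copy is a net win depends on how downstream ops handle non-contiguous inputs, so the timing quoted above should be read as specific to the author's setup.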

Checklist:

I have read contributing wiki page
I have performed a self-review of my own code
My code follows the style guidelines
My code passes tests

drhead · 2024-05-17T16:22:07Z

I think #18620 might need to be merged before tests will pass on this.

w-e-w · 2024-05-17T16:44:25Z

we are currently on [Performance] LDM optimization patches #15824

so we need to wait 2796 new posts to merge this 🙃

drhead · 2024-05-17T16:46:32Z

Upon further review I think it would be sufficient for #15820 to be merged first lol

drhead · 2024-05-17T17:19:55Z

Added another patch, and it passes tests now.

Patch timestep embedding to create tensor on-device

53d6708

drhead requested a review from AUTOMATIC1111 as a code owner May 17, 2024 16:16

Add transformer forward patch

cc9ca67

drhead changed the title ~~Patch timestep embedding to create tensor on-device~~ LDM optimization patches May 17, 2024

drhead changed the title ~~LDM optimization patches~~ [Performance] LDM optimization patches May 21, 2024

AUTOMATIC1111 approved these changes Jun 8, 2024


Merge branch 'dev' into patch-4

ebfc9f6

AUTOMATIC1111 merged commit 93b53dc into AUTOMATIC1111:dev Jun 8, 2024

AUTOMATIC1111 merged 3 commits into AUTOMATIC1111:dev from drhead:patch-4


3 participants
