
[IPEX] Slice SDPA into smaller chunks #14353

Merged

AUTOMATIC1111 merged 2 commits into AUTOMATIC1111:dev from Nuullll:ipex-sdpa on Jan 1, 2024
Conversation

@Nuullll (Contributor) commented Dec 18, 2023

Description

Slice scaled_dot_product_attention into smaller chunks so that no chunk's SDPA requests an allocation larger than a given limit.

This was initially designed to work around the 4GB single block allocation limitation of Intel compute-runtime (RuntimeError: Current platform can NOT allocate memory block with size larger than 4GB! Tried to allocate 8.00 GiB). Then I found out that setting a smaller limit would reduce the VRAM footprint during SDPA calculation. The current limit (VRAM // 8) was tuned for Intel Arc A770 16G and A750 8G without sacrificing performance.
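The chunking idea described above can be sketched as follows. This is a minimal illustration of the technique, not the PR's actual implementation; the function name `sliced_sdpa` and the byte-size estimate are assumptions for this example. Slicing along the query dimension is mathematically exact, because softmax is computed per query row over all keys.

```python
import math
import torch
import torch.nn.functional as F

def sliced_sdpa(q, k, v, block_limit=2**30):
    # q, k, v: (batch, heads, seq_len, head_dim); hypothetical helper,
    # not the PR's code. block_limit caps the estimated size of the
    # intermediate attention matrix, in bytes.
    batch, heads, q_len, _ = q.shape
    k_len = k.shape[2]
    # Rough size of the full (batch, heads, q_len, k_len) attention matrix.
    attn_bytes = batch * heads * q_len * k_len * q.element_size()
    if attn_bytes <= block_limit:
        return F.scaled_dot_product_attention(q, k, v)
    # Split the query dimension into enough slices that each chunk's
    # attention matrix stays under the limit.
    chunks = math.ceil(attn_bytes / block_limit)
    chunk_len = math.ceil(q_len / chunks)
    out = torch.empty_like(q)  # assumes v has the same head_dim as q
    for start in range(0, q_len, chunk_len):
        end = min(start + chunk_len, q_len)
        out[:, :, start:end, :] = F.scaled_dot_product_attention(
            q[:, :, start:end, :], k, v)
    return out
```

Each chunk attends over the full key/value tensors, so the result matches an unsliced call; only peak memory for the attention scores changes.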

With this change, the A770 16G can generate 512x512 images at batch size 32, and the A750 8G at batch size 16.

Test results:

Common settings: --use-ipex --opt-sdp-attention, txt2img, DPM++ 2M Karras, 20 steps, 512x512 resolution, batch count = 5

  • Effective it/s == Batch size * Batch count * Steps / Total time taken
  • RE in the table refers to RuntimeError: Current platform can NOT allocate memory block with size larger than 4GB!
  • OOM in the table refers to RuntimeError: Allocation is out of device memory on current platform.
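As a concrete check of the effective-it/s definition above: the function below is a hypothetical helper for this example, and the 74.5 s total time is an assumed value chosen to roughly reproduce the A770 batch-size-8 "After" figure.

```python
def effective_its(batch_size, batch_count, steps, total_seconds):
    # Effective it/s = batch size * batch count * steps / total time taken
    return batch_size * batch_count * steps / total_seconds

# With the PR's common settings (batch count 5, 20 steps), batch size 8
# finishing in an assumed 74.5 s gives about 10.7 effective it/s.
print(effective_its(8, 5, 20, 74.5))
```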

A770 16G (connected with two monitors [taking up ~1.1GB VRAM])

| Batch Size | Before: Peak VRAM (GB) | After: Peak VRAM (GB) | Delta | Before: Effective it/s | After: Effective it/s | Delta |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 6.8 | 6.6 | -2.9% | 5.95 | 6.45 | +8.4% |
| 2 | 8.5 | 8.4 | -1.2% | 7.66 | 7.84 | +2.3% |
| 4 | 11.6 | 11.1 | -4.3% | 8.95 | 9.17 | +2.5% |
| 8 | 15.9 | 13.9 | -12.6% | 4.46 | 10.74 | +140.8% |
| 16 | RE | 15.1 | - | - | 11.40 | - |
| 32 | RE | 15.5 | - | - | 11.24 | - |

A750 8G (not connected with monitors)

| Batch Size | Before: Peak VRAM (GB) | After: Peak VRAM (GB) | Delta | Before: Effective it/s | After: Effective it/s | Delta |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 5.7 | 5.6 | -1.8% | 5.49 | 6.06 | +10.4% |
| 2 | 7.4 | 6.8 | -8.1% | 7.22 | 7.55 | +4.6% |
| 4 | 7.9 | 7.5 | -5.1% | 6.81 | 8.68 | +27.5% |
| 8 | OOM | 7.9 | - | - | 9.47 | - |
| 16 | RE | 7.9 | - | - | 9.15 | - |
| 32 | RE | OOM | - | - | - | - |

Screenshots/videos:

(screenshot attached in the original PR)


@AUTOMATIC1111 AUTOMATIC1111 merged commit cba6fba into AUTOMATIC1111:dev Jan 1, 2024
@w-e-w w-e-w mentioned this pull request Feb 17, 2024
@pawel665j pawel665j mentioned this pull request Apr 16, 2024
