[js/web] JSEP Attention & MultiHeadAttention #17742
Conversation
/azp run ONNX Runtime Web CI Pipeline

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

Azure Pipelines successfully started running 6 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).

Need to run
Yeah, it's been a while since I've submitted a new op and I forgot about that. Also forgot to run format after moving the tests from my working branch.

Please update the comments in https://github.com/microsoft/onnxruntime/blob/main/js/web/script/generate-webgpu-operator-md.ts#L10 since it's a partial implementation.
done |
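For context, the note above is about the script that generates the WebGPU operator table. A minimal sketch of what a "partial implementation" entry for these ops might look like; the map name and wording here are assumptions for illustration, not the actual contents of generate-webgpu-operator-md.ts:

```ts
// Hypothetical sketch: a map from op name to a partial-support note that gets
// rendered into the generated WebGPU operator markdown table. The real
// structure lives in js/web/script/generate-webgpu-operator-md.ts and may differ.
const PARTIAL_IMPLEMENTATION_NOTES: Record<string, string> = {
  Attention: 'partial: no past/present and no attention mask',
  MultiHeadAttention: 'partial: no inputs 5-7, packed QKV/KV, past/present or attention mask',
};
```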
/azp run ONNX Runtime Web CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

Azure Pipelines successfully started running 6 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).
Fixed the ONNX Runtime Web CI Pipeline error caused by an unused variable.
/azp run ONNX Runtime Web CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).
I've fixed the errors here and in the LayerNorm PR. Also, I've managed to load SDXL in the browser, so once I update the pipeline code I'll come back with updates to Attention (if it requires something not yet implemented) or start to annoy you with a 64-bit PR :)

Awesome if you can run SDXL. wasm64 might be a bit of a pain, but long term it's not avoidable and a bunch of people badly want it. I think fp16 together with the ONNX external data format will go a long way, but at some point we will need 64-bit.
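Since fp16 plus the ONNX external data format comes up here, a rough sketch of how a session with externally stored weights might be created. The externalData session option and both file names are assumptions for illustration (support depends on the onnxruntime-web release), not something confirmed by this PR:

```ts
import * as ort from 'onnxruntime-web';

// Sketch only: load an fp16 UNet exported with the ONNX external data format,
// so the .onnx graph stays small and the large weight tensors live in a
// separate file. File names and the externalData option are assumptions.
async function createUnetSession(): Promise<ort.InferenceSession> {
  return ort.InferenceSession.create('unet_fp16.onnx', {
    executionProviders: ['webgpu'],
    externalData: ['unet_fp16.onnx_data'],
  });
}
```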
/azp run ONNX Runtime Web CI Pipeline

Azure Pipelines successfully started running 1 pipeline(s).

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

Azure Pipelines successfully started running 6 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).
I've already loaded it (both fp32 and fp16), but I'm having some issues with either the ONNX export or the pipeline code: I'm getting NaNs after the unet run. Most likely I'll have time to resolve it closer to next week.

Anyway, I can maintain my own package with a 64-bit build, since my goal is a diffusers.js library, not specific implementation details. But right now my dev branch and upstream have diverged a lot. If you agree to merge changes that support 64-bit flags, it would make everything much easier. I can file some separate PRs; just let me know what you would like to have in upstream and how we can keep it compatible.
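For debugging NaNs like the ones mentioned after the unet step, a tiny helper along these lines can help localize which output goes bad first; the function name is made up and it assumes the output has already been downloaded to the CPU as fp32:

```ts
import { Tensor } from 'onnxruntime-web';

// Illustrative helper: scan a float32 output tensor for NaN/Inf so a pipeline
// step (e.g. the unet run above) can report where non-finite values first appear.
// Assumes CPU-resident fp32 data; fp16 outputs would need decoding first.
function assertFinite(name: string, tensor: Tensor): void {
  const data = tensor.data as Float32Array;
  for (let i = 0; i < data.length; i++) {
    if (!Number.isFinite(data[i])) {
      throw new Error(`${name}: non-finite value ${data[i]} at index ${i}`);
    }
  }
}
```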
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

Azure Pipelines successfully started running 9 pipeline(s).

Azure Pipelines successfully started running 6 pipeline(s).

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-python-checks-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Android CI Pipeline

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 2 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).
Description
This is a narrow implementation of Attention/MultiHeadAttention; it does not support:
a. inputs 5-7 for MHA
b. packed QKV/KV
c. past/present
d. attention mask
But it works well for Stable Diffusion and can be extended later. It reduces VRAM usage because it fuses many ops into a few.
I've updated the demo at https://islamov.ai/stable-diffusion-webgpu/; it takes ~13 s for 1 image with 20 steps on an RTX 3090 Ti and about 25 s on an M1 Pro.
VRAM usage is about 8 GB if you don't use img2img.
Going to focus on SDXL now.

Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
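For readers who want to exercise these kernels from onnxruntime-web, a minimal usage sketch follows. Running on the WebGPU execution provider is what routes Attention/MultiHeadAttention through the new JSEP implementation (within the limits listed above); the model file name and input name are placeholders, and the exact import path may differ depending on the bundle you use:

```ts
import * as ort from 'onnxruntime-web';

// Minimal sketch: run a model containing a com.microsoft MultiHeadAttention op
// on the WebGPU execution provider. 'model_with_mha.onnx' and 'hidden_states'
// are placeholder names; the fused op is used transparently by the runtime,
// so no caller-side API changes are needed.
async function run(): Promise<void> {
  const session = await ort.InferenceSession.create('model_with_mha.onnx', {
    executionProviders: ['webgpu'],
  });
  const hiddenStates = new ort.Tensor('float32', new Float32Array(1 * 77 * 768), [1, 77, 768]);
  const results = await session.run({ hidden_states: hiddenStates });
  console.log(Object.keys(results));
}
```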