
Conversation

@dakenf (Contributor) commented Sep 29, 2023

Description

This is a narrow implementation of Attention/MultiHeadAttention as it does not support:
a. inputs 5-7 for MHA
b. packed QKV/KV
c. past/present
d. attention mask

But it works well for Stable Diffusion and can be extended later. It reduces VRAM usage by fusing many ops into a few (see the reference sketch below).
I've updated the demo at https://islamov.ai/stable-diffusion-webgpu/: it takes ~13 s for one image with 20 steps on an RTX 3090 Ti and about 25 s on an M1 Pro.
VRAM usage is about 8 GB if you don't use img2img.

Going to focus on SDXL now
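
For reference, the core of what the fused op computes is standard scaled dot-product attention, softmax(Q·K^T / sqrt(d))·V. Run as separate graph ops, each step (two MatMuls, a scale, a softmax) materializes its own intermediate, including a full [seq, seq] score matrix, which is where the VRAM savings come from. A minimal CPU-side sketch for a single head (shapes, names, and row-major layout are illustrative only, not the actual WebGPU kernel):

```ts
// Naive single-head reference: out = softmax(Q·K^T / sqrt(d)) · V
// Q, K, V are [seq, d] row-major Float32Arrays. Illustration only.
function scaledDotProductAttention(
  q: Float32Array, k: Float32Array, v: Float32Array,
  seq: number, d: number,
): Float32Array {
  const out = new Float32Array(seq * d); // zero-initialized
  const scale = 1 / Math.sqrt(d);
  const scores = new Float32Array(seq); // one row of Q·K^T at a time
  for (let i = 0; i < seq; i++) {
    // scores[j] = (Q[i] · K[j]) / sqrt(d), tracking the row max for stability
    let max = -Infinity;
    for (let j = 0; j < seq; j++) {
      let dot = 0;
      for (let c = 0; c < d; c++) dot += q[i * d + c] * k[j * d + c];
      scores[j] = dot * scale;
      if (scores[j] > max) max = scores[j];
    }
    // numerically stable softmax over the row
    let sum = 0;
    for (let j = 0; j < seq; j++) {
      scores[j] = Math.exp(scores[j] - max);
      sum += scores[j];
    }
    // out[i] = sum_j softmax(scores)[j] * V[j]
    for (let j = 0; j < seq; j++) {
      const w = scores[j] / sum;
      for (let c = 0; c < d; c++) out[i * d + c] += w * v[j * d + c];
    }
  }
  return out;
}
```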

@guschmue (Contributor)

/azp run ONNX Runtime Web CI Pipeline

@guschmue (Contributor)

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@guschmue (Contributor)

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@guschmue added the ep:WebGPU (ort-web webgpu provider) label on Sep 29, 2023
@satyajandhyala (Contributor)

Do you need to run npm run build:doc under onnxruntime\js\web to update the documentation?

@dakenf (Contributor, Author) commented Sep 30, 2023

> Do you need to run npm run build:doc under onnxruntime\js\web to update the documentation?

Yeah, it's been a while since I submitted a new op and I forgot about that. I also forgot to run format after moving the tests over from my working branch.

@fs-eire (Contributor) commented Sep 30, 2023

> Do you need to run npm run build:doc under onnxruntime\js\web to update the documentation?

> Yeah, it's been a while since I submitted a new op and I forgot about that. I also forgot to run format after moving the tests over from my working branch.

Please update the comments in https://github.com/microsoft/onnxruntime/blob/main/js/web/script/generate-webgpu-operator-md.ts#L10 since it's a partial implementation.
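
For instance, something along these lines, assuming the script keeps a simple per-op note map (the names and format here are hypothetical; match whatever the file actually uses):

```ts
// Hypothetical sketch only; generate-webgpu-operator-md.ts may structure its
// comments differently. The point is to record the known gaps from this PR.
const PARTIAL_IMPLEMENTATION_NOTES: Record<string, string> = {
  Attention: 'partial: no past/present or attention-mask support',
  MultiHeadAttention:
    'partial: no inputs 5-7, packed QKV/KV, past/present, or attention mask',
};
```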

@dakenf (Contributor, Author) commented Oct 3, 2023

> Please update the comments in https://github.com/microsoft/onnxruntime/blob/main/js/web/script/generate-webgpu-operator-md.ts#L10 since it's a partial implementation.

done

@guschmue (Contributor) commented Oct 3, 2023

/azp run ONNX Runtime Web CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@guschmue (Contributor) commented Oct 3, 2023

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue (Contributor) commented Oct 3, 2023

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@dakenf (Contributor, Author) commented Oct 3, 2023

Fixed the ONNX Runtime Web CI Pipeline error about an unused variable.

@guschmue (Contributor) commented Oct 3, 2023

/azp run ONNX Runtime Web CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@dakenf (Contributor, Author) commented Oct 6, 2023

I've fixed the errors here and in the LayerNorm PR.

Also, I've managed to load SDXL in the browser. Once I update the pipeline code, I'll either come back with updates to Attention (if SDXL requires something not yet implemented) or start to annoy you with a 64-bit PR :)

@guschmue (Contributor) commented Oct 9, 2023

Awesome if you can get SDXL working. wasm64 might be a bit of a pain, but long term it's unavoidable, and a bunch of people badly want it. I think fp16 together with the ONNX external data format will go a long way, but at some point we'll need 64-bit.
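
For context, loading an fp16 model with external weights on the WebGPU EP would look roughly like the sketch below. The externalData session option is how recent onnxruntime-web releases attach weights stored outside the .onnx file; the import path and option availability depend on the version, so treat the specifics as assumptions to verify:

```ts
import * as ort from 'onnxruntime-web/webgpu'; // WebGPU bundle in recent releases

// Fetch the external weights ourselves; file names here are hypothetical.
const weights = new Uint8Array(
  await (await fetch('./unet/weights.pb')).arrayBuffer(),
);

const session = await ort.InferenceSession.create('./unet/model.onnx', {
  executionProviders: ['webgpu'],
  // Maps the path recorded inside the model to the bytes fetched above.
  externalData: [{ path: 'weights.pb', data: weights }],
});
```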

@guschmue (Contributor) commented Oct 9, 2023

/azp run ONNX Runtime Web CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@guschmue (Contributor) commented Oct 9, 2023

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@guschmue (Contributor) commented Oct 9, 2023

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@dakenf (Contributor, Author) commented Oct 9, 2023

> Awesome if you can get SDXL working. wasm64 might be a bit of a pain, but long term it's unavoidable, and a bunch of people badly want it. I think fp16 together with the ONNX external data format will go a long way, but at some point we'll need 64-bit.

I've already loaded it (both fp32 and fp16), but I'm having some issues with either the ONNX export or the pipeline code: I'm getting NaNs after the unet run. I'll most likely have time to resolve this closer to next week.
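
(As an aside, a quick way to catch this kind of fp16 blow-up is to scan the UNet output for non-finite values; the output name below follows the diffusers ONNX export convention and is an assumption.)

```ts
import * as ort from 'onnxruntime-web';

// Debugging sketch: throw if a tensor contains NaN/Inf, a common symptom of
// fp16 overflow in attention/softmax.
function assertFinite(name: string, tensor: ort.Tensor): void {
  const data = tensor.data as Float32Array;
  for (let i = 0; i < data.length; i++) {
    if (!Number.isFinite(data[i])) {
      throw new Error(`${name}[${i}] = ${data[i]}`);
    }
  }
}

// Usage, assuming a diffusers-style UNet with output name "out_sample":
// const results = await unetSession.run(feeds);
// assertFinite('out_sample', results.out_sample);
```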

@dakenf (Contributor, Author) commented Oct 9, 2023

Anyway, I can maintain my own package with a 64-bit build, since my goal is the diffusers.js library rather than specific implementation details. But right now my dev branch and upstream have diverged a lot. If you agree to merge changes that support 64-bit flags, it would make everything much easier. I can file some separate PRs; just let me know what you'd like to have in upstream and how we can make it compatible.

@satyajandhyala (Contributor)

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

@satyajandhyala (Contributor)

/azp run Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 6 pipeline(s).

@fs-eire (Contributor) commented Nov 16, 2023

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@fs-eire (Contributor) commented Nov 16, 2023

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-python-checks-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Android CI Pipeline

@fs-eire (Contributor) commented Nov 16, 2023

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 10 pipeline(s).

@fs-eire merged commit fac3e33 into microsoft:main on Nov 17, 2023
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024
siweic0 pushed a commit to siweic0/onnxruntime-web that referenced this pull request May 9, 2024