[webgpu] Use pushErrorScope()/popErrorScope() once for an inference run #23438

jchen10 · 2025-01-21T06:08:13Z

The CPU wall time spent waiting for PopErrorScope is non-trivial, and validation errors are not expected to happen in Release builds.

jchen10 · 2025-01-21T06:26:02Z

Waiting for PopErrorScope takes around 7% of wall time in my profiling, which is not negligible. Meanwhile, there is no need to capture validation errors in a Release build, according to the WebGPU error handling best practices:

validation errors occur whenever invalid inputs were given to a WebGPU call. These are consistent, predictable, and should generally not be expected during normal operation of a well formed application. They will fail in the same way on every device your code runs on, so once you’ve fixed any errors that show up during development you probably don’t need to observe them directly most of the time. An exception to that rule is if you’re consuming user-supplied assets/shaders/etc, in which case watching for validation errors while loading may be helpful.

jchen10 · 2025-01-21T06:31:16Z

@guschmue @fs-eire PTAL

fs-eire · 2025-01-21T19:45:47Z

The document https://toji.dev/webgpu-best-practices/error-handling is only half correct for ONNX Runtime.

Because the shaders are generated dynamically from user input (an ONNX model is considered user input), it is impossible to have 100% test coverage for all shaders that may run. We can never expect the code to be bug-free, no matter how carefully we write it. So validation errors will happen in ORT Release builds.

As a library, we want to do our best to avoid unexpected crashes or aborts. It is OK for a validation error to cause an unexpected inference failure, but we want to avoid a crash/abort or a stale state.

My concern with this change is: will it increase the likelihood of a crash (abort) or of a stale state (i.e., every inference after the failure also fails)?

jchen10 · 2025-01-22T01:53:30Z

Dawn/WebGPU was designed for the Web, where security is always a critical consideration. It would be a disaster if malicious input from clients could easily cause a crash, so I am not too worried about that.

As for the stale-state concern: as far as I can tell, this change could mean a validation error is not raised to the framework immediately, so the framework keeps processing the following operators. That work is unnecessary, because the inference result can no longer be expected to be correct, but it does no harm beyond the wasted work.

This is a trade-off: more validation brings more robustness, but also more overhead.
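
The trade-off can be made concrete with a toy cost model. This is only a sketch: `run_ops`, `RunStats`, and the idea of charging one blocking wait per error check are illustrative assumptions, not ONNX Runtime or Dawn code.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    ops_executed: int    # operators dispatched before the run stopped
    blocking_waits: int  # times the CPU waited for an error-scope result
    ok: bool             # True if no validation error was seen


def run_ops(op_is_valid: list[bool], check_each_op: bool) -> RunStats:
    """Toy model: every error check costs one blocking wait (hypothetical)."""
    executed = 0
    waits = 0
    failed = False
    for valid in op_is_valid:
        executed += 1
        failed = failed or not valid
        if check_each_op:
            waits += 1  # check right after this operator
            if failed:
                return RunStats(executed, waits, ok=False)
    if not check_each_op:
        waits += 1  # a single check at the end of the run
    return RunStats(executed, waits, ok=not failed)


# The second of five operators is invalid.
ops = [True, False, True, True, True]
eager = run_ops(ops, check_each_op=True)
deferred = run_ops(ops, check_each_op=False)
print(eager)     # stops after 2 operators, 2 waits
print(deferred)  # runs all 5 operators, 1 wait
```

In this model the deferred check wastes three operators on a failing run, while the eager check pays one wait per operator on every run, including the successful ones.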

jchen10 · 2025-01-22T02:28:37Z

It's fine to revisit this later when we are confident enough and more concerned about the performance.

fs-eire · 2025-01-22T15:46:48Z

Your point about the stale state makes sense. I think using pushErrorScope()/popErrorScope() once per inference run is a better idea: it should have much less perf impact, it still keeps errors from reaching the uncaptured-error handler, and it lets users know whether the inference succeeded. What do you think?
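
A minimal sketch of that proposal, written as a Python model rather than real Dawn code. `MockDevice`, `inference_run`, `run_model`, and the operator callables are hypothetical names. The only behaviour borrowed from WebGPU is that an error goes to the innermost open error scope, or to the uncaptured-error handler when no scope is open; error filters other than validation are left out.

```python
from contextlib import contextmanager


class MockDevice:
    """Stand-in for a WebGPU device; models validation error scopes only."""

    def __init__(self):
        self._scopes = []     # stack of open scopes, each a list of messages
        self.uncaptured = []  # errors that no scope caught
        self.pops = 0         # how many times a scope result was awaited

    def push_error_scope(self):
        self._scopes.append([])

    def pop_error_scope(self):
        self.pops += 1
        errors = self._scopes.pop()
        return errors[0] if errors else None  # first captured error, or None

    def report_validation_error(self, message):
        if self._scopes:
            self._scopes[-1].append(message)
        else:
            self.uncaptured.append(message)


@contextmanager
def inference_run(device, result):
    """Open one scope for the whole run and record the outcome in `result`."""
    device.push_error_scope()
    try:
        yield
    finally:
        result["error"] = device.pop_error_scope()


def run_model(device, ops):
    result = {}
    with inference_run(device, result):
        for op in ops:
            op(device)  # no per-operator push/pop
    return result["error"]


def good_op(device):
    pass


def bad_op(device):
    device.report_validation_error("invalid bind group")


device = MockDevice()
error = run_model(device, [good_op, bad_op, good_op])
print(error)              # invalid bind group
print(device.pops)        # 1
print(device.uncaptured)  # []
```

The caller gets a single pass/fail answer per run, and nothing leaks to the uncaptured handler as long as the run-level scope is open.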

jchen10 · 2025-01-23T01:21:04Z

Good idea. That would give us the best of both worlds.

jchen10 · 2025-02-04T02:03:04Z

@fs-eire The PR now uses pushErrorScope()/popErrorScope() once per inference run. PTAL

onnxruntime/core/providers/webgpu/webgpu_kernel.h

onnxruntime/core/providers/webgpu/webgpu_context.cc

fs-eire · 2025-02-07T02:50:40Z

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline

fs-eire · 2025-02-07T02:50:42Z

/azp run Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

fs-eire · 2025-02-07T02:50:44Z

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline,CoreML CI Pipeline,Linux DNNL CI Pipeline,Linux MIGraphX CI Pipeline,Linux ROCm CI Pipeline

azure-pipelines · 2025-02-07T02:51:13Z

Azure Pipelines successfully started running 7 pipeline(s).

azure-pipelines · 2025-02-07T02:51:15Z

Azure Pipelines successfully started running 8 pipeline(s).

azure-pipelines · 2025-02-07T02:51:19Z

Azure Pipelines successfully started running 10 pipeline(s).

jchen10 · 2025-02-07T11:29:11Z

##[error]The job running on agent onnxruntime-Ubuntu2204-AMD-CPU 44 ran longer than the maximum time of 30 minutes. For more information, see https://go.microsoft.com/fwlink/?linkid=2077134

The "Android CI Pipeline" failure looks like a false alarm. Please help re-run the job, thanks!

snnn · 2025-02-07T15:19:20Z

/azp run Win_TRT_Minimal_CUDA_Test_CI

snnn · 2025-02-07T15:19:30Z

/azp run Android CI Pipeline

azure-pipelines · 2025-02-07T15:19:33Z

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines · 2025-02-07T15:19:42Z

Azure Pipelines successfully started running 1 pipeline(s).

[webgpu] Only capture validation errors in Debug build

2ec9d76

The CPU walltime of waiting for PopErrorScope is non-trivial, and also validation errors are not expected to happen in Release build.

guschmue added the ep:WebGPU (ort-web webgpu provider) label Jan 22, 2025

Use pushErrorScope()/popErrorScope() once for an inference run

5139aee

jchen10 changed the title from "[webgpu] Only capture validation errors in Debug build" to "[webgpu] Use pushErrorScope()/popErrorScope() once for an inference run" Feb 4, 2025

fs-eire reviewed Feb 4, 2025

onnxruntime/core/providers/webgpu/webgpu_kernel.h (resolved)

Keep the per Op validation in case of full mode

8b0bccb
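
One possible reading of this commit, sketched in Python: the per-run scope is always used, and the per-operator scope is kept only when the most thorough validation mode is selected. The `ValidationMode` members and the `scopes_for_run` helper are assumptions made for illustration; the thread itself only mentions a "full mode".

```python
from enum import IntEnum


class ValidationMode(IntEnum):
    # Hypothetical levels; only "full" is named in the PR thread.
    BASIC = 1
    FULL = 2


def scopes_for_run(mode: ValidationMode, num_ops: int) -> int:
    """Return how many error scopes would be pushed and popped for one run."""
    per_run = 1
    per_op = num_ops if mode >= ValidationMode.FULL else 0
    return per_run + per_op


print(scopes_for_run(ValidationMode.BASIC, 100))  # 1
print(scopes_for_run(ValidationMode.FULL, 100))   # 101
```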

fs-eire reviewed Feb 7, 2025

onnxruntime/core/providers/webgpu/webgpu_context.cc (outdated, resolved)

Keep names

9501966

fs-eire approved these changes Feb 7, 2025

fs-eire merged commit 0887e36 into microsoft:main Feb 7, 2025
92 checks passed

ashrit-ms mentioned this pull request Feb 11, 2025

Update win-ort-main to tip main 250211 #23646

Merged

jchen10 deleted the error_scope branch April 2, 2025 13:59

4 participants