Fix overestimated WebGPU GPU frame timing in profiler #8903
Merged
Report frame GPU time as the span between the earliest begin and latest end timestamp across all passes, instead of summing per-pass durations. On tile-based / pipelined GPUs (e.g. Apple Silicon) consecutive passes overlap in time, so summing double-counts the overlap and grows with the number of passes even while the GPU is mostly idle. Also clamp per-pass durations to be non-negative to fix occasional negative readings from out-of-order begin/end timestamps, and fix a debug-label bug on the query staging buffer. WebGL profiling is unchanged.
The GPU time reported by the WebGPU profiler (and shown in mini-stats) was massively overestimated, growing with the number of render/compute passes even when the GPU was mostly idle. On tile-based / pipelined GPUs (Apple Silicon is a well-known example) consecutive passes overlap in time: the vertex stage of one pass runs concurrently with the fragment stage of the previous one. Summing per-pass timestamp durations therefore double-counts the overlap.
This reports the frame GPU time as the span between the earliest begin and latest end timestamp across all passes, which stays within the real frame time. Per-pass timings are kept as-is (useful for relative comparison) but are now clamped to be non-negative, fixing the negative readings some users observed.
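To illustrate the difference between the two approaches, here is a minimal sketch, not the code from this PR. The `PassTimestamps` shape, the nanosecond unit, and the function names `computeFrameTiming` and `sumOfDurationsMs` are all assumptions made for the example.

```typescript
// Hypothetical input shape: one begin/end GPU timestamp pair per pass.
// Nanoseconds are assumed here purely for illustration.
interface PassTimestamps {
    begin: number;
    end: number;
}

interface FrameTiming {
    frameMs: number;   // span from earliest begin to latest end
    passMs: number[];  // per-pass durations, clamped to be non-negative
}

const NS_PER_MS = 1e6;

// Span-based frame time plus clamped per-pass durations.
function computeFrameTiming(passes: PassTimestamps[]): FrameTiming {
    let earliestBegin = Infinity;
    let latestEnd = -Infinity;
    const passMs: number[] = [];

    for (const p of passes) {
        // Clamp so an out-of-order begin/end pair reads as 0, not negative.
        passMs.push(Math.max(0, p.end - p.begin) / NS_PER_MS);
        earliestBegin = Math.min(earliestBegin, p.begin);
        latestEnd = Math.max(latestEnd, p.end);
    }

    const frameMs = passes.length > 0
        ? Math.max(0, latestEnd - earliestBegin) / NS_PER_MS
        : 0;
    return { frameMs, passMs };
}

// The previous approach, shown for comparison: overlapping passes are
// counted more than once.
function sumOfDurationsMs(passes: PassTimestamps[]): number {
    return passes.reduce((acc, p) => acc + (p.end - p.begin), 0) / NS_PER_MS;
}

// Three passes that overlap in time, as on a tile-based GPU.
const overlapping: PassTimestamps[] = [
    { begin: 0,   end: 4e6 },
    { begin: 2e6, end: 7e6 },
    { begin: 6e6, end: 9e6 },
];

console.log(sumOfDurationsMs(overlapping));           // 12
console.log(computeFrameTiming(overlapping).frameMs); // 9
```

With these made-up timestamps the summed figure (12 ms) exceeds the 9 ms in which the GPU was actually busy, while the span stays within that window. Each per-pass value is still the full duration of its pass, so the values remain comparable with one another even though they no longer add up to the frame figure.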
Changes:
Closes #8350