Fix motion vector computation after #17688.#17717
Merged
superdump merged 18 commits intobevyengine:mainfrom Feb 18, 2025
Merged
Fix motion vector computation after #17688.#17717superdump merged 18 commits intobevyengine:mainfrom
superdump merged 18 commits intobevyengine:mainfrom
Conversation
PR bevyengine#17688 broke motion vector computation, and therefore motion blur, because it enabled retention of `MeshInputUniform`s, and `MeshInputUniform`s contain the indices of the previous frame's transform and the previous frame's skinned mesh joint matrices. On frame N, if a `MeshInputUniform` is retained on GPU from the previous frame, the `previous_input_index` and `previous_skin_index` would refer to the indices for frame N - 2, not the index for frame N - 1. This patch fixes the problems. It solves these issues in two different ways, one for transforms and one for skins: 1. To fix transforms, this patch supplies the *frame index* to the shader as part of the view uniforms, and specifies which frame index each mesh's previous transform refers to. So, in the situation described above, the frame index would be N, the previous frame index would be N - 1, and the `previous_input_frame_number` would be N - 2. The shader can now detect this situation and infer that the mesh has been retained, and can therefore conclude that the mesh's transform hasn't changed. 2. To fix skins, this patch replaces the explicit `previous_skin_index` with an invariant that the index of the joints for the current frame and the index of the joints for the previous frame are the same. This means that the `MeshInputUniform` never has to be updated even if the skin is animated. The downside is that we have to copy joint matrices from the previous frame's buffer to the current frame's buffer in `extract_skins`. The rationale behind (2) is that we currently have no mechanism to detect when joints that affect a skin have been updated, short of comparing all the transforms and setting a flag for `extract_meshes_for_gpu_building` to consume, which would regress performance as we want `extract_skins` and `extract_meshes_for_gpu_building` to be able to run in parallel. To test this change, use `cargo run --example motion_blur`.
tychedelia
approved these changes
Feb 7, 2025
Member
tychedelia
left a comment
There was a problem hiding this comment.
Makes sense. Tested on macOS.
IceSentry
reviewed
Feb 8, 2025
Member
aevyrie
approved these changes
Feb 11, 2025
Member
aevyrie
left a comment
There was a problem hiding this comment.
Issues appear to be resolved in latest commit.
superdump
reviewed
Feb 11, 2025
| /// Low 16 bits: index of the material inside the bind group data. | ||
| /// High 16 bits: index of the lightmap in the binding array. | ||
| pub material_and_lightmap_bind_group_slot: u32, | ||
| pub timestamp: u32, |
Contributor
There was a problem hiding this comment.
Why call it timestamp if it's a frame count?
Contributor
There was a problem hiding this comment.
Maybe frame_count_when_updated or something like that? That seems to be the semantics of it.
superdump
reviewed
Feb 11, 2025
crates/bevy_pbr/src/render/skin.rs
Outdated
| } | ||
| } | ||
|
|
||
| pub fn mark_meshes_as_changed_if_their_skins_changed( |
Contributor
There was a problem hiding this comment.
For consistency:
Suggested change
| pub fn mark_meshes_as_changed_if_their_skins_changed( | |
| pub fn mark_3d_meshes_as_changed_if_their_skins_changed( |
It would be nice to port all this to 2D as well... :)
superdump
approved these changes
Feb 11, 2025
Contributor
superdump
left a comment
There was a problem hiding this comment.
Just a minor variable naming thing for clarity. Otherwise LGTM.
Currently, Bevy rebuilds the buffer containing all the transforms for joints every frame, during the extraction phase. This is inefficient in cases in which many skins are present in the scene and their joints don't move, such as the Caldera test scene. To address this problem, this commit switches skin extraction to use a set of retained GPU buffers with allocations managed by the offset allocator. I use fine-grained change detection in order to determine which skins need updating. Note that the granularity is on the level of an entire skin, not individual joints. Using the change detection at that level would yield poor performance in common cases in which an entire skin is animated at once. Also, this patch yields additional performance from the fact that changing joint transforms no longer requires the skinned mesh to be re-extracted. Note that this optimization can be a double-edged sword. In `many_foxes`, fine-grained change detection regressed the performance of `extract_skins` by 3.4x. This is because every joint is updated every frame in that example, so change detection is pointless and is pure overhead. Because the `many_foxes` workload is actually representative of animated scenes, this patch includes a heuristic that disables fine-grained change detection if the number of transformed entities in the frame exceeds a certain fraction of the total number of joints. Currently, this threshold is set to 25%. Note that this is a crude heuristic, because it doesn't distinguish between the number of transformed *joints* and the number of transformed *entities*; however, it should be good enough to yield the optimum code path most of the time. Finally, this patch fixes a bug whereby skinned meshes are actually being incorrectly retained if the buffer offsets of the joints of those skinned meshes changes from frame to frame. To fix this without retaining skins, we would have to re-extract every skinned mesh every frame. Doing this was a significant regression on Caldera. With this PR, by contrast, mesh joints stay at the same buffer offset, so we don't have to update the `MeshInputUniform` containing the buffer offset every frame. This also makes PR bevyengine#17717 easier to implement, because that PR uses the buffer offset from the previous frame, and the logic for calculating that is simplified if the previous frame's buffer offset is guaranteed to be identical to that of the current frame. On Caldera, this patch reduces the time spent in `extract_skins` from 1.79 ms to near zero. On `many_foxes`, this patch regresses the performance of `extract_skins` by approximately 10%-25%, depending on the number of foxes. This has only a small impact on frame rate.
Contributor
Author
|
I'd like #17818 to land first as that will simplify things. |
github-merge-queue bot
pushed a commit
that referenced
this pull request
Feb 18, 2025
Currently, Bevy rebuilds the buffer containing all the transforms for joints every frame, during the extraction phase. This is inefficient in cases in which many skins are present in the scene and their joints don't move, such as the Caldera test scene. To address this problem, this commit switches skin extraction to use a set of retained GPU buffers with allocations managed by the offset allocator. I use fine-grained change detection in order to determine which skins need updating. Note that the granularity is on the level of an entire skin, not individual joints. Using the change detection at that level would yield poor performance in common cases in which an entire skin is animated at once. Also, this patch yields additional performance from the fact that changing joint transforms no longer requires the skinned mesh to be re-extracted. Note that this optimization can be a double-edged sword. In `many_foxes`, fine-grained change detection regressed the performance of `extract_skins` by 3.4x. This is because every joint is updated every frame in that example, so change detection is pointless and is pure overhead. Because the `many_foxes` workload is actually representative of animated scenes, this patch includes a heuristic that disables fine-grained change detection if the number of transformed entities in the frame exceeds a certain fraction of the total number of joints. Currently, this threshold is set to 25%. Note that this is a crude heuristic, because it doesn't distinguish between the number of transformed *joints* and the number of transformed *entities*; however, it should be good enough to yield the optimum code path most of the time. Finally, this patch fixes a bug whereby skinned meshes are actually being incorrectly retained if the buffer offsets of the joints of those skinned meshes changes from frame to frame. To fix this without retaining skins, we would have to re-extract every skinned mesh every frame. Doing this was a significant regression on Caldera. With this PR, by contrast, mesh joints stay at the same buffer offset, so we don't have to update the `MeshInputUniform` containing the buffer offset every frame. This also makes PR #17717 easier to implement, because that PR uses the buffer offset from the previous frame, and the logic for calculating that is simplified if the previous frame's buffer offset is guaranteed to be identical to that of the current frame. On Caldera, this patch reduces the time spent in `extract_skins` from 1.79 ms to near zero. On `many_foxes`, this patch regresses the performance of `extract_skins` by approximately 10%-25%, depending on the number of foxes. This has only a small impact on frame rate.
Contributor
Author
|
This should be ready to go assuming CI is green. |
superdump
approved these changes
Feb 18, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.




PR #17688 broke motion vector computation, and therefore motion blur, because it enabled retention of
MeshInputUniforms, andMeshInputUniforms contain the indices of the previous frame's transform and the previous frame's skinned mesh joint matrices. On frame N, if aMeshInputUniformis retained on GPU from the previous frame, theprevious_input_indexandprevious_skin_indexwould refer to the indices for frame N - 2, not the index for frame N - 1.This patch fixes the problems. It solves these issues in two different ways, one for transforms and one for skins:
To fix transforms, this patch supplies the frame index to the shader as part of the view uniforms, and specifies which frame index each mesh's previous transform refers to. So, in the situation described above, the frame index would be N, the previous frame index would be N - 1, and the
previous_input_frame_numberwould be N - 2. The shader can now detect this situation and infer that the mesh has been retained, and can therefore conclude that the mesh's transform hasn't changed.To fix skins, this patch replaces the explicit
previous_skin_indexwith an invariant that the index of the joints for the current frame and the index of the joints for the previous frame are the same. This means that theMeshInputUniformnever has to be updated even if the skin is animated. The downside is that we have to copy joint matrices from the previous frame's buffer to the current frame's buffer inextract_skins.The rationale behind (2) is that we currently have no mechanism to detect when joints that affect a skin have been updated, short of comparing all the transforms and setting a flag for
extract_meshes_for_gpu_buildingto consume, which would regress performance as we wantextract_skinsandextract_meshes_for_gpu_buildingto be able to run in parallel.To test this change, use
cargo run --example motion_blur.