Batch skinned meshes on platforms where storage buffers are available. #16599
alice-i-cecile merged 15 commits into bevyengine:main
This commit makes skinned meshes batchable on platforms other than WebGL 2. On such platforms, it replaces the two uniform buffers used for joint matrices with a pair of storage buffers containing all matrices for all skinned meshes concatenated together. The indices into the buffer are stored in the mesh uniform and mesh input uniform. The GPU mesh preprocessing step copies the indices in if that step is enabled. On the `many_foxes` demo, I observed a frame time decrease from 15.470ms to 11.935ms. This is the result of reducing the `submit_graph_commands` time from an average of 5.45ms to 0.489ms, an 11x speedup in that portion of rendering.
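To illustrate the layout described above, here is a minimal CPU-side sketch of packing the joint matrices of several skinned meshes into one array and recording each mesh's starting index. It is not the code from this PR: `JointBuffer`, `push_skin`, `joint`, and `translation` are hypothetical names chosen for the example, and a plain `Vec` stands in for the GPU storage buffer.

```rust
/// A 4x4 matrix as plain arrays, so the sketch needs no external crates.
type Mat4 = [[f32; 4]; 4];

const IDENTITY: Mat4 = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
];

/// Hypothetical helper: a column-major translation along X.
fn translation(x: f32) -> Mat4 {
    let mut m = IDENTITY;
    m[3][0] = x;
    m
}

/// Hypothetical stand-in for one shared storage buffer of joint matrices.
#[derive(Default)]
struct JointBuffer {
    matrices: Vec<Mat4>,
}

impl JointBuffer {
    /// Appends one mesh's joints and returns the index of its first matrix.
    /// In the scheme described above, this index would live in the mesh uniform.
    fn push_skin(&mut self, joints: &[Mat4]) -> u32 {
        let first = self.matrices.len() as u32;
        self.matrices.extend_from_slice(joints);
        first
    }

    /// Looks up a joint the way a shader could: per-mesh base index plus
    /// the per-vertex joint index.
    fn joint(&self, skin_index: u32, joint_index: u32) -> &Mat4 {
        &self.matrices[(skin_index + joint_index) as usize]
    }
}

fn main() {
    let mut buffer = JointBuffer::default();
    let fox_a = buffer.push_skin(&[translation(1.0), translation(2.0)]);
    let fox_b = buffer.push_skin(&[translation(3.0), translation(4.0), translation(5.0)]);
    println!("fox_a starts at {fox_a}, fox_b starts at {fox_b}");
    println!("fox_b joint 1 x-offset: {}", buffer.joint(fox_b, 1)[3][0]);
}
```

Because every mesh reads from the same buffer and differs only in its starting index, consecutive draws no longer need a different joint binding, which is the property that makes batching possible.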

I'm not sure if this is working? I ran `many_foxes` through RenderDoc, and I still see hundreds of `vkCmdDraw`s, each with two `vkCmdBindDescriptorSets` before it: one for `StandardMaterial`, one for `skinned_mesh_bind_group`. I would've expected only one `vkCmdBindDescriptorSets` for `skinned_mesh_bind_group`; then, since the meshes are the same, we wouldn't have to rebind `StandardMaterial` either, and all the draws could collapse into one draw with `instance_count=N`. Am I missing why this doesn't seem to reduce commands?

Windows, Intel 155h.

It works on my M2 Mac mini too. Hmm.

kristoff3r left a comment

I'm not super familiar with the skinning system, but the code looks consistent, and I tested `many_foxes` on both Linux and WebGL 2; both work.

Running into a panic with this on the backtrace

Looks like motion vectors are also broken with this (tested by adding the backtrace

I fixed the motion vectors issue, but was unable to reproduce the

I can't reproduce the

Motion vectors no longer crash, but they also don't appear to be working for skinned meshes now. Motion vector prepass texture once everything is rendered (left: main, right: this PR):

This seems fine to merge, but I'm doing a manual example run here to check for regressions. Ping me once that's done / tomorrow if I forget.

Ok, the skinned mesh motion vector thing should be fixed now.

I believe the CI failure is #15981.

Example run is green.

bevyengine#16599)

This commit makes skinned meshes batchable on platforms other than WebGL 2. On supported platforms, it replaces the two uniform buffers used for joint matrices with a pair of storage buffers containing all matrices for all skinned meshes packed together. The indices into the buffer are stored in the mesh uniform and mesh input uniform. The GPU mesh preprocessing step copies the indices in if that step is enabled.

On the `many_foxes` demo, I observed a frame time decrease from 15.470ms to 11.935ms. This is the result of reducing the `submit_graph_commands` time from an average of 5.45ms to 0.489ms, an 11x speedup in that portion of rendering.

This is what the profile looks like for `many_foxes` after these changes.

---------

Co-authored-by: François Mockers <mockersf@gmail.com>




