Use GpuArrayBuffer for MeshUniform#9254
Conversation
574516f to
382737d
Compare
|
Example |
1 similar comment
|
Example |
| pub(super) fn model(render_device: &RenderDevice, binding: u32) -> BindGroupLayoutEntry { | ||
| GpuArrayBuffer::<MeshUniform>::binding_layout( | ||
| binding, | ||
| ShaderStages::VERTEX_FRAGMENT, | ||
| render_device, | ||
| ) |
There was a problem hiding this comment.
I guess skinning and weights can get the same treatment in a future PR right? Since they could use a storage buffer in lieu of a uniform buffer when available.
There was a problem hiding this comment.
Let me reformulate: we already use an index and an Uniform for skins and morphs, but is there any benefit to use a storage buffer when available?
There was a problem hiding this comment.
We can pack all the skinning data into 1 storage buffer, instead of multiple uniform buffers. Less buffers = less rebinds = more performance :)
There was a problem hiding this comment.
Right, it applies to all buffers of struct data, basically. One needs to plumb in the indices, so perhaps a per-object index has an index or offset into a skinning array or so. I don't know too much about those skins work to know how best to structure the data. Another aspect of having a single buffer is if the data itself repeats with varying counts of things (so things like vertices in vertex buffers, or maybe the number of transforms in a skin?) then some kind of allocator is usually used to manage loading in/out data without having to reallocate buffers all the time and rewrite everything, and handle things of varying sizes.
| GpuArrayBuffer::<MeshUniform>::binding_layout( | ||
| binding, | ||
| ShaderStages::VERTEX_FRAGMENT, | ||
| render_device, |
There was a problem hiding this comment.
In a future PR, it might be worthwhile to newtype device.limits().max_storage_buffers_per_shader_stage, and accept that newtype instead of RenderDevice as argument to GpuArrayBuffer so that it's clear what data it is using.
There was a problem hiding this comment.
Mmm. I thought about passing the values explicitly, but then I also thought that maybe other features or limits may impact logic at some point. If it had been more difficult to plumb it in then I would have changed it.
|
Looks good. (I'll do some testing on my end before dropping an approval) What is the "TODO" item supposed to mean? Do you intend to iterate in future PRs or within this one? |
I'll probably do it in this one. |
|
Reviewed the PR in it's current state, minus the existing feedback lgtm :) Also good catch on GpuComponentArrayBufferPlugin not using finish(). |
…mic offset When using a uniform buffer with batches of per-object data, this provides a ~18% frame time reduction on the many_cubes -- sphere example in the Opaque3d phase and should benefit alpha mask, prepass, and shadow phases in a similar way.
|
Example |
|
Ideally we'd have a good way of benchmarking the impact on prepass and shadows. I may modify the many_cubes example to also support some modes to test the performance of these. |
The RenderApp sub app does not exist when the renderer is disabled so assuming it does is not a good idea.
|
Example |
| pbr_input.occlusion = occlusion; | ||
|
|
||
| pbr_input.flags = mesh.flags; | ||
| pbr_input.flags = mesh[in.instance_index].flags; |
There was a problem hiding this comment.
this is using in.instance_index unconditionally, but it's existence depends on VERTEX_OUTPUT_INSTANCE_INDEX. i think that's probably fine since this frag shader is (currently) specific to standard materials, but it doesn't feel great.
as far as i can tell this is the only use of the instance index in any fragment stage. would it make more sense to push the mesh flags through the MeshVertexOutput struct directly (and unconditionally)? then we could get rid of the VERTEX_OUTPUT_INSTANCE_INDEX entirely.
this would not make sense if there are other potential uses for mesh data in the fragment stage but i think there shoudn't be generally.
There was a problem hiding this comment.
I think that would add cost to the vertex stage / interpolators but I don't know if it's worth the tradeoff.
After this change, people will need to be aware of per-object data and how to access it. Long-term, various indices will be added into the per-object data for materials, animations, and probably more. At least material stuff I expect to be used in fragment stage. The instance index, or some other way of getting the per-object index into the shader, would be used to index into the per-object data array, and then within that type there would be a material index to index into a material data array.
| // NOTE: (dynamic offset, -distance) | ||
| // NOTE: Values increase towards the camera. Front-to-back ordering for opaque means we need a descending sort. | ||
| type SortKey = Reverse<FloatOrd>; | ||
| type SortKey = (u32, Reverse<FloatOrd>); |
There was a problem hiding this comment.
probably not for this pr but just wondering, is there any possibility for / value in some kind of spatial sorting before putting the mesh data in the gpu array? i think currently they are entity iterator sorted, but it seems like (at least for static meshes) there'd be some benefit to having a batch be as proximally local as possible and then to sort batches based on the nearest member, or something like that.
There was a problem hiding this comment.
That will happen when the render set reordering and batching is implemented. The reason for reordering the render sets to extract, prepare assets, queue, sort, prepare+batch, render is to allow the order of draws to be known when preparing the data so that the order can be taken into account.
392408d to
4c8fba0
Compare
4c8fba0 to
dae52b7
Compare
|
This seems to be ignoring object transforms when rendering stuff, as everything seems to be rendering in the same place (tested 3d_scene: cube is lower than it should be, 3d_shapes: everything is mashed together instead of being separate, and scene_viewer (with a large custom scene): everything is mashed together, and the object clump jumps around weirdly when moving the camera around.) edit: Works properly on vulkan, broken on dx12. |
|
I'm getting a 45% speedup on vulkan/linux with your latest set of changes for I'm unable to test the OpenGL backend as I get a panic with |
|
@nicopap nice! @Elabajaba - oof, ok, I guess this isn't mergeable if it breaks DX12. Could anyone test DX12 on NVIDIA as I think @Elabajaba uses AMD. I will test on a mobile RTX 3080 as soon as I can. |
looks like always index 0 on nvidia/dx12 as well |
|
The vulkan gains seem big enough that it might be worth forcing dx12 to always draw instead of draw_indexed for now, and try and get it implemented in wgpu for their next release? edit: Not sure how many people are actually using bevy dx12, or what any potential performance losses might be (or if forcing draw instead of draw_indexed would break stuff). |
|
I'm satisfied with the performance benefit. The blocking issue is DX12 instance_index being broken. |
…d of vertex.instance_index
Dx12 instance_index bugfix
|
The DX12 issue was fixed by @Elabajaba - thanks for figuring that one out! |
# Objective - Fix shader_material_glsl example ## Solution - Expose the `PER_OBJECT_BUFFER_BATCH_SIZE` shader def through the default `MeshPipeline` specialization. - Make use of it in the `custom_material.vert` shader to access the mesh binding. --- ## Changelog - Added: Exposed the `PER_OBJECT_BUFFER_BATCH_SIZE` shader def through the default `MeshPipeline` specialization to use in custom shaders not using bevy_pbr::mesh_bindings that still want to use the mesh binding in some way.




















Objective
Solution
GpuArrayBufferforMeshUniformdata to store allMeshUniformdata in arrays within fewer bindingsChangelog
MeshUniformdata is now managed byGpuArrayBufferas arrays in buffers that need to be indexed into.Migration Guide
Accessing the
modelmember of an individual mesh object's shaderMeshstruct the old way where eachMeshUniformwas stored at its own dynamic offset:The new way where one needs to index into the array of
Meshes for the batch:Note that using the instance_index is the default way to pass the per-object index into the shader, but if you wish to do custom rendering approaches you can pass it in however you like.