Draft
Conversation
superdump
reviewed
Mar 15, 2024
crates/bevy_pbr/src/prepass/mod.rs
Outdated
| batch_and_prepare_render_phase::<Opaque3dPrepass, MeshPipeline>, | ||
| batch_and_prepare_render_phase::<AlphaMask3dPrepass, MeshPipeline>, | ||
| ) | ||
| .after(allocate_batch_buffer::<MeshPipeline>), |
Contributor
There was a problem hiding this comment.
This starts to look like we maybe want a batching plugin that is generic over render phase item and the base specialisation pipeline or something.
Member
Author
There was a problem hiding this comment.
Agreed there. This is getting unwieldy to manage.
pcwalton
added a commit
to pcwalton/bevy
that referenced
this pull request
Mar 20, 2024
Today, when we upload data to a `StorageBuffer`, it must be copied twice: once from `StorageBuffer::value` to `StorageBuffer::scratch`, and once from `StorageBuffer::scratch` to the GPU. This patch eliminates the former copy in favor of storing the single canonical copy of the data directly inside the `scratch` field. This patch complements bevyengine#12489, which eliminates both copies for mesh data at the cost of requiring more overhead for GPU I/O, especially for single-threaded situations. Simply eliminating one of the copies seems an unambiguous win (and is lower-hanging fruit), so both this patch and PR bevyengine#12489 are valuable.
19 tasks
Co-Authored By: robert.swain@gmail.com
Member
|
Triage: has merge conflicts |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Objective
Improve CPU-side rendering performance.
Solution
Create
BufferPoolvariants ofDynamicBuffer,StorageBuffer,BatchedUniformBuffer, andGpuArrayBuffer. They do not have a system RAM side buffer of any kind, but rely onQueue::write_buffer_withto directly write values into a staging buffer. AsQueue::write_bufferwith operates with a&Queue, it's possible to parallelize down to a view level when batching.The downside is that these buffers are not resizable after being mapped, so these types must reserve fixed sized slices from the buffer ahead of time. The data flow runs as follows:
clear_batch_bufferclears the buffer pool.reserve_batch_buffermutably grabs the buffer pool and reserves a range for everyRenderPhase<T>and saves the reserved in theRenderPhase. This is a very fast O(1) operation that does not require allocation or IO of any kind. These must run sequentially due to needing to grab the pool and render phases in parallel.allocate_batch_bufferallocates the actual GPU-side buffer.batch_and_prepare_render_phasethen runs on each of them in parallel and parallelizes individual views withQuery::par_iter_mut.NOTE: I'm likely getting something wrong with the indices, which is causing mesh draw calls to be mismatched, causing them to be rendered randomly, which can be potentially seizure inducing depending on the scene. Please be aware of this while testing this change.
Performance
Tested against
many_foxes, this nets a rough 4% improvement in render schedule timings.TODO: Test this against heavier scenes.
Changelog
TODO
Migration Guide
TODO