Rewrite the multidrawable batch set builder for performance. #17923
Merged
superdump merged 1 commit into bevyengine:main on Feb 20, 2025
This commit restructures the multidrawable batch set builder for better performance in various ways:

* The bin traversal is optimized to make the best use of the CPU cache.
* The inner loop that iterates over the bins, which is the hottest part of `batch_and_prepare_binned_render_phase`, has been shrunk as small as possible.
* Where possible, multiple elements are added to or reserved from GPU buffers as a batch instead of one at a time.
* Methods that LLVM wasn't inlining have been marked `#[inline]` where doing so would unlock optimizations.

This code has also been refactored to avoid duplication between the logic for indexed and non-indexed meshes via the introduction of a `MultidrawableBatchSetPreparer` object.

Together, this improved the `batch_and_prepare_binned_render_phase` time on Caldera by approximately 2×.

Eventually, we should optimize the batchable-but-not-multidrawable and unbatchable logic as well, but these meshes are much rarer, so in the interests of keeping this patch relatively small I opted to leave those to a follow-up.
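For readers unfamiliar with the techniques listed above, here is a rough, self-contained sketch of two of them: appending a whole bin to a buffer in one call instead of element by element, and sharing one code path between indexed and non-indexed meshes through a generic parameter. Every name here (`StagingBuffer`, `DrawKind`, `BatchSet`, `prepare_batch_set`) is hypothetical and chosen for illustration only. This is not the code in this PR, and it does not reflect how `MultidrawableBatchSetPreparer` is actually structured.

```rust
/// Hypothetical CPU-side staging vector standing in for a GPU buffer.
/// This is NOT one of Bevy's buffer types.
#[derive(Default)]
pub struct StagingBuffer {
    data: Vec<u32>,
}

impl StagingBuffer {
    /// Adds a single element and returns its index.
    /// Calling this in a hot loop repeats the capacity check per element.
    #[inline]
    pub fn push(&mut self, value: u32) -> usize {
        self.data.push(value);
        self.data.len() - 1
    }

    /// Adds a whole run of elements in one call and returns the index of
    /// the first one, so the caller only has to remember a start and a count.
    #[inline]
    pub fn extend_batch(&mut self, values: &[u32]) -> usize {
        let first = self.data.len();
        self.data.extend_from_slice(values);
        first
    }

    /// Number of elements currently staged.
    pub fn len(&self) -> usize {
        self.data.len()
    }
}

/// Hypothetical marker trait: the only thing that differs between the
/// indexed and non-indexed paths in this sketch is one constant.
pub trait DrawKind {
    const INDEXED: bool;
}

pub struct Indexed;
pub struct NonIndexed;

impl DrawKind for Indexed {
    const INDEXED: bool = true;
}

impl DrawKind for NonIndexed {
    const INDEXED: bool = false;
}

/// Hypothetical description of one batch set: a contiguous run of instances.
pub struct BatchSet {
    pub first_instance: usize,
    pub instance_count: usize,
    pub indexed: bool,
}

/// One generic function serves both mesh kinds, so the logic is written once
/// and the compiler specializes it per kind.
pub fn prepare_batch_set<K: DrawKind>(buffer: &mut StagingBuffer, bin: &[u32]) -> BatchSet {
    // A single batched append replaces a per-element loop over the bin.
    let first_instance = buffer.extend_batch(bin);
    BatchSet {
        first_instance,
        instance_count: bin.len(),
        indexed: K::INDEXED,
    }
}
```

Calling `prepare_batch_set::<Indexed>(&mut buffer, &bin)` and `prepare_batch_set::<NonIndexed>(&mut buffer, &bin)` exercises the same body for both kinds. Whether the real implementation uses a trait, an enum, or something else to achieve this is not stated in the description.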
Example runner looks good.
IceSentry
approved these changes
Feb 19, 2025
tychedelia
approved these changes
Feb 19, 2025
Member
tychedelia
left a comment
Thanks, the comments in the new implementation are really nice.
superdump
approved these changes
Feb 20, 2025