Rewrite the multidrawable batch set builder for performance. #17923

Merged
superdump merged 1 commit into bevyengine:main from pcwalton:rewrite-batch-and-prepare-binned-render-phase
Feb 20, 2025
@pcwalton
Contributor

This commit restructures the multidrawable batch set builder for better performance in various ways:

  • The bin traversal is optimized to make the best use of the CPU cache.

  • The inner loop that iterates over the bins, which is the hottest part of `batch_and_prepare_binned_render_phase`, has been shrunk as small as possible.

  • Where possible, multiple elements are added to or reserved from GPU buffers as a batch instead of one at a time.

  • Methods that LLVM wasn't inlining have been marked `#[inline]` where doing so would unlock optimizations.
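
As a rough illustration of the batched-reservation and `#[inline]` points above, here is a minimal sketch. It is not code from this PR: `StagingBuffer`, `reserve_many`, and both helper functions are hypothetical names, and a plain `Vec<u32>` stands in for the CPU-side staging data of a GPU buffer.

```rust
/// Hypothetical stand-in for CPU-side staging data that mirrors a GPU buffer.
/// (Assumption: the real buffers in the PR are more involved than a `Vec`.)
struct StagingBuffer {
    data: Vec<u32>,
}

impl StagingBuffer {
    fn new() -> Self {
        Self { data: Vec::new() }
    }

    /// One-at-a-time path: appends a single slot and returns its index.
    #[inline]
    fn push(&mut self, value: u32) -> usize {
        let index = self.data.len();
        self.data.push(value);
        index
    }

    /// Batched path: reserves `count` contiguous zeroed slots in one call
    /// and returns the index of the first one.
    #[inline]
    fn reserve_many(&mut self, count: usize) -> usize {
        let first = self.data.len();
        self.data.resize(first + count, 0);
        first
    }
}

/// Reserves slots element by element; returns the first index of each bin
/// and the final buffer length.
fn reserve_one_at_a_time(bin_sizes: &[usize]) -> (Vec<usize>, usize) {
    let mut buffer = StagingBuffer::new();
    let mut firsts = Vec::with_capacity(bin_sizes.len());
    for &size in bin_sizes {
        firsts.push(buffer.data.len());
        for _ in 0..size {
            buffer.push(0);
        }
    }
    (firsts, buffer.data.len())
}

/// Reserves slots with one call per bin; same result, fewer calls in the
/// hot loop.
fn reserve_per_bin(bin_sizes: &[usize]) -> (Vec<usize>, usize) {
    let mut buffer = StagingBuffer::new();
    let firsts: Vec<usize> = bin_sizes
        .iter()
        .map(|&size| buffer.reserve_many(size))
        .collect();
    (firsts, buffer.data.len())
}
```

Both helpers produce the same layout; the batched version simply does one length update and at most one reallocation per bin rather than per element, which is the kind of saving the bullet describes.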

This code has also been refactored to avoid duplication between the logic for indexed and non-indexed meshes via the introduction of a `MultidrawableBatchSetPreparer` object.
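
One way such deduplication can be done in Rust is to make the preparer generic over the draw-argument type. The sketch below only illustrates that general pattern; the description does not say how `MultidrawableBatchSetPreparer` is actually structured, and every name here (`DrawArgs`, `IndexedArgs`, `NonIndexedArgs`, `SketchBatchSetPreparer`, `prepare_bin`) is hypothetical with deliberately simplified fields.

```rust
/// Hypothetical, simplified indirect-draw arguments for indexed meshes.
#[derive(Debug, Default, Clone, PartialEq)]
struct IndexedArgs {
    index_count: u32,
    instance_count: u32,
}

/// Hypothetical, simplified indirect-draw arguments for non-indexed meshes.
#[derive(Debug, Default, Clone, PartialEq)]
struct NonIndexedArgs {
    vertex_count: u32,
    instance_count: u32,
}

/// The only operation the shared logic needs from either layout.
trait DrawArgs: Default {
    fn set_instance_count(&mut self, count: u32);
}

impl DrawArgs for IndexedArgs {
    #[inline]
    fn set_instance_count(&mut self, count: u32) {
        self.instance_count = count;
    }
}

impl DrawArgs for NonIndexedArgs {
    #[inline]
    fn set_instance_count(&mut self, count: u32) {
        self.instance_count = count;
    }
}

/// Illustrative preparer: one implementation serves both mesh kinds.
struct SketchBatchSetPreparer<A: DrawArgs> {
    args: Vec<A>,
}

impl<A: DrawArgs> SketchBatchSetPreparer<A> {
    fn new() -> Self {
        Self { args: Vec::new() }
    }

    /// Records one bin's draw arguments and returns its slot index.
    fn prepare_bin(&mut self, instance_count: u32) -> usize {
        let mut args = A::default();
        args.set_instance_count(instance_count);
        self.args.push(args);
        self.args.len() - 1
    }
}
```

With this shape, `SketchBatchSetPreparer::<IndexedArgs>::new()` and `SketchBatchSetPreparer::<NonIndexedArgs>::new()` share the same `prepare_bin` body, and the compiler monomorphizes each variant so the hot loop pays no dynamic-dispatch cost.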

Together, these changes improved the `batch_and_prepare_binned_render_phase` time on Caldera by approximately 2×.

Eventually, we should optimize the batchable-but-not-multidrawable and unbatchable logic as well, but these meshes are much rarer, so in the interests of keeping this patch relatively small I opted to leave those to a follow-up.

@pcwalton added the A-Rendering (Drawing game state to the screen), C-Performance (A change motivated by improving speed, memory usage or compile times), and S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) labels on Feb 18, 2025
@pcwalton
Contributor Author

Example runner looks good.

Member

@tychedelia tychedelia left a comment

Thanks, the comments in the new implementation are really nice.

@tychedelia added the S-Ready-For-Final-Review (This PR has been approved by the community. It's ready for a maintainer to consider merging it) label and removed the S-Needs-Review (Needs reviewer attention (from anyone!) to move forward) label on Feb 19, 2025
@superdump superdump added this pull request to the merge queue Feb 20, 2025
Merged via the queue into bevyengine:main with commit f15437e Feb 20, 2025
Labels

A-Rendering: Drawing game state to the screen
C-Performance: A change motivated by improving speed, memory usage or compile times
S-Ready-For-Final-Review: This PR has been approved by the community. It's ready for a maintainer to consider merging it

4 participants