perf(gsplat): scalable per-splat culling for hybrid shadows #8939
Merged
The GPU-sort (hybrid) directional-shadow cull now performs per-splat culling (opacity, world-space size, and per-splat frustum) instead of only a coarse per-node cull. It is implemented as a projector-style two-pass cull:

- Pass 1 reuses `GSplatIntervalCompaction` with the light's frustum to produce a dense candidate list (coarse per-node frustum cull + compaction).
- Pass 2 is a flat one-thread-per-candidate fine cull. It rejects splats with opacity <= alphaClip, splats whose max world-space scale is below a CPU-precomputed orthographic size threshold, and splats outside the 6 light frustum planes, then compacts the survivors into the per-light draw list. It applies the same vertex-modify as the draw, so cast shadows match the forward pass.

The flat fine-cull dispatch keeps full GPU occupancy regardless of how a scene is split into intervals. This fixes a pathological case where a few large intervals (e.g. non-octree objects) serialized the cull onto a handful of workgroups.

The compaction candidate buffer is now shared, per manager, between the forward hybrid renderer and the shadow cull via a manager-owned `GSplatHybridRendererScratch` (created only while a GPU-sort renderer is in use). The two paths use it at disjoint points in the frame, so a single allocation suffices, saving one candidate buffer per manager.
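To make the Pass 2 tests concrete, here is a minimal CPU reference sketch in TypeScript. It is not the PR's compute shader: every name (`Splat`, `Plane`, `boxPlanes`, `fineCull`, `minWorldSize`) is hypothetical, the loop stands in for one GPU thread per candidate, and padding the plane test by the splat's max scale is an assumption rather than something the description states.

```typescript
// Hypothetical CPU reference for the flat fine cull; not the engine's API.
type Vec3 = [number, number, number];

// A point p is inside a plane when dot(normal, p) + d >= 0.
interface Plane { normal: Vec3; d: number; }

interface Splat { center: Vec3; scale: Vec3; opacity: number; }

// Six planes of an axis-aligned box, standing in for an orthographic light frustum.
function boxPlanes(half: number): Plane[] {
    return [
        { normal: [1, 0, 0], d: half }, { normal: [-1, 0, 0], d: half },
        { normal: [0, 1, 0], d: half }, { normal: [0, -1, 0], d: half },
        { normal: [0, 0, 1], d: half }, { normal: [0, 0, -1], d: half }
    ];
}

// Assumption: the splat is treated as a sphere of the given radius.
function insideFrustum(planes: Plane[], p: Vec3, radius: number): boolean {
    for (const { normal, d } of planes) {
        const dist = normal[0] * p[0] + normal[1] * p[1] + normal[2] * p[2] + d;
        if (dist < -radius) return false; // fully behind this plane
    }
    return true;
}

function fineCull(
    candidates: number[],   // dense list produced by the coarse pass
    splats: Splat[],
    planes: Plane[],
    alphaClip: number,
    minWorldSize: number    // stands in for the CPU-precomputed size threshold
): number[] {
    const survivors: number[] = [];
    for (const id of candidates) {             // GPU: one thread per candidate
        const s = splats[id];
        if (s.opacity <= alphaClip) continue;  // opacity test
        const maxScale = Math.max(s.scale[0], s.scale[1], s.scale[2]);
        if (maxScale < minWorldSize) continue; // world-space size test
        if (!insideFrustum(planes, s.center, maxScale)) continue;
        survivors.push(id);                    // GPU: compacted append
    }
    return survivors;
}
```

Because the work is indexed by candidate rather than by interval, the amount of parallel work depends only on the candidate count, which is the property the flat dispatch relies on.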
Build size report: This PR changes the size of the minified bundles.
Adds per-splat culling (opacity, size, frustum) to the GPU-sort (hybrid) directional-shadow path, while sharing its candidate buffer with the forward renderer to save GPU memory.
Changes:
- A coarse pass (`GSplatIntervalCompaction` with the light's frustum) produces a dense candidate list, then a flat one-thread-per-candidate pass applies the per-splat fine tests and compacts the survivors. The fine tests are opacity (alphaClip), world-space size (a CPU-precomputed orthographic threshold; no per-splat projection), and the 6 light frustum planes. This replaces the previous coarse-only cull and applies the same `gsplatModifyVS` vertex-modify as the draw, so cast shadows still match the forward pass.
- The forward hybrid renderer and the shadow cull share one candidate buffer (`GSplatHybridRendererScratch`, manager-owned, created only while a GPU-sort renderer is in use). They use it at disjoint points in the frame, so one allocation suffices.
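The shared-scratch idea can be sketched as follows. This is an illustration only: `CandidateScratch`, `SplatManager`, `acquireScratch`, and `releaseScratch` are hypothetical names, and the reference count used to model "created only while a GPU-sort renderer is in use" is an assumption, not the mechanism the PR necessarily uses.

```typescript
// Hypothetical sketch of a manager-owned scratch shared by two consumers.
class CandidateScratch {
    buffer = new Uint32Array(0);

    // Grow-only: sized for the largest candidate count any user requests.
    ensureCapacity(count: number): Uint32Array {
        if (this.buffer.length < count) {
            this.buffer = new Uint32Array(count);
        }
        return this.buffer;
    }
}

class SplatManager {
    private scratch: CandidateScratch | null = null;
    private users = 0; // assumption: lifetime tracked with a simple count

    // Called by each path (forward renderer, shadow cull) that needs the scratch.
    acquireScratch(): CandidateScratch {
        this.users++;
        if (!this.scratch) this.scratch = new CandidateScratch();
        return this.scratch;
    }

    // Drops the allocation once the last user is gone.
    releaseScratch(): void {
        if (this.users === 0) return;
        this.users--;
        if (this.users === 0) this.scratch = null;
    }

    get hasScratch(): boolean {
        return this.scratch !== null;
    }
}
```

Sharing is safe here only because the two users touch the buffer at disjoint points in the frame; if they overlapped, each would need its own allocation.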