perf: reduce radix sort GPU buffer footprint#8706
Merged
Merged
Conversation
- Drop dedicated sortedIndices buffer; derive output from values ping-pong - Add destructiveKeys option to borrow caller keys buffer as second key slot - Enable destructiveKeys for GSplat hybrid indirect sort (sortKeys overwritten each frame)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Reduces WebGPU storage buffer usage for
ComputeRadixSortby eliminating two N×4-byte allocations on typical paths.Changes:
_sortedIndicesbuffer: sorted values live in the values ping-pong pair;sortedIndicesis derived from pass parity (same idea assortedKeys).destructiveKeys: whentrue, the sorter does not allocate_keys1; it reuses the callerkeysBufferas the second key ping-pong slot after pass 0. Documented onsort/sortIndirect; default remainsfalsefor safety.gsplat-managerpassesdestructiveKeys: trueforsortIndirectbecauseprojector.sortKeysis overwritten every frame before the sort.API (additive):
ComputeRadixSort.sort(..., skipLastPassKeyWrite, destructiveKeys)— new last optional boolean, defaultfalse.ComputeRadixSort.sortIndirect(..., skipLastPassKeyWrite, destructiveKeys)— same.Performance: ~2 × N × 4 bytes less VRAM for callers that opt into
destructiveKeysand always use the new output path (e.g. hybrid GSplat at ~4M splats saves on the order of tens of MB).