
perf: reduce radix sort GPU buffer footprint #8706

Merged
mvaligursky merged 1 commit into main from mv-radix-sort-buffer-savings
May 8, 2026


@mvaligursky

Contributor

Reduces WebGPU storage buffer usage for ComputeRadixSort by eliminating two N×4-byte allocations on typical paths.

Changes:

  • No dedicated _sortedIndices buffer: sorted values live in the values ping-pong pair; sortedIndices is derived from pass parity (same idea as sortedKeys).
  • Optional destructiveKeys: when true, the sorter does not allocate _keys1; instead it reuses the caller's keysBuffer as the second key ping-pong slot after pass 0, so the caller's key data is overwritten. Documented on sort / sortIndirect; default remains false for safety.
  • GSplat hybrid: gsplat-manager passes destructiveKeys: true for sortIndirect because projector.sortKeys is overwritten every frame before the sort.
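
The buffer layout described in the list above can be illustrated with a small CPU model. This is only a sketch under stated assumptions, not the engine's implementation: `radixPass`, `sortModel`, the 8-bit digit width, and the slot-selection rule are all hypothetical, and typed arrays stand in for GPU storage buffers. It shows how the output can be derived from pass parity instead of a dedicated buffer, and how borrowing the caller's keys buffer removes one allocation.

```javascript
// CPU model of a ping-pong LSD radix sort. All names are hypothetical and
// only mirror the ideas in the description; this is not the engine's code.
const BITS_PER_PASS = 8; // assumed digit width
const BUCKETS = 1 << BITS_PER_PASS;

// One stable counting-sort pass over the digit at bit offset `shift`.
// srcValues === null means "use the element index" (the first pass).
function radixPass(srcKeys, srcValues, dstKeys, dstValues, count, shift) {
    const offsets = new Uint32Array(BUCKETS + 1);
    for (let i = 0; i < count; i++) {
        offsets[((srcKeys[i] >>> shift) & (BUCKETS - 1)) + 1]++;
    }
    // Prefix sum: offsets[d] becomes the first output position for digit d.
    for (let b = 0; b < BUCKETS; b++) {
        offsets[b + 1] += offsets[b];
    }
    for (let i = 0; i < count; i++) {
        const digit = (srcKeys[i] >>> shift) & (BUCKETS - 1);
        const dst = offsets[digit]++;
        dstKeys[dst] = srcKeys[i];
        dstValues[dst] = srcValues ? srcValues[i] : i;
    }
}

function sortModel(keysBuffer, numBits, destructiveKeys = false) {
    const count = keysBuffer.length;
    const numPasses = Math.ceil(numBits / BITS_PER_PASS);

    // Key slot 1 is either an internal buffer or the borrowed caller buffer.
    const keySlots = [
        new Uint32Array(count),
        destructiveKeys ? keysBuffer : new Uint32Array(count)
    ];
    // Values live only in the ping-pong pair; there is no extra output buffer.
    const valueSlots = [new Uint32Array(count), new Uint32Array(count)];
    // Internal allocations only: 2 value slots plus 1 or 2 key slots.
    const allocatedBytes = 4 * count * (destructiveKeys ? 3 : 4);

    for (let pass = 0; pass < numPasses; pass++) {
        const dst = pass % 2;
        const src = 1 - dst;
        // Pass 0 is the only pass that reads the caller's keys, so the
        // caller buffer is free to act as slot 1 from pass 1 onwards.
        radixPass(
            pass === 0 ? keysBuffer : keySlots[src],
            pass === 0 ? null : valueSlots[src],
            keySlots[dst], valueSlots[dst], count, pass * BITS_PER_PASS
        );
    }

    // Output slots follow from the parity of the last pass.
    const last = (numPasses - 1) % 2;
    return {
        sortedKeys: keySlots[last],
        sortedIndices: valueSlots[last],
        allocatedBytes
    };
}

const demoKeys = Uint32Array.from([50, 7, 300, 7, 1]);
const demo = sortModel(demoKeys, 16, true);
console.log(Array.from(demo.sortedIndices)); // [4, 1, 3, 0, 2]
console.log(demo.sortedKeys === demoKeys);   // true: caller buffer was overwritten
```

In this model the caller must treat `keysBuffer` as scratch whenever `destructiveKeys` is true and more than one pass runs, which matches the reason given for keeping the default at false.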

API (additive):

  • ComputeRadixSort.sort(..., skipLastPassKeyWrite, destructiveKeys) — new last optional boolean, default false.
  • ComputeRadixSort.sortIndirect(..., skipLastPassKeyWrite, destructiveKeys) — same.

Performance: ~2 × N × 4 bytes less VRAM for callers that opt into destructiveKeys and always use the new output path (e.g. hybrid GSplat at ~4M splats saves about 2 × 4M × 4 bytes ≈ 32 MB).
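
The saving can be checked with a few lines of arithmetic. The helper below is a hypothetical sketch, not part of the engine: `savedBytes` assumes 4-byte elements, and it assumes that callers who do not opt into destructiveKeys still save the one dropped sortedIndices buffer, which is a reading of the change list rather than something the description states as a number.

```javascript
// Hypothetical helper: bytes saved relative to a layout with a dedicated
// sortedIndices buffer and an internally allocated second keys buffer.
function savedBytes(elementCount, destructiveKeys) {
    const BYTES_PER_ELEMENT = 4; // assumed 32-bit keys and indices
    // Assumption: 1 buffer dropped for everyone, 2 with destructiveKeys.
    const droppedBuffers = destructiveKeys ? 2 : 1;
    return droppedBuffers * elementCount * BYTES_PER_ELEMENT;
}

const saved = savedBytes(4e6, true);
console.log(saved);                                     // 32000000
console.log((saved / (1024 * 1024)).toFixed(1), 'MiB'); // 30.5 MiB
```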

- Drop dedicated sortedIndices buffer; derive output from values ping-pong
- Add destructiveKeys option to borrow caller keys buffer as second key slot
- Enable destructiveKeys for GSplat hybrid indirect sort (sortKeys overwritten each frame)
mvaligursky self-assigned this May 8, 2026
mvaligursky merged commit 5bbd7c2 into main May 8, 2026
8 checks passed
mvaligursky deleted the mv-radix-sort-buffer-savings branch May 8, 2026 11:07