Fix GPU decimation on large scenes and stop swallowing GPU failures #254

Merged
slimbuck merged 7 commits into playcanvas:main from slimbuck:decimate-dev on Jun 4, 2026

@slimbuck slimbuck commented Jun 4, 2026

Member

Fixes two related GPU-decimation problems on large scenes, plus a fail-loud robustness change.
Problems

  • The edge-cost kernel packed all spherical-harmonic (SH) coefficients into a single storage buffer, which exceeds WebGPU's ~2GB per-binding limit (maxStorageBufferBindingSize) at ~11.2M splats with 48 SH columns. Decimation crashed with the error "appearance buffer … exceeds device maxStorageBufferBindingSize".
  • A device out-of-memory error (hit on an EC2 NVIDIA instance) was swallowed: WebGPU reports OOM asynchronously, so the failed allocation left buffers zeroed, the k-NN graph came back empty, and simplifyGaussians silently returned its input. In the streamed-SOG LOD chain (each level decimated from the previous) this cascaded into several identical full-resolution levels with a clean exit code.
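The first cap can be reproduced with simple arithmetic. The sketch below rests on two assumptions that the description does not state explicitly: the binding limit is exactly 2^31 bytes, and each coefficient is stored as a 4-byte f32. The name `maxSplatsPerBinding` is hypothetical and does not come from the codebase.

```typescript
// Hypothetical helper: how many splats fit in one storage-buffer binding.
// Assumptions (not confirmed by the PR): the limit is exactly 2 GiB and
// every coefficient is stored as a 4-byte f32.
const ASSUMED_MAX_BINDING_BYTES = 2 ** 31;
const BYTES_PER_F32 = 4;

function maxSplatsPerBinding(
    columns: number,
    maxBindingBytes: number = ASSUMED_MAX_BINDING_BYTES
): number {
    if (columns <= 0 || Math.floor(columns) !== columns) {
        throw new RangeError(`columns must be a positive integer, got ${columns}`);
    }
    // Each splat occupies `columns` floats in the buffer.
    return Math.floor(maxBindingBytes / (columns * BYTES_PER_F32));
}

// All 48 SH columns packed into a single binding.
const singleBufferCap = maxSplatsPerBinding(48);
// One 16-column chunk per binding.
const chunkedCap = maxSplatsPerBinding(16);
```

Under these assumptions the two results are 11,184,810 and 33,554,432 splats, which is consistent with the ~11.2M and ~33.5M caps quoted in this description.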
Changes
  • Split the appearance buffer into up to three ≤16-column chunks (variable width, so partial chunks store no padding) and merge positions+scalars into one buffer, keeping the kernel at 8 storage buffers (the WebGPU baseline). This raises the per-binding cap from ~11.2M to ~33.5M splats at 48 SH columns, and to ~59.6M for DC-only inputs.
  • Upload edges per dispatch batch (256KB) instead of the full N·k list, removing them as a per-binding cap and cutting ~1.6GB of VRAM at 13M splats.
  • Centralize GPU error handling in the device factory (uncapturederror + device-lost): log the precise cause and escalate to a non-zero exit, so every GPU consumer fails loudly instead of writing degenerate output.
  • simplifyGaussians now throws if it cannot reach the target (zero k-NN edges or no finite-cost pairs) at any iteration, instead of returning a partial or full-resolution scene.
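The variable-width chunking described in the first change can be sketched as follows. This is a minimal illustration and not the code from gpu-edge-cost.ts: the function name `chunkWidths` and its parameters are hypothetical, and only the "up to three chunks of at most 16 columns" rule is taken from the description.

```typescript
// Hypothetical sketch: split `columns` appearance columns into chunks of at
// most `maxWidth` columns each. A partial chunk keeps its true width, so no
// padding columns are stored.
function chunkWidths(columns: number, maxWidth = 16, maxChunks = 3): number[] {
    if (columns < 0 || Math.floor(columns) !== columns) {
        throw new RangeError(`columns must be a non-negative integer, got ${columns}`);
    }
    const widths: number[] = [];
    for (let remaining = columns; remaining > 0; remaining -= maxWidth) {
        widths.push(Math.min(remaining, maxWidth));
    }
    if (widths.length > maxChunks) {
        throw new RangeError(
            `${columns} columns need ${widths.length} chunks, more than the ${maxChunks} available`
        );
    }
    return widths;
}

const fullSh = chunkWidths(48);    // three full chunks
const partialSh = chunkWidths(24); // one full chunk plus one 8-column chunk
```

With 48 columns this yields three 16-column chunks, and with 24 columns it yields widths of 16 and 8, so the second buffer is half the size it would be with fixed-width padding.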
Verification
  • GPU output is byte-identical to the previous version on real scenes (910K and 3.3M splats); the chunking and edge-batching changes are behavior-preserving.
  • CPU decimation suite passes (41/41); the no-progress guard throws on degenerate input and the centralized handler logs + fails on a forced GPU error.
  • Could not reproduce a real device OOM locally (ample memory), so the OOM path is verified via the error-scope/handler mechanism and the no-progress backstop rather than an actual OOM. Re-running the failing .splat on the EC2 instance is the end-to-end confirmation.
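For readers unfamiliar with the error-scope mechanism mentioned above, the sketch below shows one way an asynchronous OOM can be turned into a thrown error. `pushErrorScope`, `popErrorScope` and `createBuffer` are real WebGPU `GPUDevice` methods; the helper `createBufferChecked`, the `DeviceLike` interface and the error text are hypothetical and are not taken from this PR.

```typescript
// Minimal structural types so the sketch compiles without @webgpu/types.
// The method names mirror the real WebGPU GPUDevice API.
interface GpuErrorLike { message: string }
interface BufferDescriptorLike { size: number; usage: number }
interface DeviceLike<B> {
    pushErrorScope(filter: 'out-of-memory' | 'validation' | 'internal'): void;
    popErrorScope(): Promise<GpuErrorLike | null>;
    createBuffer(descriptor: BufferDescriptorLike): B;
}

// Hypothetical helper: create a buffer and reject if the device reports an
// out-of-memory error for that allocation.
async function createBufferChecked<B>(
    device: DeviceLike<B>,
    descriptor: BufferDescriptorLike,
    label: string
): Promise<B> {
    device.pushErrorScope('out-of-memory');
    // createBuffer itself does not throw on OOM; the error is reported
    // through the enclosing error scope.
    const buffer = device.createBuffer(descriptor);
    const error = await device.popErrorScope(); // resolves to null on success
    if (error) {
        throw new Error(
            `GPU out of memory allocating ${label} (${descriptor.size} bytes): ${error.message}`
        );
    }
    return buffer;
}
```

Because the scope is awaited before the buffer is used, a failed allocation surfaces at the call site instead of leaving a zeroed buffer for later stages to consume.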

@slimbuck slimbuck requested a review from Copilot June 4, 2026 10:53
@slimbuck slimbuck self-assigned this Jun 4, 2026
@slimbuck slimbuck added the enhancement label Jun 4, 2026

Copilot AI left a comment

Contributor


Pull request overview

This PR addresses GPU decimation failures on large scenes by restructuring GPU-side data layouts to avoid WebGPU per-binding size limits, batching edge uploads to reduce memory pressure, and making GPU failures fail-loud instead of silently producing degenerate output.

Changes:

  • Split appearance (SH) coefficients into up to three ≤16-column storage buffers and merge positions+scalars into a single packed buffer for the edge-cost kernel.
  • Batch edge uploads per dispatch (instead of binding the full N·k edge list) to reduce VRAM usage and remove edges as a per-binding size cap.
  • Centralize GPU error escalation in the Node WebGPU device factory, and make simplifyGaussians throw when decimation cannot make progress (no edges / no finite-cost pairs).
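The per-dispatch edge batching in the second change can be pictured with the sketch below. It assumes each edge is a pair of u32 splat indices (8 bytes), which the PR does not state, and the function `forEachEdgeBatch` and its callback are hypothetical. Only the 256KB batch size comes from the description.

```typescript
// Assumption: each edge is a pair of u32 splat indices (8 bytes per edge).
const U32_PER_EDGE = 2;
const BYTES_PER_U32 = 4;

// Hypothetical sketch: hand the edge list to `upload` one batch at a time,
// so only `batchBytes` of edge data has to be resident on the GPU at once.
// Returns the number of batches processed.
function forEachEdgeBatch(
    edges: Uint32Array,
    upload: (batch: Uint32Array, firstEdge: number) => void,
    batchBytes: number = 256 * 1024
): number {
    const edgesPerBatch = Math.floor(batchBytes / (U32_PER_EDGE * BYTES_PER_U32));
    if (edgesPerBatch <= 0) {
        throw new RangeError('batchBytes is too small to hold a single edge');
    }
    const u32PerBatch = edgesPerBatch * U32_PER_EDGE;
    let batches = 0;
    for (let start = 0; start < edges.length; start += u32PerBatch) {
        const end = Math.min(start + u32PerBatch, edges.length);
        // subarray returns a view over the same memory, not a copy.
        upload(edges.subarray(start, end), start / U32_PER_EDGE);
        batches++;
    }
    return batches;
}
```

In a real pipeline the `upload` callback would presumably write the batch into a small reusable GPU buffer (for example with `GPUQueue.writeBuffer`) and then dispatch the kernel for that range. Under the 8-bytes-per-edge assumption, a 256KB batch holds 32,768 edges.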

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/lib/gpu/gpu-edge-cost.ts | Updates WGSL + buffer/bindings to use packed geometry, chunked appearance buffers, and batch-wise edge uploads for large-scene robustness. |
| src/lib/data-table/decimate.ts | Updates GPU packing to match the new kernel layout and adds fail-loud guards when GPU decimation produces no usable work. |
| src/cli/node-device.ts | Adds centralized WebGPU uncapturederror / device-lost escalation to avoid silently continuing after GPU failures. |
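As an illustration of the centralized escalation described for src/cli/node-device.ts, the sketch below wires both failure channels to a single callback. The `uncapturederror` event and the `lost` promise are real `GPUDevice` features; `installGpuFailureHandlers`, the structural interfaces and the choice to ignore a `'destroyed'` loss are assumptions made for this example, not the PR's actual code.

```typescript
// Minimal structural types; the member names mirror the WebGPU GPUDevice API.
interface UncapturedErrorEventLike { error: { message: string } }
interface DeviceLostInfoLike { reason: string; message: string }
interface FailableDevice {
    addEventListener(
        type: 'uncapturederror',
        listener: (event: UncapturedErrorEventLike) => void
    ): void;
    lost: Promise<DeviceLostInfoLike>;
}

// Hypothetical sketch: route every uncaptured GPU error and every
// unexpected device loss to one `onFatal` callback.
function installGpuFailureHandlers(
    device: FailableDevice,
    onFatal: (cause: string) => void
): void {
    device.addEventListener('uncapturederror', (event) => {
        onFatal(`uncaptured GPU error: ${event.error.message}`);
    });
    device.lost.then((info) => {
        // Assumption: a loss with reason 'destroyed' follows an intentional
        // device.destroy() call and is not treated as a failure.
        if (info.reason !== 'destroyed') {
            onFatal(`GPU device lost (${info.reason || 'unknown'}): ${info.message}`);
        }
    });
}
```

In a CLI, `onFatal` could log the cause and set a non-zero exit code, which would give every GPU consumer the same fail-loud behavior without each one installing its own handlers.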


Comment thread src/lib/gpu/gpu-edge-cost.ts Outdated
Comment thread src/lib/data-table/decimate.ts Outdated

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread src/lib/gpu/gpu-edge-cost.ts
Comment thread src/lib/data-table/decimate.ts
Comment thread src/lib/data-table/decimate.ts

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread src/lib/gpu/gpu-edge-cost.ts
Comment thread src/lib/gpu/gpu-edge-cost.ts
Comment thread src/lib/data-table/decimate.ts
Comment thread src/lib/data-table/decimate.ts
@slimbuck slimbuck marked this pull request as ready for review June 4, 2026 11:53
@slimbuck slimbuck requested a review from a team June 4, 2026 11:53
@slimbuck slimbuck merged commit bffaccf into playcanvas:main Jun 4, 2026
3 checks passed
@slimbuck slimbuck deleted the decimate-dev branch June 4, 2026 11:56