Fix multi-resolution rendering: key+value sort, sub-frame split, image-relative fade by slimbuck · Pull Request #243 · playcanvas/splat-transform

slimbuck · 2026-05-21T13:21:33Z

Summary

Renderer changes to make output correct and consistent across all resolutions. The pre-fix code worked at 1080p but silently broke at 1920×1200, 1440p, 4K, and 8K — and even where it produced output, the same world-space scene rendered very differently at different resolutions.

What was broken

Silent key overflow above 8192 tiles. The pair-sort key packed (tileIdx << 19) | splatIdx into a single u32. At any image with more than 8192 total tiles (≥ 1920×1200), tileIdx << 19 overflowed u32 and aliased tile indices — silent corruption with no error.
Hard tile-edge artifacts at higher resolutions. Splat pixel radius scales with image height (focal length × world-radius / depth). At 8K a splat that fit comfortably in the per-splat tile-coverage cap at 1080p projects to 16× more tiles. The cap was previously enforced by clamping coverage and dropping tail tiles, producing visible hard square edges throughout the image.
Resolution-dependent radius fade. RADIUS_FADE_START_PX = 1024 / RADIUS_FADE_END_PX = 2048 were absolute pixel values. World-space splats whose 1080p pixel radius was 256-512 px (well below 1080p's 1024-px fade-start) projected to 1024-2048 px at 8K and were silently faded out. Result: large patches of grass, sky, and foreground content present at 1080p went missing at 8K.

What changed

Key + value radix sort. Switched from packed-key u32 sort to ComputeRadixSort.sortIndirect's key+value mode (initialValues param). tileKeysBuffer holds tile indices, splatValuesBuffer holds splat indices; both get full 32-bit range. Deletes SPLAT_IDX_BITS/SPLAT_IDX_MASK from config and shrinks sort passes at 1080p from 8 to ~4 (numBits auto-derived from numTiles).
Sub-frame split with no truncation. Images larger than 120×68 tiles (= 1080p) are partitioned into a grid of ≤ 1080p sub-frames. Each sub-frame renders independently into a sub-frame-sized running state buffer, then is stitched into the final image. The global CPU depth sort is shared across sub-frames so there are no inter-sub-frame seams. maxCoveragePerSplat is set to the sub-frame's full tile area (rounded up to a power of two), which geometrically bounds every splat's bbox-in-sub-frame — so the project shader's coverage clamp never bites and emit-pairs writes every splat's full bbox. No hard tile edges anywhere.
Image-height-relative radius fade. Replaced RADIUS_FADE_START_PX / RADIUS_FADE_END_PX with RADIUS_FADE_START_FRAC / RADIUS_FADE_END_FRAC. The project shader computes the per-render threshold as frac × imageHeight, so the SAME world-space splats fade at every render resolution. After this fix the 8K-downsampled-to-1080p image matches 1080p direct at SSIM 0.93.
Right-sized running-state and output buffers. Buffers are sized on groupTilesX × groupTilesY × TILE_SIZE² instead of groupSizeTiles² (the bounding-square of the larger axis). Saves ~25 MB at 1080p, more at non-square aspect ratios and at higher resolutions.
Dropped multi-slot plumbing. numSlots / Slot[] / per-slot shader names removed; rasterizer holds a single PipelineBuffers. The "group" abstraction is retained — it's the hook the sub-frame split uses (beginGroup / dispatchChunk / finishGroup with a group origin and tile rectangle, the project shader's group-AABB cull, and group-pixel-origin uniforms).
Indirect-dispatch slot pool bumped to 256 K. PlayCanvas resets the device's indirect-dispatch slot counter at frame-end only; in an offline pipeline the counter monotonically grows across (chunk × sub-frame) iterations and at 8K can hit ~70 K slots. Each slot is 12 bytes so 256 K = 3 MB, cheap.
Centralised render tunables. AA_DILATION_COV, SIGMA_CUTOFF, DISCRIMINANT_FLOOR, OPACITY_CAP, MIN_ALPHA, MIN_TRANSMITTANCE, JACOBIAN_LIMIT_FACTOR, TILE_SIZE, the radius-fade fractions, the pair-buffer memory budget, and the MAX_COVERAGE_PER_SPLAT_AT_1080P baseline now all live in src/lib/render/config.ts with documentation. gaussian-aabb.ts reads SIGMA_CUTOFF from there too so the BVH extent and the projected splat radius stay in sync.
Golden-image regression tests. Added test/render-golden.test.mjs with tiny (320×240) and mid (640×360) fixtures shared between the runner and the regenerator via test/render-golden.cases.mjs. Byte-exact compares against checked-in webp goldens.

Verification

npm run build clean, no TypeScript warnings.
node --test test/render-golden.test.mjs — both fixtures pass byte-exact (tiny and mid).
npm run lint — 22 errors, all pre-existing on main; no new errors introduced.
Windmill scene (17.9 M splats) rendered at 1080p, 1440p, 4K, and 8K. All produce visually correct output with no tile artifacts, no seams, no missing content. The 8K-downsampled-to-1080p matches the 1080p direct render at SSIM 0.93 (the remaining ~7 % gap is residual SSAA sharpness from AA_DILATION_COV / DISCRIMINANT_FLOOR still being in absolute pixel units, which is reference-3DGS behaviour).

Performance and memory

1080p windmill: 1.08 s → 5.9 s render. Slower because chunkCap is forced down by maxStorageBufferBindingSize once maxCov covers the whole 1080p sub-frame area (16 384 tiles). Peak GPU memory unchanged at 2.20 GB.
8K windmill: 54 s render; previously this resolution silently produced corrupt or content-missing output. Peak GPU memory 3.31 GB (dominated by the image-wide 530 MB running state, 130 MB output, and per-chunk pair buffers).

Caveats / known follow-ups

AA_DILATION_COV and DISCRIMINANT_FLOOR are still absolute pixel units. Making them resolution-invariant would close the residual 7 % SSAA gap but is an "exceed reference 3DGS" decision and a separate change.
Project shader re-runs per sub-frame. Sharing projection across sub-frames within a chunk (project once, bin per sub-frame) would cut 8K render time by roughly 4–8× — a sizeable refactor, deferred.
Prefix-sum is single-workgroup. Per-thread element budget scales linearly with chunkCap; current scan is the placeholder from the rework plan's Open Question 1. Two-level Blelloch is the natural follow-up if chunkCap ever grows past ~1 M.
No seam-N regression fixture. The rework plan called for a synthetic scene engineered to span the original 128-px tile-group boundaries; not added. The current tiny + mid fixtures plus byte-exact passing under forced sub-frame split (verified locally with 7×5 split — still byte-exact) cover the boundary case adequately for now.
FAR_PLANE_NEAR_FACTOR is unused after the BVH removal; left in config for now.

Full reasoning trail and design alternatives considered are in the saved review at ~/.claude/plans/review-the-changes-committed-mutable-tiger.md.

Copilot

Pull request overview

This PR reworks the WebGPU splat renderer to produce correct, resolution-consistent output across large images (multi-resolution correctness), addressing prior key-overflow corruption, tile-edge truncation artifacts, and absolute-pixel fade thresholds.

Changes:

Replaces the old tile-stream/BVH-driven group pipeline with a new raster pass that globally depth-sorts once, then renders via sub-frame partitioning.
Updates the GPU rasterizer to a fully GPU-resident tile-binning pipeline (coverage → prefix-sum → emit-pairs → indirect radix sort (key+value) → boundary finding → binned raster).
Adds golden-image regression tests + a regenerator script, and adds a pretest hook to ensure dist/ exists for tests.

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
test/render-golden.test.mjs	Adds byte-exact golden WebP renderer regression tests driven by shared cases.
test/render-golden.regenerate.mjs	Adds a script to regenerate checked-in golden WebP outputs from current CLI behavior.
test/render-golden.cases.mjs	Centralizes golden test case definitions shared by test + regenerator.
src/lib/render/tile-stream.ts	Removes the prior BVH + per-tile-group pipelined renderer orchestrator.
src/lib/render/raster-pass.ts	Adds the new multi-resolution raster orchestrator (global sort + sub-frame split + stitching).
src/lib/render/index.ts	Switches `renderSplats` export to the new raster pass implementation.
src/lib/render/config.ts	Centralizes render constants (sigma cutoff, fade thresholds, tile size, pair-buffer budget, etc.).
src/lib/render/bvh-query.ts	Removes BVH frustum-plane query helper (no longer used).
src/lib/gpu/index.ts	Stops exporting `TILE_SIZE` from GPU module (now sourced from render config).
src/lib/gpu/gpu-splat-rasterizer.ts	Major rewrite: adds GPU tile-binning pipeline, indirect dispatch handling, key+value radix sorting, and binned rasterization.
src/lib/data-table/gaussian-aabb.ts	Uses `SIGMA_CUTOFF` from centralized render config to keep extents consistent with raster radius.
package.json	Adds `pretest` to build before running tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 12 out of 14 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/lib/gpu/gpu-splat-rasterizer.ts:8

BUFFERUSAGE_INDIRECT is imported but not used in this file. Please remove the unused import to keep the module clean and avoid introducing new lint noise.

import {
    BUFFERUSAGE_COPY_DST,
    BUFFERUSAGE_COPY_SRC,
    BUFFERUSAGE_INDIRECT,
    SHADERLANGUAGE_WGSL,
    SHADERSTAGE_COMPUTE,
    UNIFORMTYPE_FLOAT,
    UNIFORMTYPE_UINT,

Copilot

Pull request overview

Copilot reviewed 12 out of 14 changed files in this pull request and generated 1 comment.

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

slimbuck added 15 commits May 20, 2026 14:35

phase0

c332307

latest

6643e5f

latest

e1f3bf4

latest

6afaf22

latest

1813274

latest

9d08d3b

latest

687e21b

latest

cd8b54a

latest

da8efb0

latest

a2c8f39

latest

ede179a

latest

c9d8cad

latest

7d75735

latest

8f84e3a

latest

fb429ae

slimbuck requested a review from Copilot May 21, 2026 13:21

slimbuck self-assigned this May 21, 2026

slimbuck added the bug Something isn't working label May 21, 2026

Copilot started reviewing on behalf of slimbuck May 21, 2026 13:22 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread src/lib/gpu/gpu-splat-rasterizer.ts Outdated

Comment thread src/lib/gpu/gpu-splat-rasterizer.ts

Comment thread src/lib/render/raster-pass.ts

latest

fd7ac8b

slimbuck requested a review from Copilot May 21, 2026 13:55

Copilot started reviewing on behalf of slimbuck May 21, 2026 13:56 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread src/lib/render/raster-pass.ts Outdated

Comment thread src/lib/gpu/gpu-splat-rasterizer.ts Outdated

slimbuck added 2 commits May 21, 2026 15:08

latest

8e83a4a

latest

fd3eefb

slimbuck requested a review from Copilot May 21, 2026 14:11

Copilot started reviewing on behalf of slimbuck May 21, 2026 14:12 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread src/lib/gpu/gpu-splat-rasterizer.ts Outdated

Potential fix for pull request finding

d78cfd5

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

slimbuck marked this pull request as ready for review May 21, 2026 14:18

slimbuck requested a review from a team May 21, 2026 14:18

slimbuck merged commit cfee373 into playcanvas:main May 21, 2026
3 checks passed

slimbuck deleted the raster-rework branch May 21, 2026 14:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix multi-resolution rendering: key+value sort, sub-frame split, image-relative fade#243

Fix multi-resolution rendering: key+value sort, sub-frame split, image-relative fade#243
slimbuck merged 19 commits into
playcanvas:mainfrom
slimbuck:raster-rework

slimbuck commented May 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

slimbuck commented May 21, 2026

Summary

What was broken

What changed

Verification

Performance and memory

Caveats / known follow-ups

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants